antonst closed this issue 8 years ago.
Hi Antons,
I successfully ran some 2D examples. However, the run failed five minutes ago when I tried to repeat it at home. The error messages are below. Could you take a look? Thanks!
Taisung
2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] [{'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 38, 563934), 'logentry': 'Using pilot agent /usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/agent/radical-pilot-agent-multicore.py'}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 38, 565079), 'logentry': 'Using bootstrapper /usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/bootstrapper/default_bootstrapper.sh'}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 38, 567541), 'logentry': "Copying bootstrapper 'file://localhost//usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/bootstrapper/default_bootstrapper.sh' to agent sandbox (sftp://stampede.tacc.utexas.edu/work/00661/tg458185/radical.pilot.sandbox/pilot-549104ccbc3ea91f565b4907//default_bootstrapper.sh)."}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 54, 116794), 'logentry': u"Copying agent 'file://localhost//usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/agent/radical-pilot-agent-multicore.py' to agent sandbox (sftp://stampede.tacc.utexas.edu/work/00661/tg458185/radical.pilot.sandbox/pilot-549104ccbc3ea91f565b4907/)."}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 22, 0, 462352), 'logentry': 'Submitting SAGA job with description: {\'Queue\': \'normal\', \'Executable\': \'/bin/bash\', \'WorkingDirectory\': \'/work/00661/tg458185/radical.pilot.sandbox/pilot-549104ccbc3ea91f565b4907\', \'Project\': \'TG-MCB090174\', \'WallTimeLimit\': 60, \'Arguments\': [\'-l\', \'default_bootstrapper.sh\', "-n repex-tests -s 549104ccbc3ea91f565b4905 -p 549104ccbc3ea91f565b4907 -t 60 -c 64 -v 0.23 -m ec2-184-72-89-141.compute-1.amazonaws.com:27017 -a : -i /opt/apps/python/epd/7.3.2/bin/python -e \'module purge\' -e \'module load TACC\' -e \'module load cluster\' -e \'module load Linux\' -e \'module load mvapich2\' -e \'module load python/2.7.3-epd-7.3.2\' -e \'module unload xalt\' -e \'export TACC_DELETE_FILES=TRUE\' -l SLURM -j SSH -k IBRUN -d 20 -b"], \'Error\': \'AGENT.STDERR\', \'Output\': \'AGENT.STDOUT\', \'TotalCPUCount\': 64}'}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 22, 4, 379600), 'logentry': 'Pilot launching failed!'}]
Traceback (most recent call last):
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/controller/pilot_launcher_worker.py", line 589, in run
    pilotjob.run()
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/job/job.py", line 397, in run
    return self._adaptor.run (ttype=ttype)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/cpi/decorators.py", line 51, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/slurm/slurm_job.py", line 1190, in run
    self._id = self.js._job_run (self.jd)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/slurm/slurm_job.py", line 603, in _job_run
    " sbatch output:\n%s" % out)
NoSuccess: Couldn't get job id from submitted job! sbatch output:
cat: tmp_JWj2Pg.slurm: No such file or directory
Welcome to the Stampede Supercomputer
sbatch: error: Batch script is empty!
(/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/slurm/slurm_job.py +603 (_job_run) : " sbatch output:\n%s" % out))
2014:12:16 23:22:04 radical.pilot.MainProcess: [INFO ] ComputePilot '549104ccbc3ea91f565b4907' state changed from 'Launching' to 'Failed'.
2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] pilot manager controller thread caught system exit -- forcing application shutdown
Traceback (most recent call last):
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 338, in run
    self.call_callbacks(pilot_id, new_state)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 239, in call_callbacks
    cb(self._shared_data[pilot_id]['facade_object'](), new_state)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/RepEx-0.1-py2.7.egg/pilot_kernels/pilot_kernel.py", line 131, in pilot_state_cb
    sys.exit(1)
SystemExit: 1
2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] pworker Thread-1 stops launcher PilotLauncherWorker-1
2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] launcher PilotLauncherWorker-1 stopping
2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] launcher PilotLauncherWorker-1 stopped
2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] pworker Thread-1 stopped launcher PilotLauncherWorker-1
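The NoSuccess error above means SAGA's SLURM adaptor could not find a job id in what sbatch printed: cat reported that the temporary script tmp_JWj2Pg.slurm did not exist, and sbatch then rejected an empty batch script, so no job was ever submitted. A minimal sketch of that kind of check, assuming sbatch's usual "Submitted batch job <id>" confirmation line; the helper name extract_slurm_job_id is hypothetical and this is not the adaptor's actual code:

```python
import re

# Assumption: on success, sbatch prints "Submitted batch job <id>".
# Hypothetical helper, not SAGA's implementation.
JOB_ID_RE = re.compile(r"Submitted batch job (\d+)")


def extract_slurm_job_id(sbatch_output):
    """Return the SLURM job id found in sbatch output, or None if absent."""
    match = JOB_ID_RE.search(sbatch_output)
    return match.group(1) if match else None


# The sbatch output quoted in the traceback: cat could not find the
# temporary script and sbatch reported an empty batch script.
failed_output = (
    "cat: tmp_JWj2Pg.slurm: No such file or directory\n"
    "Welcome to the Stampede Supercomputer\n"
    "sbatch: error: Batch script is empty!\n"
)

# A successful submission for comparison (the job id is made up).
ok_output = "Submitted batch job 4242424\n"

print(extract_slurm_job_id(failed_output))  # None
print(extract_slurm_job_id(ok_output))      # 4242424
```

With no id to extract, the adaptor has nothing to track, which is why the pilot goes straight from 'Launching' to 'Failed'.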
If I remember correctly, this was an installation issue. @taisung, can you please confirm that this issue is no longer relevant?
Closing due to lack of response.
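The second traceback in the log explains why one failed pilot stopped the whole application: after the pilot changed to 'Failed', RepEx's pilot_state_cb (pilot_kernel.py, line 131) called sys.exit(1). Inside a callback, sys.exit only raises SystemExit, which the pilot manager controller thread then caught before "forcing application shutdown". A minimal Python sketch of that mechanism, with hypothetical stand-ins for the callback and the controller thread (not RADICAL-Pilot's or RepEx's code):

```python
import sys
import threading

# Hypothetical stand-in for a user state callback like RepEx's pilot_state_cb.
def pilot_state_cb(pilot_id, state):
    if state == "Failed":
        sys.exit(1)  # raises SystemExit(1); it does not end the process by itself


caught = []  # exit codes seen by the controller thread


# Hypothetical stand-in for a controller thread that invokes state callbacks.
def controller(callbacks, pilot_id, new_state):
    try:
        for cb in callbacks:
            cb(pilot_id, new_state)
    except SystemExit as exc:
        # A real controller might log this and shut the application down here.
        caught.append(exc.code)


t = threading.Thread(
    target=controller,
    args=([pilot_state_cb], "549104ccbc3ea91f565b4907", "Failed"),
)
t.start()
t.join()
print(caught)  # [1]
```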