radical-cybertools / radical.repex.at

This is the github location for RepEx developed by the RADICAL team in conjunction with the York Lab.

some issues for 2d #16

Closed antonst closed 8 years ago

antonst commented 9 years ago

Hi Antons,

           I successfully ran some 2D examples.  However, a run failed 5 minutes ago when I tried to repeat it at home.  The error messages are below.  Could you take a look?

           Thanks!

Taisung

2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] [{'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 38, 563934), 'logentry': 'Using pilot agent /usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/agent/radical-pilot-agent-multicore.py'}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 38, 565079), 'logentry': 'Using bootstrapper /usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/bootstrapper/default_bootstrapper.sh'}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 38, 567541), 'logentry': "Copying bootstrapper 'file://localhost//usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/bootstrapper/default_bootstrapper.sh' to agent sandbox (sftp://stampede.tacc.utexas.edu/work/00661/tg458185/radical.pilot.sandbox/pilot-549104ccbc3ea91f565b4907//default_bootstrapper.sh)."}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 21, 54, 116794), 'logentry': u"Copying agent 'file://localhost//usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/agent/radical-pilot-agent-multicore.py' to agent sandbox (sftp://stampede.tacc.utexas.edu/work/00661/tg458185/radical.pilot.sandbox/pilot-549104ccbc3ea91f565b4907/)."}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 22, 0, 462352), 'logentry': 'Submitting SAGA job with description: {\'Queue\': \'normal\', \'Executable\': \'/bin/bash\', \'WorkingDirectory\': \'/work/00661/tg458185/radical.pilot.sandbox/pilot-549104ccbc3ea91f565b4907\', \'Project\': \'TG-MCB090174\', \'WallTimeLimit\': 60, \'Arguments\': [\'-l\', \'default_bootstrapper.sh\', "-n repex-tests -s 549104ccbc3ea91f565b4905 -p 549104ccbc3ea91f565b4907 -t 60 -c 64 -v 0.23 -m ec2-184-72-89-141.compute-1.amazonaws.com:27017 -a : -i /opt/apps/python/epd/7.3.2/bin/python -e \'module purge\' -e \'module load TACC\' -e \'module load cluster\' -e \'module load Linux\' -e \'module load mvapich2\' -e \'module load python/2.7.3-epd-7.3.2\' -e \'module unload xalt\' -e \'export TACC_DELETE_FILES=TRUE\' -l 
SLURM -j SSH -k IBRUN -d 20 -b"], \'Error\': \'AGENT.STDERR\', \'Output\': \'AGENT.STDOUT\', \'TotalCPUCount\': 64}'}, {'timestamp': datetime.datetime(2014, 12, 17, 4, 22, 4, 379600), 'logentry': 'Pilot launching failed!'}]

Traceback (most recent call last):
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/controller/pilot_launcher_worker.py", line 589, in run
    pilotjob.run()
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/job/job.py", line 397, in run
    return self._adaptor.run (ttype=ttype)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/cpi/decorators.py", line 51, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/slurm/slurm_job.py", line 1190, in run
    self._id = self.js._job_run (self.jd)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/slurm/slurm_job.py", line 603, in _job_run
    " sbatch output:\n%s" % out)
NoSuccess: Couldn't get job id from submitted job! sbatch output:

cat: tmp_JWj2Pg.slurm: No such file or directory


          Welcome to the Stampede Supercomputer             

sbatch: error: Batch script is empty!

(/usr/people/taisung/myenv/lib/python2.7/site-packages/saga/adaptors/slurm/slurm_job.py +603 (_job_run) : " sbatch output:\n%s" % out))

2014:12:16 23:22:04 radical.pilot.MainProcess: [INFO ] ComputePilot '549104ccbc3ea91f565b4907' state changed from 'Launching' to 'Failed'.

2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] pilot manager controller thread caught system exit -- forcing application shutdown

Traceback (most recent call last):
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 338, in run
    self.call_callbacks(pilot_id, new_state)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 239, in call_callbacks
    cb(self._shared_data[pilot_id]['facade_object'](), new_state)
  File "/usr/people/taisung/myenv/lib/python2.7/site-packages/RepEx-0.1-py2.7.egg/pilot_kernels/pilot_kernel.py", line 131, in pilot_state_cb
    sys.exit(1)
SystemExit: 1

2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] pworker Thread-1 stops launcher PilotLauncherWorker-1

2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] launcher PilotLauncherWorker-1 stopping

2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] launcher PilotLauncherWorker-1 stopped

2014:12:16 23:22:04 radical.pilot.MainProcess: [ERROR ] pworker Thread-1 stopped launcher PilotLauncherWorker-1

antonst commented 8 years ago

If I remember correctly, this was an installation issue. @taisung, can you please confirm that this issue is no longer relevant?

antonst commented 8 years ago

Closing due to lack of response.