radical-cybertools / radical.repex.at

This is the github location for RepEx developed by the RADICAL team in conjunction with the York Lab.

TSU problem in experiments branch #77

Closed haoyuanchen closed 8 years ago

haoyuanchen commented 8 years ago

The simulation ran fine until the 6th exchange step (the U dimension in the 2nd cycle), where it failed with:

2016-03-30 13:59:22,990: radical.repex : MainProcess : MainThread : ERROR : ERROR: In D3 Global-Exchange-step failed for unit: unit.000773
2016-03-30 13:59:22,990: radical.repex : MainProcess : MainThread : INFO : Unexpected error: <type 'exceptions.IOError'>
2016-03-30 13:59:22,990: radical.repex : MainProcess : MainThread : INFO : Closing session.
2016-03-30 13:59:32,971: radical.repex : MainProcess : Thread-1 : INFO : ComputePilot 'pilot.0000' state changed to Canceled.
Traceback (most recent call last):
  File "/home/haoyuan/myenv/bin/repex-amber", line 5, in <module>
    pkg_resources.run_script('radical.repex==0.2.9', 'repex-amber')
  File "/home/haoyuan/myenv/lib/python2.7/site-packages/pkg_resources.py", line 488, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/haoyuan/myenv/lib/python2.7/site-packages/pkg_resources.py", line 1354, in run_script
    execfile(script_filename, namespace, namespace)
  File "/home/haoyuan/myenv/lib/python2.7/site-packages/radical.repex-0.2.9-py2.7.egg/EGG-INFO/scripts/repex-amber", line 164, in <module>
    pilot_kernel.run_simulation( replicas, pilot_object, session, md_kernel )
  File "/home/haoyuan/myenv/lib/python2.7/site-packages/radical.repex-0.2.9-py2.7.egg/pilot_kernels/pilot_kernel_pattern_s_multi_d_sc.py", line 587, in run_simulation
    md_kernel.do_exchange(current_cycle, DIM, replicas)
  File "/home/haoyuan/myenv/lib/python2.7/site-packages/radical.repex-0.2.9-py2.7.egg/amber_kernels_3d_tsu/kernel_pattern_s_3d_tsu.py", line 774, in do_exchange
    f = open(infile)
IOError: [Errno 2] No such file or directory: 'pairs_for_exchange_3_5.dat'

However, I checked unit.000773 and it finished normally. The pairs_for_exchange_3_5.dat file was generated and not empty.
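
A note for anyone reproducing this: the traceback shows open() being called with a bare file name, and Python resolves a relative path against the current working directory of the calling process. One possibility worth ruling out is that the file exists in the unit's directory but not in the directory where repex-amber itself is running. The snippet below is a minimal sketch of such a check; describe_input_file is a hypothetical helper and not part of radical.repex, and it makes no assumption about how RepEx stages files.

```python
import os


def describe_input_file(infile, workdir=None):
    """Report how a relative path resolves and whether the file is there.

    Hypothetical diagnostic helper, not part of radical.repex.
    """
    # A relative name is resolved against the working directory,
    # which defaults to that of the current process.
    workdir = workdir or os.getcwd()
    full_path = os.path.abspath(os.path.join(workdir, infile))
    exists = os.path.isfile(full_path)
    size = os.path.getsize(full_path) if exists else 0
    return {"path": full_path, "exists": exists, "size": size}


if __name__ == "__main__":
    # File name taken from the IOError in the traceback.
    print(describe_input_file("pairs_for_exchange_3_5.dat"))
```

Running it from the directory where repex-amber was launched, and again with workdir set to the directory of unit.000773, would show whether the two locations disagree.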

antonst commented 8 years ago

I assume you are using multiple CPUs per replica? If so, please pull the latest commit from the feature/experiments branch.

haoyuanchen commented 8 years ago

No. I was using 1 CPU per replica, but I've pulled the latest commit and tried again. Still waiting in the queue.

antonst commented 8 years ago

How many replicas are you running, and on which machine?

haoyuanchen commented 8 years ago

96 replicas, on Stampede.

haoyuanchen commented 8 years ago

I tried another run and the problem still exists.

antonst commented 8 years ago

Please try running with the feature/perfopt_gen branch. If the problem persists, please share the terminal output and the simulation input file.