radical-cybertools / ExTASY

Gromacs/LSDMap ExTASY 0.2 fails on ARCHER #208

Closed ibethune closed 8 years ago

ibethune commented 8 years ago

Hi, COCO/Amber is working OK on ARCHER using the latest ensemblemd release (0.3.6), but Gromacs/LSDMap fails there. I tested this from both my e290 account, running from my laptop, and my y14 guest account, running from workflow.iu.xsede.org. The error is the same in both cases. The relevant output from the job is:

(extasy-test)[ibethune@workflow grlsd-on-archer]$ EXTASY_DEBUG=True RADICAL_ENMD_VERBOSE=REPORT python extasy_gromacs_lsdmap.py --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg

================================================================================
 EnsembleMD (0.3.6)                                                             
================================================================================

Starting Allocation                                                           ok
        Verifying pattern                                                     ok
        Starting pattern execution                                            ok
--------------------------------------------------------------------------------
Executing simulation-analysis loop with 1 iterations on 24 allocated core(s) on 'epsrc.archer'

Job waiting on queue...
Job is now running !
Waiting for pre_loop step to complete.                                      done
Iteration 1: Waiting for simulation tasks: md.gromacs to complete           done
Iteration 1: Waiting for analysis tasks: md.pre_lsdmap to complete          done
Iteration 1: Waiting for analysis tasks: md.lsdmap to complete              done
Iteration 1: Waiting for analysis tasks: md.post_lsdmap to complete
2015-10-21 17:48:23,976: radical.pilot       : MainProcess                     : OutputFileTransferWorker-2: ERROR   : Output transfer failed: File does not exist: '/work/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000027/out.gro' -  (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py +1056 (initialize)  :  raise saga.DoesNotExist("File does not exist: '%s' - %s" % (self.url.path, out)))
Traceback (most recent call last):
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/output_file_transfer_worker.py", line 147, in run
    session=self._session)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/filesystem/file.py", line 86, in __init__
    _adaptor, _adaptor_state, _ttype=_ttype)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/namespace/entry.py", line 89, in __init__
    url, flags, session, ttype=_ttype)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/base.py", line 101, in __init__
    self._init_task = self._adaptor.init_instance (adaptor_state, *args, **kwargs)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/cpi/decorators.py", line 57, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py", line 1007, in init_instance
    self.initialize ()
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py", line 1056, in initialize
    raise saga.DoesNotExist("File does not exist: '%s' - %s" % (self.url.path, out))
DoesNotExist: File does not exist: '/work/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000027/out.gro' -  (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py +1056 (initialize)  :  raise saga.DoesNotExist("File does not exist: '%s' - %s" % (self.url.path, out)))
Exception in thread OutputFileTransferWorker-2:
Traceback (most recent call last):
  File "/opt/radical/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/output_file_transfer_worker.py", line 147, in run
    session=self._session)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/filesystem/file.py", line 86, in __init__
    _adaptor, _adaptor_state, _ttype=_ttype)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/namespace/entry.py", line 89, in __init__
    url, flags, session, ttype=_ttype)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/base.py", line 101, in __init__
    self._init_task = self._adaptor.init_instance (adaptor_state, *args, **kwargs)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/cpi/decorators.py", line 57, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py", line 1007, in init_instance
    self.initialize ()
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py", line 1056, in initialize
    raise saga.DoesNotExist("File does not exist: '%s' - %s" % (self.url.path, out))
DoesNotExist: File does not exist: '/work/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000027/out.gro' -  (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py +1056 (initialize)  :  raise saga.DoesNotExist("File does not exist: '%s' - %s" % (self.url.path, out)))

2015-10-21 17:48:25,361: radical.enmd.simulation_analysis_loop.static.default: MainProcess                     : Thread-3       : ERROR   : ComputeUnit error: STDERR: Traceback (most recent call last):
  File "selection.py", line 5, in <module>
    from lsdmap.rw import reader
ImportError: No module named lsdmap.rw
Traceback (most recent call last):
  File "reweighting.py", line 8, in <module>
    from lsdmap.rw import reader
ImportError: No module named lsdmap.rw
Traceback (most recent call last):
  File "spliter.py", line 27, in <module>
    grofile_obj = gro.GroFile(os.path.dirname(os.path.abspath(__file__)) + '/' + grofile_name)
  File "/fs4/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000000/gro.py", line 11, in __init__
    self.natoms=self.get_natoms()
  File "/fs4/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000000/gro.py", line 17, in get_natoms
    natoms=int(linecache.getline(self.filename, 2))
ValueError: invalid literal for int() with base 10: ''
, STDOUT: Traceback (most recent call last):
  File "selection.py", line 5, in <module>
    from lsdmap.rw import reader
ImportError: No module named lsdmap.rw
Traceback (most recent call last):
  File "reweighting.py", line 8, in <module>
    from lsdmap.rw import reader
ImportError: No module named lsdmap.rw
Traceback (most recent call last):
  File "spliter.py", line 27, in <module>
    grofile_obj = gro.GroFile(os.path.dirname(os.path.abspath(__file__)) + '/' + grofile_name)
  File "/fs4/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000000/gro.py", line 11, in __init__
    self.natoms=self.get_natoms()
  File "/fs4/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000000/gro.py", line 17, in get_natoms
    natoms=int(linecache.getline(self.filename, 2))
ValueError: invalid literal for int() with base 10: ''

2015-10-21 17:48:25,361: radical.enmd.simulation_analysis_loop.static.default: MainProcess                     : Thread-3       : ERROR   : Pattern execution FAILED.
2015-10-21 17:48:25,362: radical.pilot       : MainProcess                     : Thread-3       : ERROR   : unit manager controller thread caught system exit -- forcing application shutdown
Traceback (most recent call last):
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/unit_manager_controller.py", line 261, in run
    self.call_unit_state_callbacks(unit_id, new_state)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/unit_manager_controller.py", line 198, in call_unit_state_callbacks
    cb(self._shared_data[unit_id]['facade_object'], new_state)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 141, in unit_state_cb
    sys.exit(1)
SystemExit: 1
Execution interupted
Traceback (most recent call last):
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 752, in execute_pattern
    failed_units += " * Analysis task {0} failed with an error: {1}\n".format(unit.uid, unit.stderr)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/compute_unit.py", line 261, in stderr
    return self._worker.get_compute_unit_stderr(self.uid)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/unit_manager_controller.py", line 390, in get_compute_unit_stderr
    return self._dbs.get_compute_unit_stderr(compute_unit_uid)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/db/database.py", line 219, in get_compute_unit_stderr
    return cursor[0]['stderr']
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/cursor.py", line 595, in __getitem__
    for doc in clone:
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/cursor.py", line 1076, in next
    if len(self.__data) or self._refresh():
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/cursor.py", line 1020, in _refresh
    self.__uuid_subtype))
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/cursor.py", line 933, in __send_message
    res = client._send_message_with_response(message, **kwargs)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1205, in _send_message_with_response
    response = self.__send_and_receive(message, sock_info)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1182, in __send_and_receive
    return self.__receive_message_on_socket(1, request_id, sock_info)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1165, in __receive_message_on_socket
    header = self.__receive_data_on_socket(16, sock_info)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1153, in __receive_data_on_socket
    chunk = sock_info.sock.recv(length)
KeyboardInterrupt

        Starting Deallocation                                               done 

The failing CU is the lsdmap kernel, and its STDERR contains the relevant information:

Traceback (most recent call last):
  File "selection.py", line 5, in <module>
    from lsdmap.rw import reader
ImportError: No module named lsdmap.rw
Traceback (most recent call last):
  File "reweighting.py", line 8, in <module>
    from lsdmap.rw import reader
ImportError: No module named lsdmap.rw
Traceback (most recent call last):
  File "spliter.py", line 27, in <module>
    grofile_obj = gro.GroFile(os.path.dirname(os.path.abspath(__file__)) + '/' + grofile_name)
  File "/fs4/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000000/gro.py", line 11, in __init__
    self.natoms=self.get_natoms()
  File "/fs4/y14/y14/guest02/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016729.0021-pilot.0000/unit.000000/gro.py", line 17, in get_natoms
    natoms=int(linecache.getline(self.filename, 2))
ValueError: invalid literal for int() with base 10: ''

There may be something wrong with the lsdmap installation on ARCHER, since lsdmap.rw cannot be imported. If someone can advise, we can try to get it fixed tomorrow morning!
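
The final ValueError is consistent with spliter.py being handed an empty or missing .gro file: linecache.getline returns an empty string rather than raising when a file or line cannot be read, and int('') then fails. Below is a minimal sketch of a more explicit check, assuming the standard GROMACS .gro layout (title on line 1, atom count on line 2). The helper read_gro_natoms is hypothetical and is not part of the gro.py used by ExTASY.

```python
import os


def read_gro_natoms(path):
    """Return the atom count stored on line 2 of a .gro file.

    Hypothetical helper: raises ValueError with a descriptive message
    instead of the bare "invalid literal for int()" seen in the log.
    """
    if not os.path.isfile(path):
        raise ValueError("gro file not found: {0}".format(path))
    with open(path) as handle:
        handle.readline()                       # line 1: title
        count_line = handle.readline().strip()  # line 2: number of atoms
    if not count_line:
        raise ValueError("gro file has no atom-count line: {0}".format(path))
    return int(count_line)
```

With a check like this, an empty out.gro would be reported as such, which points more directly at the earlier failing step than at the parser.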

vivek-bala commented 8 years ago

Setting the following fixes this:

export PYTHONPATH=/work/y07/y07/cse/lsdmap/lsdmap-git-rc2:$PYTHONPATH
export PYTHONPATH=/work/y07/y07/cse/lsdmap/lsdmap-git-rc2/lsdmap/rw:$PYTHONPATH
export PYTHONPATH=/work/y07/y07/cse/lsdmap/lsdmap-git-rc2/util:$PYTHONPATH

I am not sure whether these paths were previously set by module load lsdmap. I have now added them in ensemblemd, and it works.
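
To confirm that exports like these have taken effect in the environment where the kernel runs, a small import check along the following lines could be used. This is only a sketch: check_import is a hypothetical helper, and lsdmap.rw is the module name taken from the traceback above.

```python
import importlib
import sys


def check_import(module_name):
    """Try to import module_name and return (ok, message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, "cannot import {0}: {1}".format(module_name, exc)
    return True, "imported {0}".format(module_name)


if __name__ == "__main__":
    ok, message = check_import("lsdmap.rw")
    print(message)
    if not ok:
        # Listing sys.path shows which directories Python actually searched.
        print("sys.path entries:")
        for entry in sys.path:
            print("  " + entry)
```

Running it once before and once after the exports shows whether the extra PYTHONPATH entries are what makes the import succeed.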

ibethune commented 8 years ago

Those dirs are not added to the PYTHONPATH in the module:

ibethune@eslogin004:~> module show lsdmap
-------------------------------------------------------------------
/opt/modules/packages-archer/lsdmap/0.1-rc2:

prereq   python-compute 
prereq   pc-numpy 
prereq   pc-scipy 
prepend-path     PYTHONPATH /work/y07/y07/cse/lsdmap/lsdmap-git-rc2/lib/python2.7/site-packages 
prepend-path     PATH /work/y07/y07/cse/lsdmap/lsdmap-git-rc2/bin 
setenv       LSDMAP /work/y07/y07/cse/lsdmap/lsdmap-git-rc2/bin 
-------------------------------------------------------------------

I don't know whether they were there before.
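
The gap between what the module sets and what the kernel needs can also be checked programmatically. The sketch below compares the three directories from the fix above against a PYTHONPATH value. The helper missing_from_pythonpath is hypothetical, and the directory list is copied from this thread rather than from any official lsdmap documentation.

```python
import os

# Directories reported in this thread as needed for the lsdmap scripts.
REQUIRED_DIRS = [
    "/work/y07/y07/cse/lsdmap/lsdmap-git-rc2",
    "/work/y07/y07/cse/lsdmap/lsdmap-git-rc2/lsdmap/rw",
    "/work/y07/y07/cse/lsdmap/lsdmap-git-rc2/util",
]


def missing_from_pythonpath(required, pythonpath):
    """Return the entries of `required` that are absent from `pythonpath`."""
    present = set(
        os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p
    )
    return [d for d in required if os.path.normpath(d) not in present]


if __name__ == "__main__":
    current = os.environ.get("PYTHONPATH", "")
    for directory in missing_from_pythonpath(REQUIRED_DIRS, current):
        print("not on PYTHONPATH: " + directory)
```

Given only the site-packages path that the module prepends, all three directories would be reported as missing, which matches the import failure in the job output.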

ibethune commented 8 years ago

Great, it's working now using enmd master.