radical-cybertools / radical.saga

A Light-Weight Access Layer for Distributed Computing Infrastructure and Reference Implementation of the SAGA Python Language Bindings.
http://radical-cybertools.github.io/saga-python/

`pty_url` error on Summit #725

Open mturilli opened 5 years ago

mturilli commented 5 years ago

Stack:

$ radical-stack 
/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/utils
/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/saga
/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/pilot

  python               : 2.7.15
  pythonpath           : /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/py-setuptools-40.4.3-rc56sxgpafwvs5eyrvc3uxiaqoc6oe2f/lib/python2.7/site-packages:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/py-pip-10.0.1-2gr5x7tsnuxwissqhzapdbmlpheove3i/lib/python2.7/site-packages:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/py-virtualenv-16.0.0-phcok3x4eyd36qfh5ptv66isyol4ui4b/lib/python2.7/site-packages:/sw/summit/xalt/1.1.3/site:/sw/summit/xalt/1.1.3/libexec
  virtualenv           : /autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte

  radical.pilot        : 0.62.0-v0.62.0-346-g8b122b6@experiment-periscope
  radical.saga         : 0.62.0-v0.62.0-26-gea22176@devel
  radical.utils        : 0.60.2-v0.60.2-5-g87e3d67@devel

Code: https://github.com/radical-experiments/summit_jsrun_prte/blob/master/baseline.py

Trace:

2019-06-30 19:28:15,474: radical.saga        : MainProcess                     : MainThread     : WARNING : skip adaptor radical.saga.adaptors.srm.srmfile: init failed
Traceback (most recent call last):
  File "/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/saga/engine/engine.py", line 185, in _load_adaptors
    adaptor_instance = adaptor_module.Adaptor ()
  File "/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/utils/singleton.py", line 35, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/saga/adaptors/srm/srmfile.py", line 87, in __init__
    self.pty_url = self._cfg['pty_url']
  File "/autofs/nccs-svm1_home1/mturilli1/experiments/ve/jsrun_prte/lib/python2.7/site-packages/radical/utils/config.py", line 305, in __getitem__
    raise KeyError('no such key [%s]' % key)
KeyError: 'no such key [pty_url]'

andre-merzky commented 5 years ago

It is actually not an error, but a warning, which warns about one of the SAGA adaptors (here the SRM adaptor) not being loaded. We just dump the stacktrace from the loading attempt to record info about why it is not loaded. The engine handles failing adaptors gracefully. In this specific case, we did not really use SRM in any of our projects, so it does not cause any follow-up errors.
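The "skip and warn" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual `radical.saga` engine code: `WorkingAdaptor`, `BrokenAdaptor` and `load_adaptors` are hypothetical names invented for the example, and the missing `pty_url` key is simulated with a plain dict.

```python
import logging
import traceback

log = logging.getLogger("adaptor_loader_sketch")


class WorkingAdaptor(object):
    """Hypothetical adaptor that initializes without problems."""

    def __init__(self):
        self.name = "working"


class BrokenAdaptor(object):
    """Hypothetical adaptor whose init fails on a missing config key."""

    def __init__(self):
        cfg = {}                        # stand-in for an adaptor config
        self.pty_url = cfg["pty_url"]   # raises KeyError, as in the trace


def load_adaptors(candidates):
    """Try every candidate; log and skip the ones that fail to initialize."""
    loaded, skipped = [], []
    for cls in candidates:
        try:
            loaded.append(cls())
        except Exception:
            # Record why the adaptor was skipped, then carry on.
            log.warning("skip adaptor %s: init failed\n%s",
                        cls.__name__, traceback.format_exc())
            skipped.append(cls.__name__)
    return loaded, skipped


loaded, skipped = load_adaptors([WorkingAdaptor, BrokenAdaptor])
```

The loader returns both lists, so a caller can carry on with the adaptors that did load while still being able to report which ones were skipped.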

So much for context. Any proposal on how to handle this, or on whether it needs handling at all?

mturilli commented 5 years ago

Is there a reason why we try to load it if we are not using it? For reporting, it is clearly identified as a WARNING, so that works for me.

andre-merzky commented 5 years ago

> Is there a reason why we try to load it if we are not using it?

No particular reason: SAGA simply tries to load all adaptors it knows about. At the moment those adaptors are statically defined in the adaptor registry, and we can comment out unsupported plugins. So we can easily disable the SRM adaptor, whose state is dubious at best. We do not have the ability to control plugin loading at the API level (and probably should not have that).
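As a rough illustration of a statically defined registry where an entry is disabled by commenting it out: the sketch below is not the real SAGA adaptor registry. `ADAPTOR_REGISTRY` and `load_registered` are hypothetical names, and standard-library modules stand in for adaptor modules so that the example runs anywhere.

```python
import importlib

# Hypothetical static registry. Stdlib module names stand in for adaptor
# modules; a real registry would list adaptor module paths instead.
ADAPTOR_REGISTRY = [
    "json",
    "csv",
  # "some.unsupported.adaptor",   # disabled by commenting out the entry
]


def load_registered(registry):
    """Import every module named in the registry and return them by name."""
    modules = {}
    for name in registry:
        modules[name] = importlib.import_module(name)
    return modules


modules = load_registered(ADAPTOR_REGISTRY)
```

With this layout, disabling an adaptor is a one-line source change, which also means it cannot be toggled at runtime through the API.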

FWIW, RU has a plugin manager which loads plugins by looking at installed (not registered) plugins, and is more flexible, but I never got around to switching to it in SAGA. RCT.v2 should use it in several places though (SAGA adaptors, RP schedulers and launch methods, etc.).
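For contrast, loading by discovery (looking at what is installed rather than at a hard-coded list) could look roughly like the sketch below. It does not use or reproduce the RU plugin manager API: `discover_plugins` is a hypothetical helper, the plugin directory is a temporary one created for the example, and the code assumes Python 3 (`importlib.util`).

```python
import importlib.util
import os
import tempfile


def discover_plugins(plugin_dir):
    """Load every non-private *.py file found in plugin_dir as a module."""
    plugins = {}
    for fname in sorted(os.listdir(plugin_dir)):
        if not fname.endswith(".py") or fname.startswith("_"):
            continue
        name = fname[:-3]
        path = os.path.join(plugin_dir, fname)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        plugins[name] = module
    return plugins


# Create two throwaway plugin files so the example is self-contained.
plugin_dir = tempfile.mkdtemp()
for plugin_name in ("shell", "slurm"):
    with open(os.path.join(plugin_dir, plugin_name + ".py"), "w") as fh:
        fh.write("PLUGIN_NAME = %r\n" % plugin_name)

found = discover_plugins(plugin_dir)
```

Here a plugin becomes available simply by being present in the directory, and removing the file removes the plugin, with no registry to edit.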

mturilli commented 5 years ago

Thanks. OK, let's disable SRM then (I doubt anyone will ever use it again anyway) and add a feature 'request' for RCT2, maybe to our design specification document?

andre-merzky commented 5 years ago

Yes, makes sense - will do!

andre-merzky commented 4 years ago

Will be resolved by cleaning up the config system.