radical-cybertools / ExTASY

MongoDB(?) connectivity issues. #237

Closed: ibethune closed this issue 8 years ago

ibethune commented 8 years ago

Right now I'm running into a problem where my ExTASY jobs fail (repeatably) with a MongoDB-related error in the agent_0.err file:

2016-01-21 14:18:01,500: radical.saga        : MainProcess                     : MainThread     : INFO    : python.interpreter   version: 2.7.6 (default, Mar 10 2014, 14:13:45) [GCC 4.8.1 20130531 (Cray Inc.)]
2016-01-21 14:18:01,500: radical.saga        : MainProcess                     : MainThread     : INFO    :                      pid: 18293
2016-01-21 14:18:01,500: radical.saga        : MainProcess                     : MainThread     : INFO    :                      tid: MainThread
2016-01-21 14:18:01,501: radical.saga        : MainProcess                     : MainThread     : INFO    : radical.saga         version: 0.40
2016-01-21 14:18:01,510: radical.pilot       : MainProcess                     : MainThread     : INFO    : python.interpreter   version: 2.7.6 (default, Mar 10 2014, 14:13:45) [GCC 4.8.1 20130531 (Cray Inc.)]
2016-01-21 14:18:01,510: radical.pilot       : MainProcess                     : MainThread     : INFO    :                      pid: 18293
2016-01-21 14:18:01,510: radical.pilot       : MainProcess                     : MainThread     : INFO    :                      tid: MainThread
2016-01-21 14:18:01,510: radical.pilot       : MainProcess                     : MainThread     : INFO    : radical.pilot        version: 0.38
Traceback (most recent call last):
  File "/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.wlandhcp165.dl.ac.uk.ibethune.016821.0012-pilot.0000/rp_install/bin/radical-pilot-agent-multicore.py", line 6226, in <module>
    bootstrap_3()
  File "/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.wlandhcp165.dl.ac.uk.ibethune.016821.0012-pilot.0000/rp_install/bin/radical-pilot-agent-multicore.py", line 6054, in bootstrap_3
    _, mongo_db, _, _, _  = ru.mongodb_connect(cfg['mongodb_url'])
  File "/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.wlandhcp165.dl.ac.uk.ibethune.016821.0012-pilot.0000/rp_install/lib/python2.7/site-packages/radical/utils/misc.py", line 95, in mongodb_connect
    db.authenticate (user, pwd)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/database.py", line 978, in authenticate
    self.connection._cache_credentials(self.name, credentials)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/mongo_client.py", line 467, in _cache_credentials
    auth.authenticate(credentials, sock_info, self.__simple_command)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/auth.py", line 475, in authenticate
    auth_func(credentials[1:], sock_info, cmd_func)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/auth.py", line 450, in _authenticate_default
    return _authenticate_scram_sha1(credentials, sock_info, cmd_func)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/auth.py", line 238, in _authenticate_scram_sha1
    sasl_start, sasl_continue)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/auth.py", line 185, in _scram_sha1_conversation
    res, _ = cmd_func(sock_info, source, sasl_start)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/mongo_client.py", line 703, in __simple_command
    helpers._check_command_response(response, None, msg)
  File "/work/e290/shared/shared_pilot_ve_20150924/lib/python2.7/site-packages/pymongo/helpers.py", line 182, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: command SON([('saslStart', 1), ('mechanism', 'SCRAM-SHA-1'), ('autoAuthorize', 1), ('payload', Binary('n,,n=extasy,r=MjgxNzM5ODUyMjQ=', 0))]) on namespace radicalpilot.$cmd failed: Authentication failed.

Then the PBS job exits and all is lost... Any ideas?

ibethune commented 8 years ago

Update: after failing 3 times in a row, this no longer reproduces... Any ideas as to the cause?

andre-merzky commented 8 years ago

Unfortunately, there is always the possibility of intermittent network or service failures, packet loss, etc. A log trace for this session may or may not reveal the problem, but if it does not, this will be hard to debug unless it recurs :(
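
For intermittent failures of this kind, a common mitigation is to retry the failing call a few times with a growing delay before giving up. The following is only a minimal sketch of that idea, not what radical.pilot or radical.utils actually do: `retry_call` and `flaky_connect` are hypothetical names invented for illustration, and whether an "Authentication failed" error is really transient in this setup is an assumption.

```python
import time


def retry_call(func, attempts=3, delay=1.0, backoff=2.0,
               exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry.

    Hypothetical helper: tries at most `attempts` times and multiplies
    the wait by `backoff` after each failure. The last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= backoff


if __name__ == "__main__":
    # Stand-in for a connect call that fails twice and then succeeds.
    calls = {"n": 0}

    def flaky_connect():
        calls["n"] += 1
        if calls["n"] < 3:
            raise IOError("Authentication failed.")
        return "connected"

    print(retry_call(flaky_connect, attempts=5, delay=0.1))
```

In the traceback above, the call that fails is `ru.mongodb_connect(cfg['mongodb_url'])` in `bootstrap_3`. Wrapping a call like that in such a helper would hide short outages, but it would not explain them, so the log trace is still the place to look for the cause.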

ibethune commented 8 years ago

It no longer reproduces, unfortunately(?)