radical-collaboration / extasy-grlsd

Repository to hold the input data and scripts for the ExTASY gromacs-lsdmap work

update.0.child.log error #103

Closed euhruska closed 6 years ago

euhruska commented 6 years ago

What does this traceback in update.0.child.log mean?

Traceback (most recent call last):
  File "/scratch/sciteam/hruska/radical.pilot.sandbox/re.session.leonardo.rice.edu.eh22.017829.0003/pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/worker/update.py", line 111, in _timed_bulk_execute
    res = self._bulk.execute()
  File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.50.12/lib/python2.7/site-packages/pymongo/bulk.py", line 587, in execute
    return self.__bulk.execute(write_concern)
  File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.50.12/lib/python2.7/site-packages/pymongo/bulk.py", line 433, in execute
    return self.execute_command(generator, write_concern)
  File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.50.12/lib/python2.7/site-packages/pymongo/bulk.py", line 300, in execute_command
    raise BulkWriteError(full_result)
BulkWriteError: batch op errors occurred

And this one in agent_0.executing.0.child.log:

2018-10-26 05:17:49,232: agent_0.executing.0.child: agent_0.executing.0             : Watcher        : INFO    : Unit unit.000093 has return code 0.
2018-10-26 05:17:49,232: agent_0.executing.0.child: agent_0.executing.0             : Watcher        : DEBUG   : advance bulk size: 1 [True, True]
2018-10-26 05:17:49,233: agent_0.executing.0.child: agent_0.executing.0             : Watcher        : ERROR   : Error in ExecWorker watch loop (can't route 'state_pubsub' notification: ['agent_unschedule_pubsub'])
Traceback (most recent call last):
  File "/scratch/sciteam/hruska/radical.pilot.sandbox/re.session.leonardo.rice.edu.eh22.017829.0003/pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/agent/executing/popen.py", line 399, in _watch
    action = self._check_running()
  File "/scratch/sciteam/hruska/radical.pilot.sandbox/re.session.leonardo.rice.edu.eh22.017829.0003/pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/agent/executing/popen.py", line 486, in _check_running
    self.advance(cu, rps.AGENT_STAGING_OUTPUT_PENDING, publish=True, push=True)
  File "/scratch/sciteam/hruska/radical.pilot.sandbox/re.session.leonardo.rice.edu.eh22.017829.0003/pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/utils/component.py", line 1365, in advance
    self.publish(rpc.STATE_PUBSUB, {'cmd': 'update', 'arg': to_publish})
  File "/scratch/sciteam/hruska/radical.pilot.sandbox/re.session.leonardo.rice.edu.eh22.017829.0003/pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/utils/component.py", line 1433, in publish
    self._publishers.keys()))
RuntimeError: can't route 'state_pubsub' notification: ['agent_unschedule_pubsub']

andre-merzky commented 6 years ago

full_result -> MongoDB is full again...
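
The "batch op errors occurred" line in the traceback above is only a summary; pymongo's BulkWriteError also carries a details dictionary with the server's per-operation messages. Below is a minimal sketch of how those could be summarised, assuming the usual writeErrors / writeConcernErrors layout of that dictionary. The helper name summarize_bulk_error and the sample values are hypothetical and are not part of RADICAL-Pilot or of the logs in this issue.

```python
def summarize_bulk_error(details):
    """Return one readable line per error found in a BulkWriteError.details dict."""
    lines = []
    # Per-operation failures reported by the server.
    for err in details.get("writeErrors", []):
        lines.append("op %s failed with code %s: %s"
                     % (err.get("index"), err.get("code"), err.get("errmsg")))
    # Failures to satisfy the requested write concern.
    for err in details.get("writeConcernErrors", []):
        lines.append("write concern error code %s: %s"
                     % (err.get("code"), err.get("errmsg")))
    return lines


# Hypothetical sample shaped like BulkWriteError.details; the code and the
# message are illustrative only, not taken from the log in this issue.
sample = {
    "writeErrors": [
        {"index": 0, "code": 12345, "errmsg": "example server message"},
    ],
    "writeConcernErrors": [],
    "nInserted": 0,
}

for line in summarize_bulk_error(sample):
    print(line)
```

With pymongo installed, the same helper could be called from an `except BulkWriteError as exc:` handler (imported from pymongo.errors) as summarize_bulk_error(exc.details), which should show whether the server really rejected the writes for lack of space.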

euhruska commented 6 years ago

Damn, I deleted everything a few days ago... Does the MongoDB at two.radical-project.org have any limits?

andre-merzky commented 6 years ago

No - but we'll eventually run out of disk space. We consider the radical DB to be experimental and not necessarily fit for production. Having said that, it might be more stable than mlab. Your institution might consider finding a (paid) MongoDB provider, or hosting MongoDB in-house.

euhruska commented 6 years ago

Can you point me to instructions, so I can set up a server like two.radical-project.org?

andre-merzky commented 6 years ago

There is really not much to it. On any Debian-based Linux, it boils down to apt-get install mongodb-server, and I assume it is similar for RPM-based distributions. Once it is installed, you need to make sure that the port is open to the outside: you may need to set the bind_ip value in /etc/mongodb.conf to the external network interface. You may also want to add user IDs and passwords - that is documented here and here (for the second link, search for Mongo 3.2.1 for updated information). You may need to enable auth in the config file.
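
Once the server is installed, the setup described above can be checked from a client machine. Below is a rough sketch, assuming Python 3 and MongoDB's default port 27017. The function names port_is_open and mongodb_url, the host mongo.example.org, and the user, password and database name are all hypothetical placeholders.

```python
import socket
from urllib.parse import quote_plus


def port_is_open(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False
    conn.close()
    return True


def mongodb_url(host, dbname, user=None, password=None, port=27017):
    """Build a mongodb:// URL, percent-escaping the credentials."""
    auth = ""
    if user is not None and password is not None:
        auth = "%s:%s@" % (quote_plus(user), quote_plus(password))
    return "mongodb://%s%s:%d/%s" % (auth, host, port, dbname)


if __name__ == "__main__":
    # Hypothetical values; replace with your own server and credentials.
    print(mongodb_url("mongo.example.org", "rp", "rp_user", "p@ss/word"))
```

If port_is_open("mongo.example.org") returns False from outside the server's network, the bind_ip setting or a firewall is the likely culprit. The resulting URL is the kind of value one would presumably export as RADICAL_PILOT_DBURL before starting a session.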

Let me know if you need more info!

euhruska commented 6 years ago

works