radical-collaboration / extasy-grlsd

Repository to hold the input data and scripts for the ExTASY gromacs-lsdmap work

error when launching several extasy runs #86

Closed euhruska closed 5 years ago

euhruska commented 5 years ago

I got an error when I started several extasy runs at the same time. If I wait a bit between starts, there is no error. Still, this is annoying when starting many extasy runs. Is there a way to launch many extasy runs at the same time? @vivek-bala


Traceback (most recent call last):
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/radical/utils/lease_manager.py", line 175, in _create_object
    obj = _LeaseObject (self, self._log, creator, args)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/radical/utils/lease_manager.py", line 33, in __init__
    self.obj        = creator (*args)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py", line 284, in _shell_creator
    return sups.PTYShell(url, self.get_session(), self._logger)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 248, in __init__
    self.pty_shell  = self.factory.run_shell  (self.pty_info)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 450, in run_shell
    self._initialize_pty (sh_slave, info)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 413, in _initialize_pty
    raise ptye.translate_exception (e)
AuthenticationFailed: too frequent login attempts, or sshd misconfiguration: read from process failed '[Errno 5] Input/output error' : (/usr//etc/ssh not found.
mux_client_request_session: session request failed: Session open refused by peer
ssh_exchange_identification: read: Connection reset by peer
) ((/usr//etc/ssh not found.
mux_client_request_session: session request failed: Session open refused by peer
ssh_exchange_identification: read: Connection reset by peer
)) (/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_exceptions.py +58 (translate_exception)  :  e = se.AuthenticationFailed ("too frequent login attempts, or sshd misconfiguration: %s" % cmsg))
2018-09-15 20:02:22,238: radical.entk.resource_manager.0000: MainProcess                     : pmgr.0000.subscriber._state_sub_cb: INFO    : Pilot pilot.0000 state: FAILED
2018-09-15 20:02:22,238: radical.entk.resource_manager.0000: MainProcess                     : pmgr.0000.subscriber._state_sub_cb: ERROR   : Pilot has failed
2018-09-15 20:02:22,366: radical.utils       : pmgr.0000.launching.0           : pmgr.0000.launching.0.child.subscriber._pmgr_control_cb: ERROR   : Could not create lease object
Traceback (most recent call last):
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/radical/utils/lease_manager.py", line 175, in _create_object
    obj = _LeaseObject (self, self._log, creator, args)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/radical/utils/lease_manager.py", line 33, in __init__
    self.obj        = creator (*args)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/adaptors/shell/shell_file.py", line 284, in _shell_creator
    return sups.PTYShell(url, self.get_session(), self._logger)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 248, in __init__
    self.pty_shell  = self.factory.run_shell  (self.pty_info)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 450, in run_shell
    self._initialize_pty (sh_slave, info)
  File "/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 413, in _initialize_pty
    raise ptye.translate_exception (e)
AuthenticationFailed: too frequent login attempts, or sshd misconfiguration: read from process failed '[Errno 5] Input/output error' : (/usr//etc/ssh not found.
mux_client_request_session: session request failed: Session open refused by peer
ssh_exchange_identification: read: Connection reset by peer
) ((/usr//etc/ssh not found.
mux_client_request_session: session request failed: Session open refused by peer
ssh_exchange_identification: read: Connection reset by peer
)) (/scratch1/eh22/conda/envs/extasy11/lib/python2.7/site-packages/saga/utils/pty_exceptions.py +58 (translate_exception)  :  e = se.AuthenticationFailed ("too frequent login attempts, or sshd misconfiguration: %s" % cmsg))

vivek-bala commented 5 years ago

The error seems to be coming from the ssh login attempts:

AuthenticationFailed: too frequent login attempts, or sshd misconfiguration: read from process failed '[Errno 5] Input/output error' : (/usr//etc/ssh not found.
mux_client_request_session: session request failed: Session open refused by peer
ssh_exchange_identification: read: Connection reset by peer

It might be that the remote machine limits the number of ssh attempts that can be made within a given time window. So when you run your scripts concurrently (or at short intervals), you exceed that limit and hit this issue.

I think the solution really is to increase the interval between the attempts. If you are automating the extasy runs, is it possible to put a sleep(60) (or something similar) between the attempts?
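
If the runs are launched from a script, the pacing could look roughly like the sketch below. It is only an illustration: the command lines (`extasy_run.py`, `--config`, `run0.yml`, ...) are hypothetical placeholders, not the actual ExTASY invocation, and the 60-second delay is just the value suggested above; the right interval depends on the remote machine's ssh limits.

```python
import subprocess
import time


def launch_staggered(commands, delay_s=60, launch=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing delay_s seconds between starts.

    `launch` and `sleep` are injectable so the pacing can be checked
    without starting real processes or actually waiting.
    """
    handles = []
    for i, cmd in enumerate(commands):
        if i > 0:
            sleep(delay_s)  # pause before every launch except the first
        handles.append(launch(cmd))
    return handles


if __name__ == "__main__":
    # Hypothetical command lines: replace with the real ExTASY invocation.
    runs = [["python", "extasy_run.py", "--config", "run%d.yml" % i]
            for i in range(3)]
    procs = launch_staggered(runs, delay_s=60)
    for p in procs:
        p.wait()  # block until every run has finished
```

The runs still execute concurrently; only their start times (and therefore their initial ssh logins) are spread out.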

euhruska commented 5 years ago

I submit these attempts manually, maybe 15s between them. That would explain the issue.