Open kristofarkas opened 6 years ago
Can you please attach the session sandbox (./rp.session.julian.chem.ucl.ac.uk.kristof.farkas-pall.017506.0011/
) from the client side, ie. from the directory you run your script in? Thanks!
I think the session gets hung up trying to connect to BlueWaters, and that call seems to never return -- see end of rp.session.julian.chem.ucl.ac.uk.kristof.farkas-pall.017506.0011.log
. Does gsissh get onto the machine from command line? Maybe your gsi proxy is expired? We use pretty long timeouts nowadays for the connection setup - but if that never returns I would consider that an error, and would deserve a ticket in radical.pilot
. If so, please rerun with the following settings:
export RADICAL_SAGA_PTY_VERBOSE=DEBUG
export RADICAL_SAGA_PTY_LOG_TGT=pty.log
This will create an additional log on the GSI connection layer - please attach that if this continues to be a problem.
Thanks!
A restart of the computer seems to have fixed the problem. The certificates are all set up correctly btw and gsissh
worked from the terminal.
Hmm, we really don't want to end up in a mode were we'd recommend a reboot to fix RP problems :-P If you happen to stumble over this again, please send an PTY log (as described above), and I'll track it down. Either way though, I am glad you are back in business :)
The pty.log
from a stalled session:
2017-12-06 10:54:59,103: radical.saga.pty : MainProcess : MainThread : INFO : python.interpreter version: 2.7.14 |Anaconda, Inc.| (default, Oct 5 2017, 0
2017-12-06 10:54:59,104: radical.saga.pty : MainProcess : MainThread : INFO : pid: 9484
2017-12-06 10:54:59,104: radical.saga.pty : MainProcess : MainThread : INFO : tid: MainThread
2017-12-06 10:54:59,104: radical.saga.pty : MainProcess : MainThread : DEBUG : PTYShell init <saga.utils.pty_shell.PTYShell object at 0x114c286d0>
2017-12-06 10:54:59,108: radical.saga.pty : MainProcess : MainThread : INFO : PTY prompt pattern: [\$#%>\]]\s*$
2017-12-06 10:54:59,109: radical.saga.pty : MainProcess : MainThread : INFO : ssh copy mode set to 'sftp'
2017-12-06 10:54:59,109: radical.saga.pty : MainProcess : MainThread : INFO : ssh share mode set to 'auto'
2017-12-06 10:54:59,109: radical.saga.pty : MainProcess : MainThread : INFO : ssh timeout set to '2000.0'
2017-12-06 10:54:59,111: radical.saga.pty : MainProcess : MainThread : DEBUG : SSH Connection M_FLAGS: -o ControlMaster=auto -o ControlPath=/tmp/saga_ssh_kr
2017-12-06 10:54:59,111: radical.saga.pty : MainProcess : MainThread : DEBUG : SSH Connection S_FLAGS: -o ControlMaster=auto -o ControlPath=/tmp/saga_ssh_kr
2017-12-06 10:54:59,111: radical.saga.pty : MainProcess : MainThread : DEBUG : open master pty for [ssh] [bw.ncsa.illinois.edu] kristof.farkas-pall: /usr/bi
2017-12-06 10:54:59,112: radical.saga.pty : MainProcess : MainThread : DEBUG : PTYProcess init <saga.utils.pty_process.PTYProcess object at 0x114c58d50>
2017-12-06 10:54:59,113: radical.saga.pty : MainProcess : MainThread : INFO : running: /usr/bin/env TERM=vt100 X509_USER_PROXY=/tmp/x509up_u501 /Library/Gl
2017-12-06 10:54:59,118: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 51] ( export PROMPT_COMMAND='' PS1='$' ; set prompt='$'\n)
2017-12-06 10:54:59,249: radical.saga.pty : MainProcess : MainThread : DEBUG : read : [ 200] [ 36] (/Library/Globus/etc/ssh not found.\n)
2017-12-06 10:55:00,128: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:00,128: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 1\n)
2017-12-06 10:55:01,139: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:01,139: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 2\n)
2017-12-06 10:55:02,146: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:02,146: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 3\n)
2017-12-06 10:55:03,156: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:03,156: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 4\n)
2017-12-06 10:55:04,158: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:04,158: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 5\n)
2017-12-06 10:55:05,159: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:05,160: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 6\n)
2017-12-06 10:55:06,168: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:06,169: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 7\n)
2017-12-06 10:55:07,170: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:07,170: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 8\n)
2017-12-06 10:55:08,173: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:08,173: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 28] ( printf 'HELLO_%d_SAGA\n' 9\n)
2017-12-06 10:55:09,185: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
2017-12-06 10:55:09,185: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 29] ( printf 'HELLO_%d_SAGA\n' 10\n)
2017-12-06 10:55:10,197: radical.saga.pty : MainProcess : MainThread : DEBUG : write: [ 200] [ 69] ( export PROMPT_COMMAND='' PS1='$' > /dev/null 2>&1 ||
These scripts are also running in the background; can that have any influence on this?
Like this: python script.py > output 2>&1 & disown
?
the log is not complete, unfortunately (the lines are really long I'm afraid, and are truncated above).
Running in the background should not make a difference - but the output
file may contain some additional messages.
Thanks!
Sorry about that. Here is the full log: pty.log
This problem now happens after the first successful run (after a restart of the computer).
Update: deleting the certificate from /tmp/
and getting a new one with myproxy-logon
also solves the problem.
When submitting a job, the script just stalls after the
submit 1 pilot(s)
message. I haven't experienced this before, but now it happens every time. The same script worked before.The logs: The
Running...
is printed out right before callingAppManager.run()
.Could this be related to the RabbitMQ or MongoDB server? Or what else?