Closed euhruska closed 6 years ago
Do you have an environment variable called "SAGA_PTY_SSH_TIMEOUT" set? If so, can you tell me its value?
it's 300
Once, the process just exited without any error before printing "Starting resource request submission".
The probes/msgs you print above point to the event where the (1) job description is created, (2) a channel is opened to BW, (3) the job description is submitted and you wait for the system to respond. My hunch at this point is that it stalls at 3. Basically, it issues "qsub
@andre-merzky do you think my understanding of the behavior is correct? The two msgs that Eugen printed are basically around the call pmgr.submit_pilots().
I would love to have a look at the RP and SAGA logs. @euhruska , could you please set the following:
export RADICAL_SAGA_PTY_VERBOSE=DEBUG
export RADICAL_VERBOSE=DEBUG
export RADICAL_LOG_TGT=rct.log
You should end up with an rct.log, but also with a subdir named like rp.session.something_else - please pack those files up and attach them, if you don't mind.
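A minimal sketch of the steps above, assuming a POSIX shell and that the logs end up in the current working directory. The helper name `pack_rct_logs` and the archive name `rct_logs.tar.gz` are hypothetical, chosen only for illustration; they are not part of RP or SAGA.

```shell
# Logging settings requested above; set them before running the EnTK script.
export RADICAL_SAGA_PTY_VERBOSE=DEBUG
export RADICAL_VERBOSE=DEBUG
export RADICAL_LOG_TGT=rct.log

# Hypothetical helper: bundle rct.log and any rp.session.* directories
# found in the current directory into a single archive for attaching.
pack_rct_logs() {
    out="${1:-rct_logs.tar.gz}"
    # Reuse the function's positional parameters as the list of items to pack.
    set --
    [ -f rct.log ] && set -- "$@" rct.log
    for d in rp.session.*; do
        # If nothing matches, the glob stays literal and this test fails.
        [ -d "$d" ] && set -- "$@" "$d"
    done
    if [ "$#" -eq 0 ]; then
        echo "nothing to pack: no rct.log or rp.session.* here" >&2
        return 1
    fi
    tar czf "$out" "$@"
}
```

After the run has finished (or stalled and been interrupted), calling `pack_rct_logs` in the same directory should leave one archive that can be attached to the issue.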
Thanks!
In some cases entk gets stuck at this line:
It takes about 5 min, sometimes longer, to get to the next line.
What is this caused by?