radical-collaboration / extasy-grlsd

Repository to hold the input data and scripts for the ExTASY gromacs-lsdmap work

entk delays resource request submission #92

Closed: euhruska closed this issue 6 years ago

euhruska commented 6 years ago

In some cases entk gets stuck at this line:

2018-09-21 11:58:08,533: radical.entk.wfprocessor.0000: MainProcess                     : MainThread     : INFO    : Created WFProcessor object: wfprocessor.0000

It takes around 5 minutes, sometimes longer, to get to the next line:

2018-09-21 12:02:15,216: radical.entk.appmanager.0000: MainProcess                     : MainThread     : INFO    : Starting resource request submission

What is this caused by?
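To put a number on the stall, the gap between the two log lines can be computed from their timestamps. This is a minimal Python sketch, assuming every line starts with a timestamp in the `YYYY-MM-DD HH:MM:SS,mmm` form shown above; the helpers `log_timestamp` and `gap_seconds` are hypothetical and not part of EnTK.

```python
from datetime import datetime

# Assumed from the EnTK log lines above: "YYYY-MM-DD HH:MM:SS,mmm: ..."
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a log line (hypothetical helper)."""
    # The timestamp occupies the first 23 characters, e.g. "2018-09-21 11:58:08,533"
    return datetime.strptime(line[:23], TS_FORMAT)


def gap_seconds(first: str, second: str) -> float:
    """Return the elapsed time in seconds between two log lines."""
    return (log_timestamp(second) - log_timestamp(first)).total_seconds()


created = ("2018-09-21 11:58:08,533: radical.entk.wfprocessor.0000: MainProcess : "
           "MainThread : INFO : Created WFProcessor object: wfprocessor.0000")
starting = ("2018-09-21 12:02:15,216: radical.entk.appmanager.0000: MainProcess : "
            "MainThread : INFO : Starting resource request submission")

print(f"{gap_seconds(created, starting):.3f} s")
```

For the two lines quoted above this prints `246.683 s`, i.e. roughly 4 minutes in that particular run.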

vivek-bala commented 6 years ago

Do you have an environment variable called "SAGA_PTY_SSH_TIMEOUT" set? If so, can you tell me its value?

euhruska commented 6 years ago

It's 300.
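If SAGA_PTY_SSH_TIMEOUT is interpreted in seconds (an assumption here, not confirmed in this thread), 300 corresponds to 5 minutes, the same order of magnitude as the observed stall. A small Python sketch that reads the variable and checks whether a measured delay stays within it; `ssh_timeout_seconds` and `delay_within_timeout` are hypothetical helpers:

```python
import os
from typing import Mapping, Optional


def ssh_timeout_seconds(env: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """Read SAGA_PTY_SSH_TIMEOUT, assuming its value is a number of seconds."""
    env = os.environ if env is None else env
    raw = env.get("SAGA_PTY_SSH_TIMEOUT")
    if raw is None:
        return None  # variable not set
    return float(raw)


def delay_within_timeout(delay_s: float, timeout_s: Optional[float]) -> bool:
    """True if the observed delay did not exceed the configured timeout."""
    return timeout_s is None or delay_s <= timeout_s


timeout = ssh_timeout_seconds({"SAGA_PTY_SSH_TIMEOUT": "300"})
print(timeout)                                 # 300.0
print(delay_within_timeout(246.683, timeout))  # True
```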

euhruska commented 6 years ago

Once, the process just exited without any error before reaching "Starting resource request submission".

vivek-bala commented 6 years ago

The probes/messages you printed above bracket three events: (1) the job description is created, (2) a channel is opened to BW, and (3) the job description is submitted and EnTK waits for the system to respond. My hunch at this point is that it stalls at step 3. Basically, it issues "qsub .sh" and waits for that command to return. I am not sure why the system takes so long, though (load on the login node?).
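The three steps above could be timed individually to see which one stalls. This is a hedged Python sketch, not the RADICAL-Pilot or SAGA code path: the step names, the job-description dict, and `submit` are hypothetical, and step 3 runs a short local Python command as a stand-in for `qsub`.

```python
import subprocess
import sys
import time
from contextlib import contextmanager

durations = {}  # step name -> elapsed seconds


@contextmanager
def timed_step(name: str):
    """Record how long the enclosed block takes under the given step name."""
    start = time.monotonic()
    try:
        yield
    finally:
        durations[name] = time.monotonic() - start


def submit(cmd, timeout_s: float) -> int:
    """Run a submission command and block until it returns (hypothetical helper)."""
    # subprocess.run raises subprocess.TimeoutExpired if cmd exceeds timeout_s
    return subprocess.run(cmd, capture_output=True, timeout=timeout_s).returncode


with timed_step("1_create_description"):
    description = {"executable": "job.sh"}  # hypothetical job description
with timed_step("2_open_channel"):
    pass  # placeholder for opening the SSH channel to the remote resource
with timed_step("3_submit_and_wait"):
    # Stand-in for "qsub <script>": a local command that sleeps briefly
    rc = submit([sys.executable, "-c", "import time; time.sleep(0.2)"], timeout_s=300)

for step, secs in durations.items():
    print(f"{step}: {secs:.2f} s")
```

Putting a timeout on the blocking call means a hung submission surfaces as an exception instead of an open-ended wait.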

@andre-merzky, do you think my understanding of the behavior is correct? The two messages Eugen printed essentially bracket the call to pmgr.submit_pilots().

andre-merzky commented 6 years ago

I would love to have a look at the RP and SAGA logs. @euhruska, could you please set the following:

export RADICAL_SAGA_PTY_VERBOSE=DEBUG
export RADICAL_VERBOSE=DEBUG
export RADICAL_LOG_TGT=rct.log
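If the workflow is started from a separate launcher script, the same three variables can be passed through the child process environment so they are in place before any RADICAL module starts logging. A minimal sketch; `debug_env`, `run_with_debug`, and the script name are hypothetical:

```python
import os
import subprocess
import sys

# The debug settings requested above for RADICAL-SAGA and RADICAL-Pilot logging
DEBUG_ENV = {
    "RADICAL_SAGA_PTY_VERBOSE": "DEBUG",
    "RADICAL_VERBOSE": "DEBUG",
    "RADICAL_LOG_TGT": "rct.log",
}


def debug_env(base=None) -> dict:
    """Return a copy of the environment with the debug variables added."""
    env = dict(os.environ if base is None else base)
    env.update(DEBUG_ENV)
    return env


def run_with_debug(script: str) -> int:
    """Launch a workflow script (hypothetical path) with debug logging enabled."""
    return subprocess.run([sys.executable, script], env=debug_env()).returncode

# Example (hypothetical script name):
# run_with_debug("run_extasy.py")
```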

You should end up with an rct.log file, and also with a subdirectory named like rp.session.something_else. Please pack those files up and attach them, if you don't mind.
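Packing the files can be done with the Python standard library alone. A sketch, assuming rct.log and the rp.session.* directories sit in the same working directory; `pack_logs` and the archive name `extasy_logs.zip` are hypothetical:

```python
import glob
import os
import zipfile


def pack_logs(archive: str = "extasy_logs.zip", workdir: str = ".") -> list:
    """Zip rct.log and every rp.session.* directory found in workdir."""
    members = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        log = os.path.join(workdir, "rct.log")
        if os.path.isfile(log):
            zf.write(log, arcname="rct.log")
            members.append("rct.log")
        for session in glob.glob(os.path.join(workdir, "rp.session.*")):
            if not os.path.isdir(session):
                continue  # skip e.g. an already-zipped session archive
            for root, _dirs, files in os.walk(session):
                for name in files:
                    path = os.path.join(root, name)
                    # Store paths relative to workdir, with "/" separators
                    arcname = os.path.relpath(path, workdir).replace(os.sep, "/")
                    zf.write(path, arcname=arcname)
                    members.append(arcname)
    return members
```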

Thanks!

euhruska commented 6 years ago

Here are the files: log_extasy_tica3_villin_small.log, rp.session.leonardo.rice.edu.eh22.017798.0001.zip, rct.log