simonsobs-uk / data-centre

This tracks the issues in the baseline design of the SO:UK Data Centre at Blackett
https://souk-data-centre.readthedocs.io
BSD 3-Clause "New" or "Revised" License

condor_ssh_to_job resulted in interactive job ending immediately #41

Open ickc opened 7 months ago

ickc commented 7 months ago

If condor_ssh_to_job is used to ssh into an interactive job, the job terminates immediately.

MWE:

On vm77, in the 1st process,

❯ cat example.ini
RequestMemory = 32999
RequestCpus = 16
queue
❯ condor_submit -i example.ini
Submitting job(s).
1 job(s) submitted to cluster 1883.
Waiting for job to start...
Welcome to slot1_1@wn3806190.tier2.hep.manchester.ac.uk!

Then in a 2nd process,

❯ condor_ssh_to_job 1883
Welcome to slot1_1@wn3806190.tier2.hep.manchester.ac.uk!
Connection to condor-job.wn3806190.tier2.hep.manchester.ac.uk closed by remote host.
Connection to condor-job.wn3806190.tier2.hep.manchester.ac.uk closed.

Then immediately in the 1st process,

bash-4.2$ Connection to condor-job.wn3806190.tier2.hep.manchester.ac.uk closed by remote host.
Connection to condor-job.wn3806190.tier2.hep.manchester.ac.uk closed.

ickc commented 6 months ago

I just tested again using a test account, soukdevtester (i.e. a blank state with no ssh config, etc.), and it yields the same result.

ickc commented 6 months ago

Also, this only fails in interactive mode. If I start a job in the vanilla universe (non-interactively), I can successfully condor_ssh_to_job into the job. Not only that, I can do it repeatedly (i.e. 2 different processes can condor_ssh_to_job into the same job at the same time). But then, unexpectedly, if I quit in 1 such process, the whole job quits prematurely.
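
For anyone who wants to reproduce the non-interactive case, here is a rough sketch of what I mean. The file name sleep-job.sub, the use of /bin/sleep and the 3600 s duration are placeholders of mine, not what was actually run; the resource requests are copied from the MWE above. It only writes the submit file, and submits it if condor_submit is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction helper: the file name and the sleep job are
# assumptions for illustration, not the job used in the original report.
SUBMIT_FILE="${SUBMIT_FILE:-sleep-job.sub}"

# Write a minimal vanilla-universe submit description that just sleeps,
# reusing the resource requests from the interactive MWE.
write_submit_file() {
    cat > "$1" <<'EOF'
universe      = vanilla
executable    = /bin/sleep
arguments     = 3600
RequestMemory = 32999
RequestCpus   = 16
queue
EOF
}

write_submit_file "$SUBMIT_FILE"

if command -v condor_submit >/dev/null 2>&1; then
    # Submit non-interactively (no -i), unlike the MWE above.
    condor_submit "$SUBMIT_FILE"
    echo "Once the job is running, run 'condor_ssh_to_job <cluster>' in two terminals,"
    echo "exit one of them, then check whether the job is still listed by 'condor_q'."
else
    echo "condor_submit not found; wrote $SUBMIT_FILE only."
fi
```

The two condor_ssh_to_job sessions are left as a manual step, since the behaviour in question depends on one session exiting while the other is still open.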

rwf14f commented 5 months ago

I can confirm this. In interactive mode this might be the intended behaviour, but terminating all ssh sessions of a non-interactive job when one of them exits is a bug.