vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

Unable to use remote with dual factor auth #1512

Closed: pgcudahy closed this issue 1 year ago

pgcudahy commented 1 year ago

Hello Bo, I'm not sure there's any way around this, but my university has just updated its cluster and now requires two-factor authentication for every ssh connection, so my remote config no longer works.

!sos remote -v4 -c /home/pgcudahy/.sos/mccleary.yml test mccleary_scavenge

WARNING: Failed to connect to mccleary_scavenge: ssh connection to pgc29@mccleary.ycrc.yale.edu time out with prompt: b'\rDuo two-factor login for pgc29\r\n\r\nEnter a passcode or select one of the following options:\r\n\r\n 1. Duo Push to XXX-XXX-XXXX\r\n 2. Phone call to XXX-XXX-XXXX\r\n\r\nPasscode or option (1-2): '

Can a second password be set (just the character 1) so that a push notification gets sent to me?
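To make the request concrete, here is a minimal sketch of what "answering with 1" would involve: reading the captured prompt and picking the menu number for the push option. This is not sos code. `choose_duo_option` is a hypothetical helper, and the prompt bytes are copied from the warning above with the phone numbers masked as in the original.

```python
import re
from typing import Optional

# Prompt bytes as reported in the sos warning above (numbers masked in the original).
PROMPT = (
    b"\rDuo two-factor login for pgc29\r\n\r\n"
    b"Enter a passcode or select one of the following options:\r\n\r\n"
    b" 1. Duo Push to XXX-XXX-XXXX\r\n"
    b" 2. Phone call to XXX-XXX-XXXX\r\n\r\n"
    b"Passcode or option (1-2): "
)


def choose_duo_option(prompt: bytes, method: str = "Duo Push") -> Optional[str]:
    """Return the menu number whose label starts with `method`, or None if absent.

    Hypothetical helper for illustration only; sos does not provide it.
    """
    text = prompt.decode("utf-8", errors="replace")
    # Menu lines look like " 1. Duo Push to ..." and end with "\r\n".
    for number, label in re.findall(r"^\s*(\d+)\.\s+(.+?)\s*$", text, flags=re.MULTILINE):
        if label.startswith(method):
            return number
    return None
```

Sending the returned string followed by a newline would trigger the push, but each new ssh connection would still need its own approval on the phone.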

BoPeng commented 1 year ago

It is possible to modify sos to enter newlines automatically, but overall we are hitting a dead end. sos uses the ssh channel often: for example, it calls something like "ssh server sos status" to retrieve task status at a regular interval (configurable, 30s by default), and you cannot reasonably enter that six-digit code each time. You can disable the remote task status query, but that will disrupt the execution of the entire workflow if there are tasks that run after the remote tasks complete.
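A rough sketch of why the polling is the real obstacle, assuming each status query opens a fresh ssh connection that needs its own second factor. The function names are hypothetical and this is not how sos is implemented internally; the command only mirrors the shape of the call mentioned above, and the exact arguments sos passes are not shown in this thread.

```python
import time
from typing import Callable, List


def status_command(host: str) -> List[str]:
    """Shape of the periodic status call; the real arguments are an assumption."""
    return ["ssh", host, "sos", "status"]


def logins_per_hour(interval_s: float = 30.0) -> float:
    """Number of ssh connections (and second-factor prompts) per hour of polling."""
    return 3600.0 / interval_s


def poll(
    host: str,
    run: Callable[[List[str]], str],
    interval_s: float = 30.0,
    max_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call `run` on the status command `max_polls` times, waiting between calls."""
    results = []
    for i in range(max_polls):
        results.append(run(status_command(host)))
        if i < max_polls - 1:
            sleep(interval_s)
    return results
```

With the default 30s interval, `logins_per_hour()` gives 120 connections per hour, each of which would need an approval. `run` and `sleep` are injectable so the loop can be tried without a real cluster; a real runner could wrap `subprocess.run`.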

There are several options:

  1. Log in to the server and submit the jobs from there. The problem is that the master sos process will remain active on the head node, where it will be killed, at least on our clusters.
  2. Use a daemon process on the head node to communicate with outside sos instances, bypassing the ssh channel. I explored this option a while ago but did not finish it. The biggest problem is still keeping a process running on the head node.
  3. Submit the entire workflow to worker nodes, namely using one worker node as the master node and multiple other nodes as slave nodes. sos supports this running mode in theory, but we have never seriously tested it.

pgcudahy commented 1 year ago

Thanks Bo, that's what I suspected. I actually got option 3 working very well on our old cluster, but with the Jupyter notebook still running on my own computer. On the new cluster I've tried to port my hosts.yml config and run jobs from a Jupyter notebook running on a cluster instance, but it has been very brittle for unclear reasons. I'll keep looking into it.

BoPeng commented 1 year ago

Thanks. Our Jupyter instance runs outside the cluster, but next time I will try starting a Jupyter instance from within the cluster and submitting jobs from there.