vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

Combined use of -r and -q #1515

Open BoPeng opened 1 year ago

BoPeng commented 1 year ago

Suppose hpc is the headnode of a PBS-based cluster,

sos run -r hpc -q hpc

will have the following effect:

  1. -r hpc will execute the entire workflow on hpc. Because hpc has a PBS queue, the entire workflow will be submitted as a single-node PBS job.
  2. -q hpc will let the workflow, now running on a computing node, submit jobs from hpc.

Step 2 requires that the computing node can access the head node in order to submit jobs (ssh headnode qsub ....), which is not the case, at least for the cluster system at our institution.

Now, assuming that hpc_headnode is defined as a regular queue (not pbs),

sos run -r hpc_headnode -q hpc

will have the following effect:

  1. -r hpc_headnode will execute the workflow on the headnode. This means the sos process will remain active during the execution of the workflow, which is not allowed on many cluster systems.
  2. -q hpc will submit the jobs from the headnode.

The problem with this approach is that, with the current implementation, the job will be submitted with ssh hpc ... from hpc_headnode. Public key authentication does not work in this case.
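To make the two scenarios easier to compare, here is a minimal sketch that models where the workflow ends up running and which hop performs the task submission. It is not sos code: the `Host` class, the `describe` function, and the `"pbs"` / `"process"` labels are hypothetical names chosen for illustration, and the model only encodes the behavior described in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical stand-in for a host definition; not an sos class."""
    name: str        # alias as passed to -r or -q
    queue_type: str  # "pbs" or "process" (labels assumed for this sketch)


def describe(remote: Host, queue: Host) -> dict:
    """Summarize the behavior of `sos run -r <remote> -q <queue>` as described above."""
    # A PBS-type -r host wraps the whole workflow in a single-node job,
    # so the workflow process ends up on a computing node.
    on_compute_node = remote.queue_type == "pbs"
    location = "compute node" if on_compute_node else "head node"
    return {
        "workflow_runs_on": location,
        # A long-running process on the head node is what many clusters forbid.
        "sos_process_stays_on_headnode": not on_compute_node,
        # In both cases the submission goes through ssh to the queue host.
        "task_submission_hop": f"{location} -> ssh {queue.name} -> qsub",
    }


hpc = Host("hpc", "pbs")
hpc_headnode = Host("hpc_headnode", "process")

for remote in (hpc, hpc_headnode):
    print(f"-r {remote.name} -q {hpc.name}:", describe(remote, hpc))
```

In both combinations the submission involves an ssh hop that starts on the cluster itself (from a computing node in the first case, from the headnode in the second), which is where the access and authentication problems described above arise.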