miniwdl-ext / miniwdl-slurm


Slurm CPU setting issue #9

Open carsonhh opened 1 month ago

carsonhh commented 1 month ago

You need to change line 108 of src/miniwdl_slurm/__init__.py, as it is using the wrong Slurm command-line flag.

Current value: srun_args.extend(["--cpus-per-task", str(cpu)])

Change to: srun_args.extend(["--mincpus", str(cpu)])

On HPC clusters where oversubscription is disabled (i.e. no node sharing allowed), jobs are always assigned an entire node. If you are assigned a 24-core node with '--cpus-per-task 8' set but '--ntasks 1' left unset, Slurm's behavior is to launch 3 tasks and give each of them 8 CPUs. So the command in the WDL task gets launched 3 times simultaneously instead of the expected 1 time. The outputs of the simultaneous commands overwrite each other, and the container can fail with an exit code of either 1 or 255. You should instead be using Slurm's '--mincpus' option, which launches a single, non-duplicated task and gives it at least 8 CPUs (--mincpus 8).

Alternatively, you can explicitly set '--ntasks 1' rather than letting Slurm calculate the task count itself.
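For reference, a minimal sketch (in the project's Python) of how the argument construction around line 108 could look with either fix applied. Only the names srun_args and cpu come from this report; the surrounding code is illustrative and may differ from the actual miniwdl-slurm source.

cpu = 8  # e.g. the WDL task's runtime cpu value

srun_args = ["srun"]

# Option 1 (suggested): request a minimum CPU count; srun then launches a
# single task with at least `cpu` CPUs instead of deriving a task count.
srun_args.extend(["--mincpus", str(cpu)])

# Option 2 (alternative): keep --cpus-per-task, but pin the task count to 1
# so Slurm does not compute ntasks from the size of the allocation.
# srun_args.extend(["--ntasks", "1", "--cpus-per-task", str(cpu)])

print(" ".join(srun_args))  # srun --mincpus 8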

carsonhh commented 1 month ago

Example commands to better illustrate the issue. Note that either setting -n/--ntasks to 1 or using --mincpus instead fixes the duplicate-command issue in Slurm.

$ srun -c 8 hostname
srun: job 1332825 queued and waiting for resources
srun: job 1332825 has been allocated resources
notch104
notch104
notch104
notch104
notch104
notch104

$ srun -c 8 -n 1 hostname
srun: job 1332826 queued and waiting for resources
srun: job 1332826 has been allocated resources
notch104

$ srun --mincpus 8 hostname
srun: job 1332827 queued and waiting for resources
srun: job 1332827 has been allocated resources
notch104

rhpvorderman commented 1 month ago

Thanks for the suggestion. Our cluster does allow oversubscribing and nodes are shared, so I haven't run into this issue. I will make a fix.