Open WPoelman opened 2 months ago
This is an interesting one. A few remarks:
[...] (TRES and GRES) in Slurm, using the supported directives. On the VSC docs, we have opted for the most obvious and generic ones to cover the majority of the use cases on our clusters.
--cpus-per-gpu is a useful option for multi-GPU jobs, when an advanced user wants to take full control over process distribution. For single-GPU jobs, it does not add much value. Take the following two-node GPU example:
srun -A <account> -M genius --nodes=2 --ntasks=8 --cpus-per-gpu=1 --gpus-per-node=4 --pty bash -l
And you immediately see how transparent it is to specify the --cpus-per-gpu option.
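To make the arithmetic behind the two-node example explicit, here is a minimal Python sketch. It assumes a simplified model in which Slurm allocates `cpus_per_gpu` cores for every requested GPU and one core per task when `--cpus-per-task` is not given; the function names are hypothetical and the model ignores site-specific defaults and whole-core allocation policies.

```python
# Simplified model of the CPU counts implied by the srun example above.
# Function names are illustrative only; they are not part of Slurm or any VSC tool.

def total_cpus_from_gpus(nodes: int, gpus_per_node: int, cpus_per_gpu: int) -> int:
    """CPUs implied by --nodes, --gpus-per-node and --cpus-per-gpu."""
    return nodes * gpus_per_node * cpus_per_gpu


def total_cpus_from_tasks(ntasks: int, cpus_per_task: int = 1) -> int:
    """CPUs implied by --ntasks, assuming one CPU per task by default."""
    return ntasks * cpus_per_task


# Values taken from the example command:
# --nodes=2 --ntasks=8 --cpus-per-gpu=1 --gpus-per-node=4
via_gpus = total_cpus_from_gpus(nodes=2, gpus_per_node=4, cpus_per_gpu=1)
via_tasks = total_cpus_from_tasks(ntasks=8)

print(f"CPUs implied by the GPU request: {via_gpus}")
print(f"CPUs implied by the task request: {via_tasks}")
```

Under these assumptions both views agree on 8 cores, one per GPU and one per task, which is why the two styles of request behave the same in this example.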
So, if you have another comment or question, please let us know. Otherwise, we can perhaps close this issue.
In the documentation, all examples use ntasks or n to specify the number of CPUs needed per GPU. This generally works fine, but external tools (such as submitit) have a specific interpretation of ntasks, which can lead to issues. It might be better to explicitly use the cpus-per-gpu Slurm option in the examples to avoid such issues. Both options work identically in my tests requesting GPUs on the debug node and on wice.