snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

Inheriting cpus-per-task in srun (Slurm 22.05+) #41

Closed: paudano closed this issue 5 months ago

paudano commented 7 months ago

On our cluster, Slurm man pages say:

NOTE: Beginning with 22.05, srun will not inherit the --cpus-per-task value requested by salloc or sbatch. It must be requested again with the call to srun or set with the SRUN_CPUS_PER_TASK environment variable if desired for the task(s).
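In other words, a minimal sketch of the two workarounds the man page describes (the values and file names here are hypothetical, and this is not the plugin's actual code):

```python
import os
import subprocess

# Hypothetical job-step launch from inside an sbatch allocation on Slurm >= 22.05,
# where srun no longer inherits --cpus-per-task from the allocation.
cpus = os.environ.get("SLURM_CPUS_PER_TASK", "4")

# Option 1: request the CPUs again explicitly on the srun command line.
subprocess.run(
    ["srun", f"--cpus-per-task={cpus}", "pigz", "-p", cpus, "some_file"],
    check=True,
)

# Option 2: export SRUN_CPUS_PER_TASK so srun picks the value up from the environment.
env = dict(os.environ, SRUN_CPUS_PER_TASK=cpus)
subprocess.run(["srun", "pigz", "-p", cpus, "some_file"], env=env, check=True)
```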

When I submit multi-core jobs through Snakemake, squeue shows that they are allocated all the requested cores, but they behave as if they have only a single core. Inside the job, threads is set to 1. Even if I override this and tell a command to use a hard-coded number of threads (not relying on the rule's threads value), the job still only uses one core.

I think this might be the cause of Snakemake issue #2447.

I created a test case where I'm piping /dev/random through pigz and capturing the top output (Snakemake file and profiles directory): Snakefile.zip profiles.zip
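For reference, the rule in that test case looks roughly like the sketch below (file names, sizes, and resource values are placeholders rather than the exact contents of the attached zips):

```python
rule rule_a:
    output:
        "random.gz"
    threads: 4
    resources:
        mem_mb=512,
        runtime=12
    shell:
        # With a working allocation, pigz should show roughly 400% CPU in top.
        "head -c 1G /dev/random | pigz -p {threads} > {output}"
```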

When this command runs, I get the following sbatch call (long strings redacted with XXX):

sbatch --job-name XXX --output XXX/%j.log --export=ALL --comment rule_a -A XXX -p XXX -t 12 --mem 512 --cpus-per-task=4 -D XXX --wrap="XXX/python3.11 -m snakemake --snakefile 'XXX/Snakefile' --target-jobs 'rule_a:' --allowed-rules 'rule_a' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=512' 'mem_mib=489' 'disk_mb=1000' 'disk_mib=954' --wait-for-files 'XXX/.snakemake/tmp.uq8q7p21' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code mtime input params software-env --conda-frontend 'mamba' --shared-fs-usage input-output sources software-deployment source-cache persistence storage-local-copies --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --set-threads 'rule_a=4' --latency-wait 5 --scheduler 'greedy' --scheduler-solver-path 'XXX/bin' --default-resources 'mem_mb=512' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'runtime=12' --executor slurm-jobstep --jobs 1 --mode 'remote'"

From discussions I've had with our cluster admins, it sounds like Snakemake should be running "srun" inside the --wrap argument (something like --wrap="srun SRUN_PARAMS ... XXX/python3.11 -m snakemake ...").
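Roughly, the wrapped command they described would take a shape like the sketch below (illustrative only; how the actual fix builds and quotes the command may differ):

```python
# Illustrative sketch of the suggested shape of the submit command; the real plugins
# build this differently, and the inner snakemake invocation is elided here.
cpus_per_task = 4  # would come from the rule's threads
inner_cmd = "python3.11 -m snakemake --snakefile Snakefile ..."
wrap = f"srun --cpus-per-task={cpus_per_task} {inner_cmd}"
sbatch_cmd = f'sbatch --cpus-per-task={cpus_per_task} --wrap="{wrap}"'
print(sbatch_cmd)
```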

Thank you!

cmeesters commented 7 months ago

Thank you for bringing this issue to our attention! This is a serious change in SchedMD's interface. I did not notice this because we are just in the process of setting up our new cluster and are so far a version or two behind the release schedule on our current one.

Even if I override this and tell a command to use a hard-coded number of threads (not relying on the rule's threads value), the job still only uses one core.

That is expected, because SLURM sets up the cgroup with only one core, and all threads are confined to that cgroup.
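A quick way to see this from inside a job (a diagnostic sketch, not part of the plugin): compare the CPUs present on the node with the CPUs the cgroup/affinity mask actually allows.

```python
import os

# Diagnostic sketch: run inside the job step.
visible = os.cpu_count()               # all CPUs on the node
usable = len(os.sched_getaffinity(0))  # CPUs this process is allowed to run on

print(f"CPUs on node: {visible}, CPUs usable by this process: {usable}")
# On an affected setup, 'usable' comes out as 1 even though sbatch requested
# --cpus-per-task=4, which is why even a hard-coded thread count stays on one core.
```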

I will attempt an urgent fix!

paudano commented 7 months ago

Thanks Christian! If it helps, I'll try to set up a test environment on my end to verify a fix before it's released. Yes, the cgroup constraint makes sense.

cmeesters commented 7 months ago

Actually, two fixes are needed: one in the jobstep executor (I am busy with that one, but the test cases are bugging me) and one in this executor.

cmeesters commented 5 months ago

@paudano Please install the newest releases of snakemake-executor-plugin-slurm and snakemake-executor-plugin-slurm-jobstep. They will be available on Bioconda shortly.

paudano commented 5 months ago

Thank you! I was able to verify the fix on our Slurm system (23.02.7).

I confirmed that jobs are assigned multiple CPUs when requested and that processes inside the job were able to use them (pigz uses ~400% CPU in top with 4 threads).

snakemake                 8.10.8               hdfd78af_0    bioconda
snakemake-executor-plugin-slurm 0.4.5              pyhdfd78af_0    bioconda
snakemake-executor-plugin-slurm-jobstep 0.2.1              pyhdfd78af_0    bioconda
snakemake-interface-common 1.17.2             pyhdfd78af_0    bioconda
snakemake-interface-executor-plugins 9.1.1              pyhdfd78af_0    bioconda
snakemake-interface-report-plugins 1.0.0              pyhdfd78af_0    bioconda
snakemake-interface-storage-plugins 3.2.2              pyhdfd78af_0    bioconda
snakemake-minimal         8.10.8             pyhdfd78af_0    bioconda
python                    3.11.9          hb806964_0_cpython    conda-forge
cmeesters commented 5 months ago

Huh? For optimal functionality, you need to update to Python >= 3.12. I am surprised you do not see further issues with 3.11.

Anyway, thanks for the feedback!

paudano commented 5 months ago

Thanks for the heads-up. I was having some compatibility issues between other packages and 3.12 a couple of months ago, but it's probably worth trying again.