snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster

`threads` directive of a rule not taken into account when running through slurm #141

Open blaiseli opened 2 months ago

blaiseli commented 2 months ago

Software Versions

$ pip list | grep snakemake
snakemake                                 8.19.0
snakemake-executor-plugin-cluster-generic 1.0.9
snakemake-executor-plugin-slurm           0.10.0
snakemake-executor-plugin-slurm-jobstep   0.2.1
snakemake-interface-common                1.17.3
snakemake-interface-executor-plugins      9.2.0
snakemake-interface-report-plugins        1.0.0
snakemake-interface-storage-plugins       3.3.0
$ sinfo --version
slurm 23.02.6

Describe the bug

In a rule with threads set to 2, a shell command built to display {threads} reports only 1 thread when Snakemake is run through SLURM via sbatch.

Minimal example

Here is a short example meant to compare the above with what happens when setting cpus_per_task to 2 in resources.

$ cat src/workflow/test.smk
rule all:
    input:
        "test/test_threads.out",
        "test/test_resources.out"

rule test_threads:
    output: "test/test_threads.out"
    threads: 2
    run:
        cmd = f"echo {threads} > {output}"
        shell(cmd)

rule test_resources:
    output: "test/test_resources.out"
    resources:
        cpus_per_task = 2
    run:
        cmd = f"echo {resources.cpus_per_task} > {output}"
        shell(cmd)
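
As an additional check (not part of the original report, just a sketch), a rule like the following could record what SLURM itself allocated, to separate a scheduling problem from a problem in how {threads} is rendered. SLURM_CPUS_ON_NODE is set by SLURM inside a job allocation; the rule name and output path are invented here, and its output would also need to be added to rule all.

# Hypothetical extra rule, not in the original test.smk
rule test_slurm_env:
    output: "test/test_slurm_env.out"
    threads: 2
    shell:
        # doubled braces so Snakemake leaves the shell variable expansion alone
        "echo ${{SLURM_CPUS_ON_NODE:-unset}} > {output}"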

I run it through sbatch using the following script:

$ cat src/run_test.sh
#!/bin/bash

source .venv/bin/activate

profile="src/profile/slurm"
snakefile="src/workflow/test.smk"
snakemake --version

mkdir -p test

cmd="snakemake -s ${snakefile} \
    --executor slurm \
    --profile ${profile} \
    $@"

>&2 sbatch --qos="hubbioit" --partition="hubbioit" --parsable \
    -J run_test \
    --mem=10G \
    -o test/test.o \
    -e test/test.e \
    ${cmd}

exit 0

Running it:

$ ./src/run_test.sh
8.19.0
20715805

Looking at the output:

$ cat test/test_resources.out 
2
$ cat test/test_threads.out 
1

If I run the workflow without sbatch and slurm, both output files contain "2".
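
(The exact local invocation is a guess, but it was presumably something along the lines of:

$ snakemake -s src/workflow/test.smk --cores 2
)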

Additional context

This looks similar to what is described here: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/113#issuecomment-2299153497

However, if I understand correctly, this aspect of https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/113 is supposed to be solved by https://github.com/snakemake/snakemake-executor-plugin-slurm/pull/137, which is included in 0.10.0.

In case this is relevant, here is the config.yaml of the slurm profile given to --profile:

$ cat src/profile/slurm/config.yaml
# Manually edited according to https://github.com/Snakemake-Profiles/slurm/issues/117#issuecomment-1906448548
cluster-generic-sidecar-cmd: "slurm-sidecar.py"
#cluster-sidecar: "slurm-sidecar.py"
#cluster-cancel: "scancel"
cluster-generic-cancel-cmd: "scancel"
restart-times: "3"
jobscript: "slurm-jobscript.sh"
#cluster: "slurm-submit.py"
cluster-generic-submit-cmd: "slurm-submit.py"
#cluster-status: "slurm-status.py"
cluster-generic-status-cmd: "slurm-status.py"
max-jobs-per-second: "10"
max-status-checks-per-second: "10"
local-cores: 1
latency-wait: "240"
use-conda: "False"
use-singularity: "False"
jobs: "144"
printshellcmds: "False"
# end with comments only
CarstenBaker commented 2 months ago

#137 sadly never fixed the issue for us.

If you run the sbatch command with 2 (or more) CPUs specified (assuming your default is 1 CPU), both outputs should report 2.
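
If that means the outer sbatch call in run_test.sh above, it would amount to something like the following (a sketch; only --cpus-per-task is added, everything else is unchanged):

>&2 sbatch --qos="hubbioit" --partition="hubbioit" --parsable \
    --cpus-per-task=2 \
    -J run_test \
    --mem=10G \
    -o test/test.o \
    -e test/test.e \
    ${cmd}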

We have been setting both cpus_per_task and threads in the SLURM config file (matching the totals) as a workaround for the moment. It duplicates information, but we haven't found a reliable way to link the two values. As long as cpus_per_task is equal to or greater than threads, {threads} resolves to the correct total. You need to specify both, because Snakemake rules run on threads. The dry-run totals are also incorrect, so you have to check the Snakemake or SLURM logs for the actual thread counts.
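
A sketch of that duplication, using the rule from the minimal example above (the nested set-threads/set-resources profile syntax is an assumption here, not the reporter's actual file):

# Hypothetical additions to the profile's config.yaml
set-threads:
  test_threads: 2
set-resources:
  test_threads:
    cpus_per_task: 2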

If you run from the head node, the threads work correctly, but we don't like doing this and prefer using sbatch.

tdido commented 2 months ago

@blaiseli I think your --profile may be interfering with the plugin. Did you try running without it? Or at least removing everything above the max-jobs-per-second: "10" line.
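
For reference, stripping the profile down as suggested would leave roughly this config.yaml (just the remaining lines of the file quoted above; a sketch, not a tested configuration):

max-jobs-per-second: "10"
max-status-checks-per-second: "10"
local-cores: 1
latency-wait: "240"
use-conda: "False"
use-singularity: "False"
jobs: "144"
printshellcmds: "False"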

blaiseli commented 1 month ago

@tdido I just saw your suggestion and tried running without --profile on a workflow I'm currently setting up, but it didn't help.