snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see
https://snakemake.github.io

slurm job submission #2912

Open xl5525 opened 2 weeks ago

xl5525 commented 2 weeks ago

Snakemake version 8.13.0

Describe the bug

This default-resources setting works:

default-resources:
  runtime: '24h'
  mem_mb_per_cpu: 15000
  qos: '1day'
  cpus_per_task: 16
  slurm_account: 'main'

But when I set it to:

default-resources:
  runtime: '24h'
  mem_mb: 240000
  qos: '1day'
  cpus_per_task: 16
  slurm_account: 'main'

it fails with a configuration conflict.

Logs

rule a:
    input: raw/a_R1.fastq, raw/a_R2.fastq
    output: bam/a.bam
    jobid: 0
    reason: Forced execution
    wildcards: sample=a
    resources: mem_mb=8000, mem_mib=7630, disk_mb=50848, disk_mib=48493, tmpdir=<TBD>, runtime=1440, nodes=1, mem_per_cpu=15000, qos=1day, cpus_per_task=16, slurm_account=main

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
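
For orientation (an editorial note, not part of the original report): the two variants above request the same total memory, since 240000 MB spread over 16 CPUs is 240000 / 16 = 15000 MB per CPU. The srun failure looks like the generic SLURM conflict that arises when a per-CPU memory setting is already present in the environment and a per-node limit is added on top. A hypothetical minimal sketch of that conflict, not taken from the report, assuming a nested submission on a cluster where memory is granted per CPU:

# Hypothetical sketch: inside an allocation created with --mem-per-cpu, SLURM
# exports SLURM_MEM_PER_CPU; a nested step that also passes --mem (a per-node
# limit) then trips the mutual-exclusivity check.
sbatch --mem-per-cpu=15000 --cpus-per-task=16 --wrap \
    'srun --mem=240000 hostname'
# expected in the job log:
# srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.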

gilmorera commented 2 weeks ago

I am also having the same issue!

[tool.poetry.dependencies]
python = "^3.11"
snakemake-executor-plugin-cluster-sync = "^0.1.4"
snakemake = "^8.12.0"

gilmorera commented 2 weeks ago

This issue has also been documented on stackoverflow: https://stackoverflow.com/questions/78422411/snakemake-slurm-executor-causes-srun-fatal-unless-mem-mb-is-set-to-none

cmeesters commented 2 weeks ago

Please run with snakemake --verbose ..., and paste your entire command line and the sbatch statement you see in your terminal. Thank you.

xl5525 commented 2 weeks ago
#!/bin/bash
# this file is snakemake.sh

snakemake -j 18 --keep-going --verbose \
    --jobs 10 --profile config --executor slurm \
    --latency-wait 120 --use-conda --conda-frontend mamba --conda-base-path "../mambaforge"

If config/config.v8+.yaml contains:

default-resources:
  runtime: '24h'
  mem_mb: 15000
  qos: '1day'
  cpus_per_task: 16
  slurm_account: 'main'

it returns:

Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: True
Using shell: /usr/bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=15000, mem_mib=14306, disk_mb=1000, disk_mib=954, cpus_per_task=16
Resources before job selection: {'mem_mb': 15000, 'mem_mib': 14306, 'disk_mb': 1000, 'disk_mib': 954, 'cpus_per_task': 16, '_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Inferred runtime value of 1440 minutes from 24h
Selected jobs (1)
Resources after job selection: {'mem_mb': 15000, 'mem_mib': 0, 'disk_mb': 1000, 'disk_mib': 0, 'cpus_per_task': 0, '_cores': 9223372036854775806, '_nodes': 0}
Execute 1 jobs...

[Thu Jun 13 14:09:47 2024]
rule cooler:
    input: pairs/cg11504_1_output.pairs.gz, pairs/cg11504_1_output.pairs.gz.px2
    output: pairs/cg11504_1.cool
    jobid: 0
    reason: Forced execution
    wildcards: sample=cg11504_1
    resources: mem_mb=15000, mem_mib=14306, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, runtime=1440, qos=1day, cpus_per_task=16, slurm_account=main

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

If config/config.v8+.yaml contains:

default-resources:
  runtime: '24h'
  mem_mb_per_cpu: 15000
  qos: '1day'
  cpus_per_task: 16
  slurm_account: 'main'

it works fine.
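
If it is the surrounding sbatch job that introduces the conflict (an assumption at this point; see the discussion that follows), the inherited memory variable should be visible inside that job, for example:

# run inside the job started by: sbatch snakemake.sh
env | grep '^SLURM_MEM'
# a SLURM_MEM_PER_CPU entry here would clash with the --mem request
# that the slurm executor derives from mem_mb (see the sbatch call logged below)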

cmeesters commented 2 weeks ago

Wait - how do you start snakemake.sh?

xl5525 commented 2 weeks ago

sbatch snakemake.sh

cmeesters commented 2 weeks ago

What happens if you run snakemake ... directly, and not the sbatch statement? The idea is: Snakemake submits and monitors your SLURM jobs. It submits itself once per rule and displays progress in the main terminal.
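
A minimal sketch of what is suggested here, reusing the options from snakemake.sh above (run directly in a terminal on the login node, not wrapped in sbatch):

snakemake --jobs 10 --keep-going --verbose \
    --profile config --executor slurm \
    --latency-wait 120 --use-conda --conda-frontend mamba --conda-base-path "../mambaforge"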

xl5525 commented 2 weeks ago

Using profile config for setting default command line arguments.
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
SLURM run ID: 36940d04-e904-4fa8-916d-26ba4fb2c04b
Using shell: /usr/bin/bash
Provided remote nodes: 10
Job stats:
job       count
------  -------
all           1
cooler        1
total         2

Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 10}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Inferred runtime value of 1440 minutes from 24h
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 9}
Execute 1 jobs...

[Thu Jun 13 14:09:27 2024]
rule cooler:
    input: pairs/cg11504_1_output.pairs.gz, pairs/cg11504_1_output.pairs.gz.px2
    output: pairs/cg11504_1.cool
    jobid: 8
    reason: Missing output files: pairs/cg11504_1.cool
    wildcards: sample=cg11504_1
    resources: mem_mb=15000, mem_mib=14306, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, runtime=1440, qos=1day, cpus_per_task=16, slurm_account=main

sbatch call: sbatch --job-name 36940d04-e904-4fa8-916d-26ba4fb2c04b --output .snakemake/slurm_logs/rule_cooler/cg11504_1/%j.log --export=ALL --comment rule_cooler_wildcards_cg11504_1 -A 'main' -p main -t 1440 --mem 15000 --ntasks=1 --cpus-per-task=16 -D /mambaforge/envs/snakemake8/bin/python3.12 -m snakemake --snakefile cg11504/Snakefile --target-jobs 'cooler:sample=cg11504_1' --allowed-rules 'cooler' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15000' 'mem_mib=14306' 'disk_mb=1000' 'disk_mib=954' 'cpus_per_task=16' --wait-for-files '/cg11504/.snakemake/tmp.syr1lpgg' 'pairs/cg11504_1_output.pairs.gz' 'pairs/cg11504_1_output.pairs.gz.px2' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code mtime params software-env input --deployment-method conda --conda-frontend mamba --conda-base-path /Genomics/argo/users/xl5525/mambaforge --shared-fs-usage storage-local-copies sources source-cache software-deployment persistence input-output --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --latency-wait 120 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /Genomics/argo/users/xl5525/mambaforge/envs/snakemake8/bin --default-resources base64//bWVtX21iPTE1MDAw base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//cnVudGltZT0yNGg= base64//cW9zPTFkYXk= base64//Y3B1c19wZXJfdGFzaz0xNg== base64//c2x1cm1fYWNjb3VudD1tYWlu --executor slurm-jobstep --jobs 1 --mode remote"
Job 8 has been submitted with SLURM jobid 5153145 (log: /cg11504/.snakemake/slurm_logs/rule_cooler/cg11504_1/5153145.log).
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-06-11T14:00 --endtime now --name 36940d04-e904-4fa8-916d-26ba4fb2c04b
It took: 0.09229755401611328 seconds
The output is: '5153145|FAILED '
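
For readability (an editorial note): the base64// values passed to --default-resources in the sbatch call above are simply the profile defaults in encoded form; they can be decoded like this:

echo 'bWVtX21iPTE1MDAw' | base64 -d    # -> mem_mb=15000
# the remaining encoded defaults decode to:
#   disk_mb=max(2*input.size_mb, 1000)
#   tmpdir=system_tmpdir
#   runtime=24h
#   qos=1day
#   cpus_per_task=16
#   slurm_account=main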

status_of_jobs after sacct is: {'5153145': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'5153145'}
active_jobs_seen_by_sacct are: {'5153145'}
missing_sacct_status are: set()

[Thu Jun 13 14:10:07 2024]
Error in rule cooler:
    message: SLURM-job '5153145' failed, SLURM status is: 'FAILED'. For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 8
    input: pairs/cg11504_1_output.pairs.gz, pairs/cg11504_1_output.pairs.gz.px2
    output: pairs/cg11504_1.cool
    log: /cg11504/.snakemake/slurm_logs/rule_cooler/cg11504_1/5153145.log (check log file(s) for error details)
    shell:

    PATH=$PATH:/Genomics/argo/users/xl5525/mambaforge/bin/
    cooler cload pairix /dm6.chrom.sizes:100 pairs/
cmeesters commented 2 weeks ago

And? The reason is different, I hope. What does your log tell you?

xl5525 commented 2 weeks ago

Same as posted above: with mem_mb: 15000 in config/config.v8+.yaml the job fails with the srun error, while mem_mb_per_cpu: 15000 works fine.

cmeesters commented 2 weeks ago

Do not run sbatch <jobscript containing a snakemake command>: when Snakemake submits jobs from within a job context, they inherit the environment of the surrounding job, including all its defaults, which can lead to unpredictable behaviour.

xl5525 commented 2 weeks ago

I see your point. However, a process running in the foreground for several hours can be killed at any time for no reason, so I will simply use "mem_mb_per_cpu", which works fine even when snakemake itself is started via sbatch snakemake.sh: that job takes the minimum defaults, and the jobs submitted by Snakemake request their resources properly based on the config file.

cmeesters commented 2 weeks ago

Contact your admins and kindly request that they avoid killing processes without reason.

Snakemake, except for local rules like downloading or plotting, creates no load on the login or head nodes. It merely checks job states, and we have taken precautions to minimize the load on the SLURM control daemon on the management node, too. Killing processes without reason keeps you from doing your science, and your admins should be aware of that. While I understand my colleagues in principle, because people tend to do weird things on login nodes, they should act within reason.

Very soon, Snakemake will refuse to start in a job context, for the reason stated above. In fact, the behaviour can differ from cluster setup to cluster setup, depending on defaults, wrappers and plugins that admins introduce locally. As developers, we can allow for so-called SPANK plugins or other peculiarities with generic arguments, but we cannot account for all possible quirks.

For really long-running workflows, I suggest that you start a terminal multiplexer like screen or tmux, start your workflow, detach, take a break (this life hack allows you to log off, too), return, re-attach and evaluate when ready.
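
A quick sketch of that approach with tmux (screen works analogously; the session name smk is just a placeholder):

tmux new -s smk          # start a named session on the login node
snakemake --profile config --executor slurm --jobs 10 ...   # launch the workflow inside it
# detach with Ctrl-b d, log off if you like; later:
tmux attach -t smk       # re-attach and check progress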

shashwatsahay commented 6 days ago

Do not run sbatch <jobscript containing a snakemake command>: when Snakemake submits jobs from within a job context, they inherit the environment of the surrounding job, including all its defaults, which can lead to unpredictable behaviour.

Does this also mean that we shouldn't start Snakemake jobs in the form of a job array?

Very soon, Snakemake will refuse to start in a job context, for the reason stated above. In fact, the behaviour can differ from cluster setup to cluster setup, depending on defaults, wrappers and plugins that admins introduce locally. As developers, we can allow for so-called SPANK plugins or other peculiarities with generic arguments, but we cannot account for all possible quirks.

I generally run Snakemake as a job array when I have to submit multiple samples for processing at the same time. In a cluster setup this framework is helpful because, at least in my case, these workflows may take over 6 days and the maximum duration on the partitions/queues is 7 days, so running a single multi-sample workflow becomes infeasible for me.

gilmorera commented 4 days ago

@cmeesters

Contact your admins and kindly request that they avoid killing processes without reason.

Snakemake, except for local rules like downloading or plotting, creates no load on the login or head nodes. It merely checks job states, and we have taken precautions to minimize the load on the SLURM control daemon on the management node, too. Killing processes without reason keeps you from doing your science, and your admins should be aware of that. While I understand my colleagues in principle, because people tend to do weird things on login nodes, they should act within reason.

Very soon, Snakemake will refuse to start in a job context, for the reason stated above. In fact, the behaviour can differ from cluster setup to cluster setup, depending on defaults, wrappers and plugins that admins introduce locally. As developers, we can allow for so-called SPANK plugins or other peculiarities with generic arguments, but we cannot account for all possible quirks.

For really long-running workflows, I suggest that you start a terminal multiplexer like screen or tmux, start your workflow, detach, take a break (this life hack allows you to log off, too), return, re-attach and evaluate when ready.

While I understand the reasoning here, it would be super useful if there were a flag we could use to submit the main Snakemake process as a separate job (vs. using a terminal multiplexer). Most people I work with are using sbatch and qsub to submit the master Snakemake process to the cluster as its own job in the scheduler. If this is a separate topic, I'd be willing to open an issue for this as a feature request.

cmeesters commented 20 hours ago

With version 0.7.0, a warning is issued when attempting to submit from within a job context. Please understand that we cannot ensure a working setup when you try that. We cannot change the way SLURM works.

Most people I work with are using sbatch and qsub to submit the master Snakemake process to the cluster as its own job in the scheduler.

This means that they deprive themselves of some features and make their lives harder with respect to debugging their workflows. If you want to open a separate issue on this topic, @gilmorera, you might want to emphasize why this is actually needed.

@shashwatsahay Frankly, I don't understand your argument. It might be me, but I read that you want to overcome performance issues by submitting Snakemake as jobs. If this is the case, you might investigate the reason for your performance issues and submit an issue for that case specifically.