snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

Setting memory twice when submitting with slurm executor #75

Open kwells4 opened 2 months ago

kwells4 commented 2 months ago

Versions

snakemake version 8.10.7
snakemake-executor-plugin-slurm version 0.4.4
snakemake-executor-plugin-slurm-jobstep version 0.2.1

The problem

I am working on getting snakemake version 8 to work on my slurm server and keep getting the following error:

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

I can see that two resource arguments are being passed when looking at the rule description:

[Fri Apr 19 13:42:50 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 0
    reason: Forced execution
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB

However, I don't know where the mem_mb is being passed.
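A quick arithmetic check suggests where the two memory numbers in that line come from. The sketch below assumes that Snakemake converts decimal megabytes (10^6 bytes) to mebibytes (2^20 bytes) and rounds up. That rounding rule is inferred from the logged values, not taken from Snakemake's source, and the helper names are hypothetical.

Under this assumption:
- mem_mb=15259 is 16 GB expressed in MiB.
- mem_mib=7630 matches an 8000 MB request. That is what the default mem_mb expression evaluates to for these inputs; the expression appears base64-encoded in the --verbose output later in this thread.

```python
import math

# Assumption: conversions round up. Inferred from the logged numbers only,
# not from Snakemake's source code.
def mb_to_mib(mb: float) -> int:
    """Decimal megabytes (10**6 bytes) -> mebibytes (2**20 bytes), rounded up."""
    return math.ceil(mb * 10**6 / 2**20)

def bytes_to_mib(n_bytes: int) -> int:
    """Byte count -> mebibytes, rounded up."""
    return math.ceil(n_bytes / 2**20)

# mem="16GB" from the profile, read as 16 * 10**9 bytes
print(bytes_to_mib(16 * 10**9))  # 15259, the logged mem_mb

# Default mem_mb expression (from the --verbose log): min(max(2*input.size_mb, 1000), 8000).
# The logged disk_mb=43311 is max(2*input.size_mb, 1000), so the default mem_mb caps at 8000.
default_mem_mb = min(max(43311, 1000), 8000)
print(default_mem_mb, mb_to_mib(default_mem_mb))  # 8000 7630, the logged mem_mib

# Cross-checks against the other logged pairs (mem_mb=1600 and disk_mb=43311)
print(mb_to_mib(1600), mb_to_mib(43311))  # 1526 41305
```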

Profile

executor: slurm

default-resources:
    slurm_partition: "acompile"
    slurm_account:   "amc-general"

set-resources:
    fastqc:
        runtime: 60 # 1 hour
        mem: "16GB"
    fastqc_summary:
        runtime: 10
        mem: "4GB"

My rule

rule fastqc:
    input:
        input_list = _get_input
    output:
        file = "{results}/fastqc_pre_trim/fastqc_{sample}_summary_untrimmed.txt"
    params:
        output_dir  = os.path.join(RESULTS2, "fastqc_pre_trim"),
        directories = _get_directories
    resources:
        slurm_extra=lambda wildcards: (
            f"--output={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.out "
            f"--error={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.err "
            f"--qos=compile"
        )
    singularity:
       GENERAL_CONTAINER
    shell:
        """
        mkdir -p {params.output_dir}
        fastqc {input} --outdir {params.output_dir}
        for dir in {params.directories};
        do
            name=$(basename -s .zip $dir)

            unzip -p $dir $name/summary.txt \
                >> {output}
        done
        """    

My command

snakemake \
    --snakefile Snakefile \
    --configfile config.yaml \
    --jobs 12 \
    --latency-wait 60 \
    --rerun-incomplete \
    --use-singularity \
    --workflow-profile profiles/default

Attempted fix 1: use mem_mb

I have also tried this using the mem_mb argument instead:

executor: slurm

default-resources:
    slurm_partition: "acompile"
    slurm_account:   "amc-general"

set-resources:
    fastqc:
        runtime: 60 # 1 hour
        mem_mb: 1600
    fastqc_summary:
        runtime: 10
        mem_mb: 4000

I get the same error, but the double memory request is less obvious:

[Fri Apr 19 13:40:42 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 0
    reason: Forced execution
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=1600, mem_mib=1526, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

Attempted fix 2: remove profile and specify resources within the rule

I have also tried this where I deleted my profile and just assigned the resources within the rule:

rule fastqc:
    input:
        input_list = _get_input
    output:
        file = "{results}/fastqc_pre_trim/fastqc_{sample}_summary_untrimmed.txt"
    resources:
        job_name="fastqc",
        mem_mb=1600,
        runtime=60,
        slurm_extra=lambda wildcards: (
            f"--output={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.out "
            f"--error={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.err "
            f"--qos=compile"
        )
    params:
        output_dir  = os.path.join(RESULTS2, "fastqc_pre_trim"),
        directories = _get_directories
    singularity:
       GENERAL_CONTAINER
    shell:
        """
        mkdir -p {params.output_dir}
        fastqc {input} --outdir {params.output_dir}
        for dir in {params.directories};
        do
            name=$(basename -s .zip $dir)

            unzip -p $dir $name/summary.txt \
                >> {output}
        done
        """   

Submit with:

snakemake \
    --snakefile Snakefile \
    --configfile config.yaml \
    --jobs 12 \
    --latency-wait 60 \
    --rerun-incomplete \
    --use-singularity \
    --executor slurm \
    --default-resources slurm_account=amc-general slurm_partition=acompile

But this also fails with the same srun error:

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

Attempted fix 3: submit with sbatch

I've also submitted the job per this issue, but that gave the same error as above.

Conclusion

mem_mb is obviously specified somewhere, but I am not sure where to look beyond the profile, rules, and snakemake command. Do you have any ideas what I may be missing? Thanks so much for your help!

cmeesters commented 2 months ago

However, I don't know where the mem_mb is being passed.

Your requirement should be translated to mem_mb, and that is sufficient. Snakemake merely lists both resources, but that should be fine, as it is only translated into sbatch --mem .... And indeed, within SLURM, --mem and --mem-per-cpu are mutually exclusive. I will try to track this down. For this, it would be extremely helpful if you ran Snakemake with --verbose and attached the output as a file. Also, please state your SLURM version (output of sinfo --version). Thank you.
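To make the mutual-exclusivity point concrete, here is a hypothetical sketch (not the plugin's code) of how a submitter could turn resources into at most one sbatch memory option. It assumes a per-CPU resource called mem_mb_per_cpu alongside mem_mb.

Other entries such as mem_mib are simply ignored. This mirrors the point that Snakemake listing both mem_mb and mem_mib is harmless on its own.

```python
def sbatch_mem_args(resources: dict) -> list[str]:
    """Return at most one sbatch memory option (hypothetical helper).

    SLURM treats --mem and --mem-per-cpu as mutually exclusive, so a
    per-CPU request wins if present; otherwise mem_mb becomes --mem.
    """
    if resources.get("mem_mb_per_cpu"):
        return ["--mem-per-cpu", str(resources["mem_mb_per_cpu"])]
    if resources.get("mem_mb"):
        return ["--mem", str(resources["mem_mb"])]
    return []  # no memory request: the cluster's default applies

print(sbatch_mem_args({"mem_mb": 15259, "mem_mib": 7630}))
print(sbatch_mem_args({"mem_mb_per_cpu": 1800, "mem_mb": 15259}))
```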

cmeesters commented 2 months ago

PS: would you be interested in contributing your workflows to the snakemake-workflows catalogue? See https://snakemake.github.io/snakemake-workflow-catalog/ - some of yours look pretty interesting!

kwells4 commented 2 months ago

Thanks for helping with this!

However, I don't know where the mem_mb is being passed.


Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 12}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 11}
Execute 1 jobs...

[Mon Apr 22 08:49:54 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 2
    reason: Missing output files: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB

General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ignore-incomplete', '', '--verbose ', '--rerun-triggers code software-env mtime params input', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache', '', '', '--shared-fs-usage source-cache sources storage-local-copies persistence software-deployment input-output', '', '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==', '']
sbatch call: sbatch --job-name c846e871-127b-46a2-a3c5-559bfafd7f06 --output /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc//pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/%j.log --export=ALL --comment fastqc -A amc-general -p acompile -t 60 --mem 15259 --ntasks=1 --cpus-per-task=1 --output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile -D /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm --wrap="/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305' --wait-for-files '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/tmp.ch3f265w' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code software-env mtime params input --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache --shared-fs-usage source-cache sources storage-local-copies persistence software-deployment input-output --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --executor slurm-jobstep --jobs 1 --mode remote"
Job 2 has been submitted with SLURM jobid 5783166 (log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc//pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/5783166.log).
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-04-20T08:00 --endtime now --name c846e871-127b-46a2-a3c5-559bfafd7f06
It took: 0.058480262756347656 seconds
The output is: '5783166|FAILED '

status_of_jobs after sacct is: {'5783166': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'5783166'}
active_jobs_seen_by_sacct are: {'5783166'}
missing_sacct_status are: set()
[Mon Apr 22 08:50:34 2024]
Error in rule fastqc:
    message: SLURM-job '5783166' failed, SLURM status is: 'FAILED'For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 2
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc//pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/5783166.log (check log file(s) for error details)
    shell:

    mkdir -p /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
    fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
    for dir in /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip;
    do
        name=$(basename -s .zip $dir)

        unzip -p $dir $name/summary.txt                 >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    done

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 5783166

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-22T084953.851472.snakemake.log
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.

WorkflowError: At least one job did not complete successfully. raw_data/control_1_R1.fastq.gz


And the output from the job:

Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: True
Using shell: /bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305
Resources before job selection: {'mem_mb': 15259, 'mem_mib': 7630, 'disk_mb': 43311, 'disk_mib': 41305, '_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'mem_mb': 15259, 'mem_mib': 0, 'disk_mb': 43311, 'disk_mib': 0, '_cores': 9223372036854775806, '_nodes': 0}
Execute 1 jobs...

[Mon Apr 22 08:50:04 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 0
    reason: Forced execution
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB

General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ignore-incomplete', '', '--verbose ', '--rerun-triggers code mtime input params software-env', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache', '', '', '--shared-fs-usage sources source-cache software-deployment input-output persistence storage-local-copies', '', '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==']
This job is a group job: False
The call for this job is: srun -n1 --cpu-bind=q --cpus-per-task 1 /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code mtime input params software-env --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache --shared-fs-usage sources source-cache software-deployment input-output persistence storage-local-copies --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --mode remote
Job is running on host: c3cpu-a2-u32-1.rc.int.colorado.edu
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
[Mon Apr 22 08:50:04 2024]
Error in rule fastqc:
    jobid: 0
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    shell:

    mkdir -p /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
    fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
    for dir in /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip;
    do
        name=$(basename -s .zip $dir)

        unzip -p $dir $name/summary.txt                 >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    done

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Storing output in storage.
Full Traceback (most recent call last):
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.

WorkflowError: At least one job did not complete successfully.
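The --set-resources and --default-resources values in the verbose log above are base64//-prefixed tokens. Below is a minimal Python sketch for decoding them. It assumes the text after the prefix is standard base64, which holds for every token in this log; decode_snakemake_arg is a hypothetical helper name.

Decoding shows that Snakemake's default resources carry their own mem_mb expression. That is one place a mem_mb value can come from even though the profile only sets mem.

```python
import base64

def decode_snakemake_arg(token: str) -> str:
    """Decode one 'base64//...' token from Snakemake's verbose output."""
    prefix = "base64//"
    if not token.startswith(prefix):
        return token  # not encoded: return unchanged
    return base64.b64decode(token[len(prefix):]).decode("utf-8")

# Tokens copied from the --default-resources and --set-resources arguments above
tokens = [
    "base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk=",
    "base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ==",
    "base64//dG1wZGlyPXN5c3RlbV90bXBkaXI=",
    "base64//ZmFzdHFjOm1lbT0xNkdC",
]
for token in tokens:
    print(decode_snakemake_arg(token))
# mem_mb=min(max(2*input.size_mb, 1000), 8000)
# disk_mb=max(2*input.size_mb, 1000)
# tmpdir=system_tmpdir
# fastqc:mem=16GB
```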

kwells4 commented 2 months ago

PS would you be interested contributing your workflows to the snakemake-workflows catalogue? see https://snakemake.github.io/snakemake-workflow-catalog/ - some of your look pretty interesting!

I could definitely do that! I'll add it to my todo list!

cmeesters commented 2 months ago

Bad news: I cannot reproduce this behaviour. Edit: my SLURM version is 23.02.7.

I noticed that you are overwriting --output via slurm_extra. That does not produce the error, though. Still, note that our own log file is written to os.path.abspath(f".snakemake/slurm_logs/{group_or_rule}/{wildcard_str}/%j.log"), as reported by the plugin.

My Snakefile is

rule all:
    input: "results/2.out"

rule test1:
    output: "results/2.out"
    #threads: 2
    resources:
        cpus_per_task=2,
        slurm_extra="--output='somewhere_%j.log'"
    shell: "touch results/$SLURM_CPUS_PER_TASK.out"

My profile:

default-resources:
    slurm_partition: "smallcpu"
    slurm_account: "nhr-zdvhpc" #"m2_zdvhpc"

set-resources:
    test1:
        runtime: 5
        mem_mb: 1800

Does this produce the observed error, too?

kwells4 commented 2 months ago

That's unfortunate that you can't reproduce it.

You are completely correct: using your profile (with the partition and account changed) and your Snakefile, I get the same error. But the error still occurs when I remove the slurm_extra argument, so it doesn't seem to be coming from overwriting --output.

cmeesters commented 2 months ago

... I get the same error.

That is not what I wanted to read ;-)

Assuming you have this script:

#!/bin/bash

#SBATCH --mem 100
#SBATCH -A amc-general 
#SBATCH -p acompile
#SBATCH -t 5

srun echo "Hello world"

If you run sbatch <this script>, does your SLURM output contain the error, too? I ask because we observe that the call to srun in the jobstep executor does NOT include any memory setting, and yet, weirdly, you still see this error.
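A possible diagnostic, suggested here rather than tried in this thread: list the memory-related variables from inside the batch script before srun is called, for example by running a small Python script like the one below in the batch step. It assumes, as the error text says, that srun refuses to start when more than one of the three SLURM_MEM_PER_* variables is set. The function names are hypothetical.

```python
import os

# The three variables named in srun's error message
MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")

def memory_env(env=None) -> dict:
    """Return the SLURM_MEM_PER_* variables set in env (default: os.environ)."""
    env = os.environ if env is None else env
    return {name: env[name] for name in MEM_VARS if name in env}

def srun_would_refuse(env=None) -> bool:
    """True if more than one of the mutually exclusive variables is set."""
    return len(memory_env(env)) > 1

if __name__ == "__main__":
    print("memory variables:", memory_env())
    print("conflict:", srun_would_refuse())
```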

kwells4 commented 2 months ago

You are good, that produced the exact same error. It seems to be an issue with my system and not Snakemake (probably what you did want to hear!)

I'll reach out to our system administrators. Thank you so much for all of your help!

cmeesters commented 2 months ago

probably what you did want to hear

Not really. It is some sort of relief, though. I know that it takes effort to update SLURM if my colleagues are bitten by a bug; then again, I would be surprised if you were the first to report this.

Thanks for the feedback. I will keep this issue open, if you don't mind, and await further feedback. Perhaps it turns out to be a corner case we can mitigate.

kwells4 commented 2 months ago

Sounds great! We are working on it and have so far figured out that this works:

#!/bin/bash

#SBATCH --mem 100
#SBATCH -A amc-general 
#SBATCH -p acompile
#SBATCH -t 5

srun --mem 100 echo "Hello world"

I will let you know if we make any progress.

cmeesters commented 2 months ago

Urgh, is redundancy a new hobby of SchedMD, or is there a technical reason behind it (just a rhetorical question!)? Before I add this duplication to the code, I need to check a couple of SLURM versions (read: two, since I do not have access to more; I will ask colleagues to do the same). I am not sure whether or where there might be side effects.

Also, as “my” most current version of SLURM is slightly more up to date than yours, I have to presume that this is a quirk of your cluster.

kwells4 commented 2 months ago

This is likely a quirk of my cluster. We will definitely keep working on our side to see if there are good fixes.

Again, thanks so much for your help!

kwells4 commented 2 months ago

I might have found the problem... Our cluster is currently going through some growing pains, so the best way to get an interactive job is by starting an interactive vscode session. When I submit the snakemake jobs from within the interactive vscode session, I get the error, but I don't when submitting from a normal interactive node.

So the slurm integration seems to work well as long as I'm not running through vscode.
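A plausible mechanism, not confirmed in this thread:
- The vscode session is itself a SLURM job, so its shell already carries job variables such as SLURM_MEM_PER_CPU.
- The plugin submits with --export=ALL (see the sbatch call in the verbose log above), so those variables can be carried into the new job.
- In the new job, SLURM also sets SLURM_MEM_PER_NODE because of --mem, and srun then sees two mutually exclusive variables.

This would also be consistent with srun --mem 100 working, since an explicit option is passed. If this is the cause, launching Snakemake with those variables removed should avoid the error. The wrapper below is a hypothetical sketch; the commented-out arguments mirror the command from the issue.

```python
import os
import subprocess

MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")

def env_without_slurm_mem(env=None) -> dict:
    """Copy of env (default: os.environ) without the SLURM_MEM_PER_* variables."""
    cleaned = dict(os.environ if env is None else env)
    for name in MEM_VARS:
        cleaned.pop(name, None)  # ignore variables that are not set
    return cleaned

def run_snakemake_clean(args: list) -> int:
    """Run snakemake with the cleaned environment (hypothetical wrapper)."""
    return subprocess.run(["snakemake", *args], env=env_without_slurm_mem()).returncode

# Example (not executed here), mirroring the workflow-profile command from the issue:
# run_snakemake_clean(["--snakefile", "Snakefile", "--configfile", "config.yaml",
#                      "--jobs", "12", "--use-singularity",
#                      "--workflow-profile", "profiles/default"])
```

Running unset SLURM_MEM_PER_CPU SLURM_MEM_PER_GPU SLURM_MEM_PER_NODE in the shell before calling snakemake would have the same effect.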

cmeesters commented 2 months ago

Ah, the issue is that you submit whilst working within a job context. I'm afraid that's not what we designed the plugin for. Still, it should not be a problem either; at least, this particular issue of yours should not arise.

Now, we can certainly detect this and program a fat warning. I wonder, however, whether falling back on the actual SLURM executor instead of the jobstep executor is possible as a reaction. Either way, I will keep this issue open until I have an answer to this question.
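Here is a sketch of the kind of check mentioned here. It assumes that SLURM_JOB_ID being set in the submitting shell is enough to recognise job context; SLURM sets it inside batch and interactive allocations alike. The function name and warning text are hypothetical; this is not the plugin's implementation.

```python
import os
import warnings

def warn_if_inside_slurm_job(env=None) -> bool:
    """Warn and return True if the caller appears to run inside a SLURM job."""
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")
    if job_id is None:
        return False
    # Memory variables that could be exported into newly submitted jobs
    inherited = sorted(k for k in env if k.startswith("SLURM_MEM_PER_"))
    warnings.warn(
        f"Submitting from inside SLURM job {job_id}; inherited variables "
        f"{inherited or 'none'} may be exported to the new jobs and confuse srun.",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```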