Open kwells4 opened 7 months ago
However, I don't know where the mem_mb is being passed.
Your requirement should be translated to mem_mb, and that is sufficient. Snakemake merely lists both resources, but that should be fine, as it is only translated into sbatch --mem .... And indeed, within SLURM, --mem and --mem-per-cpu are mutually exclusive. I will try to track this down. For this, it would be extremely helpful if you ran Snakemake with --verbose and attached the output as a file. Also, please state your SLURM version (output of sinfo --version). Thank you.
PS: Would you be interested in contributing your workflows to the snakemake-workflows catalogue? See https://snakemake.github.io/snakemake-workflow-catalog/ (some of yours look pretty interesting!)
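The translation described above can be illustrated with a small hypothetical helper. This is not the plugin's code: it assumes a plain dict of resources and uses mem_mb_per_cpu as an illustrative name for a per-CPU request. The only point is that at most one of --mem and --mem-per-cpu reaches sbatch, and that a listed mem_mib does not add a second memory flag.

```python
from typing import Mapping


def sbatch_mem_args(resources: Mapping[str, int]) -> list:
    """Return the sbatch memory flags for a job's resources.

    SLURM treats --mem (per node) and --mem-per-cpu as mutually exclusive,
    so at most one of them is emitted. Other keys such as mem_mib are ignored.
    """
    if "mem_mb" in resources:
        # In this sketch, a per-node request takes precedence over a per-CPU one.
        return ["--mem", str(resources["mem_mb"])]
    if "mem_mb_per_cpu" in resources:  # illustrative key name
        return ["--mem-per-cpu", str(resources["mem_mb_per_cpu"])]
    return []  # no memory request: leave it to the cluster's defaults


if __name__ == "__main__":
    # Resource values as listed for the fastqc job in the verbose log.
    print(sbatch_mem_args({"mem_mb": 15259, "mem_mib": 7630}))
```

With the values from the log, this yields only --mem 15259, which is the single memory flag visible in the sbatch call further down.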
Thanks for helping with this!
The SLURM version is 23.02.2.
Here's the output using --verbose from the master:
snakemake --snakefile Snakefile --configfile config.yaml --jobs 12 --latency-wait 60 --rerun-incomplete --use-singularity --workflow-profile profiles/default --verbose
Using workflow specific profile profiles/default for setting default command line arguments.
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
SLURM run ID: c846e871-127b-46a2-a3c5-559bfafd7f06
Using shell: /bin/bash
Provided remote nodes: 12
Job stats:
job               count
--------------  -------
all                   1
fastqc                1
fastqc_summary        1
total                 3
Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 12}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 11}
Execute 1 jobs...
[Mon Apr 22 08:49:54 2024]
rule fastqc:
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
jobid: 2
reason: Missing output files: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=
General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ignore-incomplete', '', '--verbose ', '--rerun-triggers code software-env mtime params input', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache', '', '', '--shared-fs-usage source-cache sources storage-local-copies persistence software-deployment input-output', '', '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==', '']
sbatch call: sbatch --job-name c846e871-127b-46a2-a3c5-559bfafd7f06 --output /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule fastqc//pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/%j.log --export=ALL --comment fastqc -A amc-general -p acompile -t 60 --mem 15259 --ntasks=1 --cpus-per-task=1 --output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile -D /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm --wrap="/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305' --wait-for-files '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/tmp.ch3f265w' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code software-env mtime params input --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache --shared-fs-usage source-cache sources storage-local-copies persistence software-deployment input-output --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --executor slurm-jobstep --jobs 1 --mode remote"
Job 2 has been submitted with SLURM jobid 5783166 (log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rulefastqc//pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/5783166.log).
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-04-20T08:00 --endtime now --name c846e871-127b-46a2-a3c5-559bfafd7f06
It took: 0.058480262756347656 seconds
The output is: '5783166|FAILED '
status_of_jobs after sacct is: {'5783166': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'5783166'}
active_jobs_seen_by_sacct are: {'5783166'}
missing_sacct_status are: set()
[Mon Apr 22 08:50:34 2024]
Error in rule fastqc:
message: SLURM-job '5783166' failed, SLURM status is: 'FAILED'
For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 2
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rulefastqc//pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/5783166.log (check log file(s) for error details)
shell:
mkdir -p /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
for dir in /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip; do
    name=$(basename -s .zip $dir)
    unzip -p $dir $name/summary.txt >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
done
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 5783166
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-22T084953.851472.snakemake.log
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.
WorkflowError: At least one job did not complete successfully.
And here is the output from the job:
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: True
Using shell: /bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305
Resources before job selection: {'mem_mb': 15259, 'mem_mib': 7630, 'disk_mb': 43311, 'disk_mib': 41305, '_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'mem_mb': 15259, 'mem_mib': 0, 'disk_mb': 43311, 'disk_mib': 0, '_cores': 9223372036854775806, '_nodes': 0}
Execute 1 jobs...
[Mon Apr 22 08:50:04 2024]
rule fastqc:
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
jobid: 0
reason: Forced execution
wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=
General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ignore-incomplete', '', '--verbose ', '--rerun-triggers code mtime input params software-env', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache', '', '', '--shared-fs-usage sources source-cache software-deployment input-output persistence storage-local-copies', '', '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==']
This job is a group job: False
The call for this job is: srun -n1 --cpu-bind=q --cpus-per-task 1 /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code mtime input params software-env --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache --shared-fs-usage sources source-cache software-deployment input-output persistence storage-local-copies --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --mode remote
Job is running on host: c3cpu-a2-u32-1.rc.int.colorado.edu
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
[Mon Apr 22 08:50:04 2024]
Error in rule fastqc:
jobid: 0
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
shell:
mkdir -p /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
for dir in /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip; do
    name=$(basename -s .zip $dir)
    unzip -p $dir $name/summary.txt >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
done
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Storing output in storage.
Full Traceback (most recent call last):
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.
WorkflowError: At least one job did not complete successfully.
As for the snakemake-workflows catalogue: I could definitely do that! I'll add it to my to-do list!
Bad news: I cannot reproduce this behaviour. Edit: my SLURM version is 23.02.7.
I noticed that you are overwriting --output via slurm_extra. That does not produce the error. Still, the plugin writes its own log file to os.path.abspath(f".snakemake/slurm_logs/{group_or_rule}/{wildcard_str}/%j.log"), as it reports.
My Snakefile is:
rule all:
    input: "results/2.out"

rule test1:
    output: "results/2.out"
    #threads: 2
    resources:
        cpus_per_task=2,
        slurm_extra="--output='somewhere_%j.log'"
    shell: "touch results/$SLURM_CPUS_PER_TASK.out"
My profile:
default-resources:
    slurm_partition: "smallcpu"
    slurm_account: "nhr-zdvhpc" #"m2_zdvhpc"
set-resources:
    test1:
        runtime: 5
        mem_mb: 1800
Does this produce the observed error, too?
That's unfortunate that you can't reproduce it.
You are completely correct: using your profile (changing the partition and account) and your Snakefile, I get the same error. But the error still occurs when I remove the slurm_extra argument, so it doesn't seem to be coming from overwriting --output.
... I get the same error.
That is not what I wanted to read ;-)
Assuming you have this script:
#!/bin/bash
#SBATCH --mem 100
#SBATCH -A amc-general
#SBATCH -p acompile
#SBATCH -t 5
srun echo "Hello world"
and you run sbatch <this script>. Does your SLURM output contain the error, too? I ask because we observe that the srun call in the jobstep executor does NOT include any memory setting, and yet you still see this error.
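Because the srun message names three environment variables, one way to narrow this down is to print which of them are set inside the job, for instance by adding a line such as python3 <script> to the batch script before the srun line (running it through srun would hit the same error). The helper below, slurm_mem_vars, is hypothetical and not part of Snakemake or SLURM; it only reports what the environment contains and changes nothing.

```python
import os
from typing import Mapping, Optional

# The three variables named in the "srun: fatal" message.
MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")


def slurm_mem_vars(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return those SLURM memory variables that are set in `env`.

    Defaults to the current process environment. More than one entry
    corresponds to the combination srun rejects as mutually exclusive.
    """
    env = os.environ if env is None else env
    return {name: env[name] for name in MEM_VARS if name in env}


if __name__ == "__main__":
    found = slurm_mem_vars()
    for name, value in found.items():
        print(f"{name}={value}")
    if len(found) > 1:
        print("more than one SLURM memory variable is set; srun is expected to abort")
```

If only SLURM_MEM_PER_NODE shows up, the job's own --mem request is the sole source; a second variable would point at something else in the environment.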
You are good; that produced the exact same error. It seems to be an issue with my system and not Snakemake
(probably what you did want to hear!)
I'll reach out to our system administrators. Thank you so much for all of your help!
probably what you did want to hear
Not really. It is some sort of relief, though. I know that it takes effort to update SLURM when my colleagues are bitten by a bug; then again, I would be surprised if you were the first to report this.
Thanks for the feedback. I will keep this issue open, if you don't mind, and await further feedback. Perhaps it turns out to be a corner case we can mitigate.
Sounds great! We are working on it and have so far figured out that this works:
#!/bin/bash
#SBATCH --mem 100
#SBATCH -A amc-general
#SBATCH -p acompile
#SBATCH -t 5
srun --mem 100 echo "Hello world"
I will let you know if we make any progress.
Urgh, is redundancy a new hobby of SchedMD, or is there a technical reason behind it (just a rhetorical question!)? When I contribute the duplication to the code, I need to check it against a couple of SLURM versions (read: two, for I do not have more, and I will ask colleagues to do the same). I am not sure whether, or where, there might be side effects.
Also, as “my” most current SLURM version is slightly more up to date than yours, I have to presume that this is a quirk of your cluster.
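The duplication discussed here amounts to repeating the job's memory request on the srun step, as in the working srun --mem 100 script. A minimal sketch, modelled on the srun call printed in the job log (srun -n1 --cpu-bind=q --cpus-per-task 1 ...); jobstep_srun_call and its signature are hypothetical, not the jobstep plugin's code, and whether this has side effects on other SLURM versions is exactly the open question raised above.

```python
import shlex
from typing import Optional


def jobstep_srun_call(cmd: str, cpus_per_task: int, mem_mb: Optional[int] = None) -> str:
    """Build an srun prefix like the one in the job log, optionally
    repeating the job's memory request on the step itself."""
    parts = ["srun", "-n1", "--cpu-bind=q", "--cpus-per-task", str(cpus_per_task)]
    if mem_mb is not None:
        # Duplicates sbatch's --mem for the step, mirroring the working
        # `srun --mem 100 echo "Hello world"` test.
        parts += ["--mem", str(mem_mb)]
    # `cmd` is appended verbatim; it is assumed to be a ready-to-run shell string.
    return f"{shlex.join(parts)} {cmd}"


if __name__ == "__main__":
    print(jobstep_srun_call('echo "Hello world"', cpus_per_task=1, mem_mb=100))
```

Keeping the memory flag optional means the call is unchanged for jobs that do not request memory, which limits the surface for the side effects mentioned above.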
This is likely a quirk of my cluster. We will definitely keep working on our side to see if there are good fixes.
Again, thanks so much for your help!
I might have found the problem... Our cluster is currently going through some growing pains, so the best way to get an interactive job is by starting an interactive vscode session. When I submit the snakemake jobs from within the interactive vscode session, I get the error, but I don't when submitting from a normal interactive node.
So the SLURM integration seems to work well as long as I'm not running through vscode.
Ah, the issue is that you submit whilst working within a job context. I'm afraid that's not what we designed the plugin for. It should not be an issue either; at least, that issue of yours should not arise.
Now, we can certainly detect this and program a fat warning. I wonder, however, whether falling back on the actual SLURM executor instead of the jobstep executor is possible as a reaction. Either way, I will keep this issue open until I have an answer to this question.
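Detecting the situation seems straightforward in principle: SLURM sets SLURM_JOB_ID in the environment of every job, so its presence when Snakemake submits indicates that submission happens from within a job, as with the vscode session above. A minimal sketch of such a warning follows; the function names are hypothetical, this is not the plugin's implementation, and it leaves the fallback question open. The warning text assumes, without confirmation in this thread, that variables inherited from the surrounding job (the log shows sbatch being called with --export=ALL) are what trip srun.

```python
import os
import warnings
from typing import Mapping, Optional


def in_slurm_job(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment belongs to a SLURM job allocation.

    SLURM exports SLURM_JOB_ID into every job it starts, including
    interactive sessions that run as jobs.
    """
    env = os.environ if env is None else env
    return "SLURM_JOB_ID" in env


def warn_if_submitting_from_job(env: Optional[Mapping[str, str]] = None) -> bool:
    """Emit a warning when submission happens inside a job; return whether it did."""
    env = os.environ if env is None else env
    if not in_slurm_job(env):
        return False
    warnings.warn(
        f"Submitting from within SLURM job {env['SLURM_JOB_ID']}: variables "
        "inherited from that job may conflict with those of the new job.",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```

Returning a boolean keeps the check usable as a gate, should a fallback to the regular SLURM executor turn out to be feasible.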
Versions
snakemake version 8.10.7
snakemake-executor-plugin-slurm version 0.4.4
snakemake-executor-plugin-slurm-jobstep version 0.2.1
The problem
I am working on getting snakemake version 8 to work on my SLURM server and keep getting the following error:
I can see that two resource arguments are being passed when looking at the rule description:
However, I don't know where the mem_mb is being passed.
Profile
My rule
my command
Attempted fix 1: use mem_mb
I have also tried this using the mem_mb argument instead. I get the same error, but the double memory request is less obvious.
Attempted fix 2: remove profile and specify within rule
I have also tried this where I deleted my profile and just assigned the resources within the rule:
Submit with:
But this also fails with the same srun error:
Attempted fix 3: submit with sbatch
I've also submitted the job per this issue but that gave the same error as above.
Conclusion
mem_mb is obviously specified somewhere, but I am not sure where to look beyond the profile, rules, and snakemake command. Do you have any ideas what I may be missing? Thanks so much for your help!
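On the question of where mem_mb comes from: the --set-resources and --default-resources values in the --verbose output above are printed base64-encoded (note the base64// prefix), so what the profile contributes is not readable at a glance. The sketch below decodes a few of those tokens with Python's standard base64 module; decode_snakemake_arg is a made-up helper name, not a Snakemake function.

```python
import base64
import math


def decode_snakemake_arg(token: str) -> str:
    """Decode one 'base64//...' token as printed by snakemake --verbose.

    Tokens without the prefix are returned unchanged.
    """
    prefix = "base64//"
    if not token.startswith(prefix):
        return token
    return base64.b64decode(token[len(prefix):]).decode("utf-8")


# Tokens copied from the --default-resources and --set-resources
# arguments in the verbose log.
TOKENS = [
    "base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk=",
    "base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl",
    "base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==",
    "base64//ZmFzdHFjOm1lbT0xNkdC",
]

if __name__ == "__main__":
    for token in TOKENS:
        print(decode_snakemake_arg(token))
    # Megabytes (10**6 bytes) expressed in mebibytes (2**20 bytes), rounded up.
    print(math.ceil(16 * 10**9 / 2**20))    # 16 GB
    print(math.ceil(8000 * 10**6 / 2**20))  # 8000 MB
```

Run as-is, it prints mem_mb=min(max(2*input.size_mb, 1000), 8000), slurm_partition=acompile, slurm_account=amc-general and fastqc:mem=16GB, followed by 15259 and 7630. This suggests that a default mem_mb expression travels in --default-resources alongside slurm_partition and slurm_account, separately from the rule's mem=16GB request. The two numbers match mem_mb=15259 and mem_mib=7630 in the log; reading them as coming from the 16 GB request and the 8000 MB cap respectively is an inference from the arithmetic, not something established in this thread.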