Hi,
I'm just trying to get this amazing pipeline to run to completion on my data. I think I got the memory settings wrong in the config files. The snakemake pipeline failed with "Error in rule run_decontamination:"
The snakemake log file contains the following:
Touching output file M350b/sequence_quality_control/finished_QC.
[Thu May 12 17:48:03 2022]
Finished job 22.
14 of 216 steps (6%) done
[Fri May 13 07:38:34 2022]
Finished job 73.
15 of 216 steps (7%) done
Removing temporary output M350a/assembly/reads/QC.errorcorr_R2.fastq.gz.
Removing temporary output M350a/assembly/reads/QC.errorcorr_R1.fastq.gz.
Select jobs to execute...
rule run_spades:
input: M350a/assembly/reads/QC.errorcorr.merged_R1.fastq.gz, M350a/assembly/reads/QC.errorcorr.merged_R2.fastq.gz, M350a/assembly/reads/QC.errorcorr.merged_me.fastq.gz
output: M350a/assembly/contigs.fasta, M350a/assembly/scaffolds.fasta
log: M350a/logs/assembly/spades.log
jobid: 72
benchmark: logs/benchmarks/assembly/spades/M350a.txt
wildcards: sample=M350a
threads: 8
resources: mem_mb=250000, disk_mb=2481, tmpdir=/tmp, mem=250, time=48, time_min=2880
Error submitting jobscript (exit code 1):
Job can't be submitted
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
So I guess it didn't get the 250 GB it asked for. This is perhaps not surprising given my parameters (see below), but I need advice on how best to change them.
My ~/.config/snakemake/cluster/queues.tsv file is this:
# column names and units should be the same as in the key_mapping.yaml
# queue with lowest priority values is chosen first
queue priority threads mem_mb time_min
pq 1 320 2560000 4320
Our main queue (pq) has 94 nodes, each with 2 CPUs, giving 16 cores per node. Each node has 128 GB of RAM. I wasn't sure exactly how the pipeline uses this information, so I set the threads and mem_mb parameters much lower than the total available across the queue, to limit the number of jobs the scheduler tries to start at once.
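Looking at the log again, the run_spades job asked for mem_mb=250000 (roughly 250 GB), and no single 128 GB node can provide that, which I assume is why sbatch rejected the submission. For comparison, if the threads and mem_mb columns are meant to describe what one node can offer to a single job rather than the whole queue, I imagine the file would need to look more like this (the exact numbers are only my guess, leaving a little headroom below 128 GB):
queue priority threads mem_mb time_min
pq 1 16 120000 4320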
My config.yaml file contains the following:
########################
# Execution parameters
########################
# threads and memory (GB) for most jobs especially from BBtools, which are memory demanding
threads: 8
mem: 60
# threads and memory for jobs needing high amount of memory. e.g GTDB-tk,checkm or assembly
large_mem: 250
large_threads: 8
assembly_threads: 8
assembly_memory: 250
simplejob_mem: 10
simplejob_threads: 4
#Runtime only for cluster execution
runtime: #in h
default: 5
assembly: 48
long: 24
simplejob: 1
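If I read the log correctly, this is where the failing request comes from: run_spades was submitted with resources mem_mb=250000, which looks like assembly_memory: 250 (GB) converted to MB (this is just my reading of the log, not something I have checked in the atlas code):
assembly_memory: 250   # 250 GB -> mem_mb = 250 * 1000 = 250000 in the sbatch request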
So my questions are:
Should I set large_mem and assembly_memory in config.yaml to 128 (sketched below)?
Do I then also need to reduce the number of threads in config.yaml?
If I adjust the config files and simply restart "atlas run all", will the new parameters be incorporated, or do I need to start again from scratch?
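In case it helps to see what I have in mind, this is the kind of change I was considering in config.yaml, assuming every job has to fit on a single 128 GB node (whether 128, or something a bit lower to leave headroom for the OS, is exactly what I'm unsure about):
large_mem: 120
assembly_memory: 120
Since large_threads and assembly_threads are already 8, well within the 16 cores per node, I'm guessing those can stay as they are, but please correct me if that's wrong.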
Any advice would be very much appreciated by a snakemake novice!
Thanks,
Alastair
Atlas version: 2.9.1