metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Advice on memory allocation #528

Closed · skeffington closed this issue 2 years ago

skeffington commented 2 years ago

Hi,

I'm just trying to get this amazing pipeline to run to completion on my data. I think I got the memory settings wrong in the config files. The snakemake pipeline failed with "Error in rule run_decontamination:"

The snakemake log file contains the following:

Touching output file M350b/sequence_quality_control/finished_QC.
[Thu May 12 17:48:03 2022]
Finished job 22.
14 of 216 steps (6%) done
[Fri May 13 07:38:34 2022]
Finished job 73.
15 of 216 steps (7%) done
Removing temporary output M350a/assembly/reads/QC.errorcorr_R2.fastq.gz.
Removing temporary output M350a/assembly/reads/QC.errorcorr_R1.fastq.gz.
Select jobs to execute...

rule run_spades:
    input: M350a/assembly/reads/QC.errorcorr.merged_R1.fastq.gz, M350a/assembly/reads/QC.errorcorr.merged_R2.fastq.gz, M350a/assembly/reads/QC.errorcorr.merged_me.fastq.gz
    output: M350a/assembly/contigs.fasta, M350a/assembly/scaffolds.fasta
    log: M350a/logs/assembly/spades.log
    jobid: 72
    benchmark: logs/benchmarks/assembly/spades/M350a.txt
    wildcards: sample=M350a
    threads: 8
    resources: mem_mb=250000, disk_mb=2481, tmpdir=/tmp, mem=250, time=48, time_min=2880

Error submitting jobscript (exit code 1):
Job can't be submitted
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

So I guess it didn't get the 250 GB it asked for. This is perhaps not surprising given my parameters (see below), but I need advice on how best to change them.
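
One way to confirm what the nodes in the queue can actually provide is to ask Slurm directly; something along these lines prints the partition name, node count, CPUs per node, and memory per node (in MiB):

sinfo -p pq -o "%P %D %c %m"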

My ~/.config/snakemake/cluster/queues.tsv file is this:

# column names and units should be the same as in the key_mapping.yaml
# the queue with the lowest priority value is chosen first
queue   priority   threads   mem_mb    time_min
pq      1          320       2560000   4320

Our main queue (pq) has 94 nodes, each with 2 CPUs (16 cores per node), and each node has 128 GB RAM. I wasn't sure exactly how the pipeline uses this information, so I set the threads and mem_mb values well below what is actually available, to limit the number of jobs the scheduler tries to launch at once.
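
If the mem_mb column is meant as the most a single job may request in that queue (my reading, given the key_mapping.yaml comment, though I am not certain of the semantics), capping it at what one node physically has would let the profile reject oversized jobs before sbatch does, e.g. (only mem_mb changed):

queue   priority   threads   mem_mb   time_min
pq      1          320       128000   4320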

My config.yaml file contains the following:

########################
# Execution parameters
########################
# threads and memory (GB) for most jobs, especially from BBTools, which are memory demanding
threads: 8
mem: 60

# threads and memory for jobs needing a high amount of memory, e.g. GTDB-Tk, CheckM, or assembly
large_mem: 250
large_threads: 8
assembly_threads: 8
assembly_memory: 250
simplejob_mem: 10
simplejob_threads: 4

#Runtime only for cluster execution
runtime: #in h
  default: 5
  assembly: 48
  long: 24
  simplejob: 1

So my questions are:

- Should I simply cap the memory parameters at what a single node provides (128 GB)?
- What should the thread settings be?
- Can I simply restart the pipeline after changing the config?

Any advice would be very much appreciated by a snakemake novice! Thanks, Alastair

Atlas version: 2.9.1

SilasK commented 2 years ago

Yes, set a maximum of 128 for memory. And yes, you can simply restart.

You can set threads to 16 or lower.
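
Reading this as applying to the memory values in config.yaml, the execution parameters might end up looking roughly like this (values are illustrative; staying a little below 128 can help, since Slurm often advertises slightly less than the full physical RAM of a node):

threads: 8
mem: 60

# keep per-job memory below what a single 128 GB node can offer
large_mem: 125
large_threads: 16
assembly_threads: 16
assembly_memory: 125
simplejob_mem: 10
simplejob_threads: 4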

skeffington commented 2 years ago

Thanks!