maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

How is memory allocation propagated? #880

Closed: half-adder closed this issue 1 year ago

half-adder commented 1 year ago

I am trying to adjust memory allocation per rule in the DNA-mapping pipeline (running on test data) and am not having success. No matter what I set the Bowtie2 job memory to, it uses 1 GB.

In the DNA-mapping.cluster_config.yaml file, the snakemake_cluster_cmd parameter looks like this:

snakemake_cluster_cmd: sbatch --ntasks-per-node 1 -p bioinfo --mem-per-cpu {cluster.memory}
  -c {threads} -e cluster_logs/{rule}.%j.err -o cluster_logs/{rule}.%j.out -J {rule}.snakemake
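For reference, if the placeholders get substituted the way I expect (this is just my own illustration, assuming the Bowtie2 rule runs with 5 threads and picks up 8G from the cluster config), the submitted command would look roughly like:

sbatch --ntasks-per-node 1 -p bioinfo --mem-per-cpu 8G -c 5 -e cluster_logs/Bowtie2.%j.err -o cluster_logs/Bowtie2.%j.out -J Bowtie2.snakemake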

I added a statement to the DNA-mapping script to print the snakemake command being run, which is the following:

TMPDIR=/pine/scr/s/e/seanjohn/tmp/ PYTHONNOUSERSITE=True /nas/longleaf/home/seanjohn/.conda/envs/base_sj/envs/snakePipes/bin/snakemake --use-conda --conda-prefix /nas/longleaf/home/seanjohn/.conda/envs/base_sj/envs --rerun-incomplete --latency-wait 300 --snakefile /nas/longleaf/home/seanjohn/.conda/envs/base_sj/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/DNA-mapping/Snakefile --jobs 5 --directory /proj/mckaylab/users/sjohnsen/snakePipes_test/dna_mapping --configfile /proj/mckaylab/users/sjohnsen/snakePipes_test/dna_mapping/DNA-mapping.config.yaml --keep-going --cluster-config /proj/mckaylab/users/sjohnsen/snakePipes_test/dna_mapping/DNA-mapping.cluster_config.yaml --cluster 'sbatch --ntasks-per-node 1 -p bioinfo --mem-per-cpu {cluster.memory} -c {threads} -e cluster_logs/{rule}.%j.err -o cluster_logs/{rule}.%j.out -J {rule}.snakemake '

I edited the bowtie2 rule in DNA-mapping.cluster_config.yaml to allocate 8G of memory (note that I changed the lowercase b to uppercase to match the rule name in the snakefiles. I also tried leaving the b lowercase):

Bowtie2:
  memory: 8G

I also edited the __default__ rule in the cluster config file to allocate 5G of memory:

__default__:
  memory: 5G
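Putting both edits together, the relevant part of my cluster config now reads (other rules omitted):

__default__:
  memory: 5G

Bowtie2:
  memory: 8G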

When I run the DNA-mapping command, snakemake prints out the following info for the Bowtie2 job:

[Tue Jan 24 12:11:49 2023]
rule Bowtie2:
    input: FASTQ_Cutadapt/SRR6761497_R1.fastq.gz, FASTQ_Cutadapt/SRR6761497_R2.fastq.gz
    output: Bowtie2/SRR6761497.Bowtie2_summary.txt, Bowtie2/SRR6761497.sorted.bam
    log: Bowtie2/logs/SRR6761497.sort.log
    jobid: 40
    benchmark: Bowtie2/.benchmark/Bowtie2.SRR6761497.benchmark
    reason: Forced execution
    wildcards: sample=SRR6761497
    threads: 5
    resources: mem_mb=1000, disk_mb=1000, tmpdir=<TBD>

As you can see, the memory allocated seems to be 1G instead of 8G (as it should be if pulling from the Bowtie2 cluster config rule) or 5G (as it should be if pulling from the __default__ rule).

Am I missing something here? How should I be defining the memory allocation?

Also, what is the point of --mem-per-cpu {cluster.memory}? Looking through the code, I have not been able to determine how snakePipes/snakemake/SLURM populates that field.

katsikora commented 1 year ago

Hi half-adder,

So it looks like the memory assignment in the DNA-mapping cluster_config.yaml should indeed read 'Bowtie2' with a capital B.

After you changed the setting in the yaml, did you save a copy and rerun your DNA-mapping command with --clusterConfigFile modified_cluster_config.yaml ? Also, where did you change the __default__ memory value? If it was in the shared cluster config, then you might need to run 'snakePipes config' afterwards and provide this new config with --clusterConfig . If you changed it in the merged cluster config yaml found in your output folder, then please save a copy and provide it to your DNA-mapping command as above. A rough sketch of both options is below.
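As a sketch (the path is a placeholder, and ... stands for your other DNA-mapping arguments):

# option 1: pass your edited copy directly to the workflow call
DNA-mapping --clusterConfigFile /path/to/modified_cluster_config.yaml ...

# option 2: make the edited file the shared default for all workflows
snakePipes config --clusterConfig /path/to/modified_cluster_config.yaml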

Memory is always assigned per CPU, so the total memory request for a rule is threads * mem-per-cpu. The {cluster.memory} placeholder is populated by snakemake, which takes the "memory" value for that rule from the cluster config, or falls back to the default memory (1G) otherwise.
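To make that concrete with your numbers: if the Bowtie2 entry is picked up, a job with threads: 5 and memory: 8G requests 5 * 8G = 40G in total from SLURM; with the 1G default it only requests 5 * 1G = 5G.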

Let me know if this was clear and if it fixed your issue. We'll fix the misspelling on our side.

Best,

Katarzyna