nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Run exits with exit status (137) even when --max_memory specified #526

Closed · srikanthkris closed 1 year ago

srikanthkris commented 1 year ago

I am running ampliseq version 2.4.1 to analyze Illumina single-end reads from 16S rRNA sequencing data. I have a total of 794 samples. The server I am running the analysis on has 125 GB of memory.

I ran the following command:

nextflow run nf-core/ampliseq -r 2.4.1 -profile docker --max_memory '100.GB' --input '/MyBookDuo/Kris/6.Milk_Microbiome/RUNS1_2_3/samplesheet.tsv' --single_end --multiple_sequencing_runs --FW_primer GGACTACHVGGGTWTCTAAT --RV_primer GTGCCAGCMGCCGCGGTAA --ignore_empty_input_files --skip_cutadapt --ignore_failed_trimming --trunclenf 0 --trunclenr 0 --trunc_rmin 0.75 --skip_fastqc --outdir "./results"

The process keeps getting killed with error code 137 at the NFCORE_AMPLISEQ:AMPLISEQ:DADA2_ADDSPECIES stage.

I understand that this is an out-of-memory error, so I have included the --max_memory flag and also tried a config file limiting memory, but I keep getting the same error.

nextflow version 22.10.5 build 5840
ampliseq version 2.4.1

Any suggestions to surmount this issue?

nextflow.log

d4straub commented 1 year ago

Hi there! Taxonomic classification is memory hungry. Illumina single-end reads are by far too short for DADA2's addSpecies function anyway, so without losing much in the results you can just use --skip_dada_addspecies to avoid that process. Edit: and do not forget to add -resume as well, so that the pipeline does not recalculate everything but rather continues from where it failed before.
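For example, your reported command with those two flags appended (everything else unchanged, reformatted with line continuations for readability):

nextflow run nf-core/ampliseq -r 2.4.1 -profile docker \
    --max_memory '100.GB' \
    --input '/MyBookDuo/Kris/6.Milk_Microbiome/RUNS1_2_3/samplesheet.tsv' \
    --single_end --multiple_sequencing_runs \
    --FW_primer GGACTACHVGGGTWTCTAAT --RV_primer GTGCCAGCMGCCGCGGTAA \
    --ignore_empty_input_files --skip_cutadapt --ignore_failed_trimming \
    --trunclenf 0 --trunclenr 0 --trunc_rmin 0.75 --skip_fastqc \
    --outdir "./results" \
    --skip_dada_addspecies -resume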

srikanthkris commented 1 year ago

Thanks. But just to understand: I thought --max_memory would control resource allocation according to user input. If so, why would DADA2 still end up with error 137?

d4straub commented 1 year ago

--max_memory only limits memory (i.e. it defines a maximum); it does not grant it. To allow more memory per process, follow https://nf-co.re/ampliseq/2.4.1/usage#resource-requests
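As a minimal sketch along those lines (the process name is taken from the failing task in your log; the memory value is an assumption that leaves some headroom below your server's 125 GB), save something like this as custom.config:

// raise the memory available to the DADA2_ADDSPECIES process;
// a withName setting in a custom config overrides the pipeline defaults
process {
    withName: 'DADA2_ADDSPECIES' {
        memory = 120.GB
    }
}

and pass it to the run with -c custom.config, together with -resume.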

srikanthkris commented 1 year ago

Thank you very much. Sorry to keep bothering you on this. So through --max_memory and through a config file for a specific process, we can only increase and not limit memory requirements. Is that right?

d4straub commented 1 year ago

--max_memory sets a maximum memory value that will be respected by all processes. It can only decrease, not increase, the memory available to processes. Also see https://nf-co.re/ampliseq/2.4.1/parameters#max_memory

A config file can ignore --max_memory and decrease or increase memory for either all or specific processes. It requires a little more knowledge to apply config files correctly; the link I sent you above should help for starters. Config files can also be used in nf-core pipelines to change output files/dirs or even change process parameters. Configs can be quite powerful, but can also break pipelines.
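Purely as an illustrative sketch of the latter (the selector name, output path, and tool argument below are made up for demonstration and are not ampliseq defaults):

// hypothetical example: change where one process publishes its output
// and pass an extra command-line argument to the underlying tool
process {
    withName: 'CUTADAPT' {
        publishDir = [ path: "${params.outdir}/my_trimming", mode: 'copy' ]
        ext.args   = '--minimum-length 50'
    }
}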

d4straub commented 1 year ago

Just to add here because I might not have been clear enough: the memory available to a process can be modified, but not the memory a process actually requires to complete, i.e. when the ASV table is too large to fit into the memory available to the process, it will fail. More memory has to be made available to the process; there isn't any way around that. I'll close this issue for now, please feel free to open another one if you come across more trouble!