nf-core / denovotranscript

A pipeline for de novo transcriptome assembly of paired-end short reads from bulk RNA-seq
https://nf-co.re/denovotranscript/
MIT License

Ask for speeding up trinity #12

Closed dppss90008 closed 1 day ago

dppss90008 commented 2 months ago

Description of feature

Hello,

Thank you for developing this fantastic pipeline. I am currently using it to process my (human) RNA-seq data.

This is the command I used:

nextflow run nf-core/denovotranscript -resume  --max_memory 256.GB --max_cpus 40 --input ./samplesheet.csv --outdir ./results -profile docker

However, Nextflow raised the following error:

Sept-26 05:42:41.507 [TaskFinalizer-6] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_DENOVOTRANSCRIPT:DENOVOTRANSCRIPT:TRINITY (pooled_reads)'

Caused by:
  Process exceeded running time limit (16h)

Command executed:

  # Note that Trinity needs the word 'trinity' in the outdir

  Trinity \
      --seqType fq \
      --max_memory 160G \
      --left input1/pooled_reads_1.merged.fastq.gz --right input2/pooled_reads_2.merged.fastq.gz \
      --output pooled_reads_trinity \
      --CPU 12 \
       \
      > >(tee pooled_reads.log)

  gzip \
      -cf \
      pooled_reads_trinity.Trinity.fasta \
      > pooled_reads.fa.gz

I suspect the issue is caused by Trinity. I am wondering if using the following parameter might help:

--max_time 500.h

Are there any other parameters I can modify to improve performance? I noticed that Trinity is only using 12 CPUs and 160 GB of memory. Could I allocate more memory or CPUs to speed up the process?

Thank you

Chih-Hung, Hsieh (CH)

avani-bhojwani commented 1 month ago

Hello,

Thank you for your question. I can increase the resources for this step in the next release of the pipeline.

In the meantime, you can edit conf/base.config and change the resources for process_high_memory from this:

    withLabel:process_high_memory {
        memory = { 200.GB * task.attempt }
    }

to this (or more depending on your preference):

    withLabel:process_high_memory {
        cpus   = { 20    * task.attempt }
        memory = { 200.GB * task.attempt }
        time   = { 100.h  * task.attempt }
    }
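If you would rather not modify the pipeline's own files, the same overrides can be supplied in a separate config file passed to Nextflow with `-c` (a sketch; the filename custom.config is hypothetical, and the label must match the one used in base.config):

    // custom.config -- run with: nextflow run nf-core/denovotranscript -c custom.config ...
    process {
        withLabel:process_high_memory {
            cpus   = { 20     * task.attempt }
            memory = { 200.GB * task.attempt }
            time   = { 100.h  * task.attempt }
        }
    }

This keeps the resource overrides separate from the pipeline code, so they survive pipeline updates.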

Currently, the Trinity module uses 0.8 of the memory allocated to process_high_memory (i.e. 0.8 × 200 GB = 160 GB).

Best, Avani

avani-bhojwani commented 1 day ago

I have increased the resources for this step to the following:

    withLabel:process_high_memory {
        cpus   = { 20     * task.attempt }
        memory = { 320.GB * task.attempt }
        time   = { 200.h  * task.attempt }
    }

Please let me know whether these settings work for you, and the size of the datasets you are using (e.g. number of samples, number of reads per sample).

Thanks!