Closed SPPearce closed 4 months ago
Scratch that, I've just tested and hit the following error on a different sample:
CDX18_I_REP3.mLb.clN.bedGraph is not case-sensitive sorted at line 23978872. Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C, or bedSort and try again.
So sorting is required.
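For reference, the sort the error message asks for can be reproduced on its own. A minimal sketch with a made-up bedGraph (in the C locale the comparison is byte-wise, so `chr10` sorts before `chr2`):

```shell
# Sketch of the byte-order sort that bedGraphToBigWig expects, per the
# error message above; the tiny bedGraph here is made up for illustration.
printf 'chr10\t0\t100\t1.0\nchr2\t0\t100\t2.0\nchr2\t50\t60\t0.5\n' > example.bg
LC_COLLATE=C sort -k1,1 -k2,2n example.bg > example.sorted.bg
```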
Suggest that we just increase the memory allocation for this process, at least when replicates are being merged.
Hi. I always get an error at this step, and even though I have access to 256 GB of memory on the cluster, my pipeline exits with a 137 error. I have huge files (each FASTQ is 15 GB) and the GRCm38 reference genome. I was wondering if there is an option to fix this, or if there is anything I can do on my end. Here is my error:
ERROR ~ Error executing process > 'NFCORE_ATACSEQ:ATACSEQ:MERGED_REPLICATE_BAM_TO_BIGWIG:BEDTOOLS_GENOMECOV (CONTROL)'
Caused by:
Process `NFCORE_ATACSEQ:ATACSEQ:MERGED_REPLICATE_BAM_TO_BIGWIG:BEDTOOLS_GENOMECOV (CONTROL)` terminated with an error exit status (137)
Command executed:
SCALE_FACTOR=$(grep '[0-9] mapped (' CONTROL.mRp.clN.sorted.bam.flagstat | awk '{print 1000000/$1}')
echo $SCALE_FACTOR > CONTROL.mRp.clN.scale_factor.txt
bedtools \
genomecov \
-ibam CONTROL.mRp.clN.sorted.bam \
-bg \
-scale $SCALE_FACTOR \
-pc \
\
> tmp.bg
bedtools sort -i tmp.bg > CONTROL.mRp.clN.bedGraph
cat <<-END_VERSIONS > versions.yml
"NFCORE_ATACSEQ:ATACSEQ:MERGED_REPLICATE_BAM_TO_BIGWIG:BEDTOOLS_GENOMECOV":
bedtools: $(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
Command exit status:
137
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
WARNING: Skipping mount /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/apptainer/1.2.4/var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
.command.sh: line 14: 8361 Killed bedtools sort -i tmp.bg > CONTROL.mRp.clN.bedGraph
Work dir:
/lustre06/project/6067517/work/27/234fbc9a842535b0adae076077714f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Have you tried increasing the memory for this process with an additional config file? Not just increasing the max memory overall.
I did not, and I'm not sure how to do that. Should I add it to the custom config already in [./nextflow], or make a new one and copy it somewhere else?
Add an additional config file and pass it to nextflow with -c
process {
    withName: "BEDTOOLS_GENOMECOV" {
        memory = 240.GB
        time   = 24.h
    }
}
(You may run out of time too, so I upped that as well.) From what I recall (I've moved jobs since), by default the process starts at only 20 GB, then retries twice if it fails, increasing the memory each time, but only from 20 GB to 40 GB and then 60 GB.
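The ramp-up on retry described here is typically implemented with a dynamic memory directive. A sketch of that pattern (the values and selector are assumed, not necessarily the pipeline's actual config):

```
process {
    withName: "BEDTOOLS_GENOMECOV" {
        errorStrategy = 'retry'
        maxRetries    = 2
        memory        = { 20.GB * task.attempt }
    }
}
```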
Thank you so much! It worked.
Closed in #369
Description of feature
The `BEDTOOLS_GENOMECOV` step can require a lot of memory when working on merged replicates. In my case (GRCh38), it takes 80-120 GB in total. The module runs `bedtools genomecov`, followed by sorting. This sorting step doesn't actually change anything for me, as the `bedtools genomecov` output is already sorted. I suspect this is always the case, as the input for genomecov must be sorted. We could also potentially change the tag here, but the tool is not multithreaded, so additional CPUs are not helpful, and `process_high_memory` defaults to excessively large amounts of memory.
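Whether the genomecov output really is already sorted can be checked cheaply before deciding to drop the sort. A minimal sketch with a made-up bedGraph, using `sort -c`, which exits 0 only when the input is already in order:

```shell
# Check sortedness without re-sorting; the bedGraph content is illustrative.
printf 'chr1\t0\t100\t2.0\nchr1\t100\t200\t1.0\n' > check.bg
if LC_COLLATE=C sort -c -k1,1 -k2,2n check.bg 2>/dev/null; then
    echo "already sorted"
else
    echo "needs sorting"
fi
```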