Closed SPPearce closed 4 months ago
Scratch that, I've just tested and hit the following error on a different sample:
CDX18_I_REP3.mLb.clN.bedGraph is not case-sensitive sorted at line 23978872. Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C, or bedSort and try again.
So sorting is required.
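For reference, the sort the error message asks for can be reproduced on its own. A minimal sketch with a made-up bedGraph (in the C locale the comparison is byte-wise, so `chr10` sorts before `chr2`):

```shell
# Sketch of the byte-order sort that bedGraphToBigWig expects, per the
# error message above; the tiny bedGraph here is made up for illustration.
printf 'chr10\t0\t100\t1.0\nchr2\t0\t100\t2.0\nchr2\t50\t60\t0.5\n' > example.bg
LC_COLLATE=C sort -k1,1 -k2,2n example.bg > example.sorted.bg
```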
Suggest that we just increase the memory allocation for this process, at least when replicates are being merged.
Hi. I always get an error at this step, and even though I have access to 256 GB of memory on the cluster, my pipeline exits with a 137 error. I have huge files (each FASTQ is 15 GB) and the GRCm38 reference genome. I was wondering if there is an option to fix this, or if there is anything I can do on my end. Here is my error:
ERROR ~ Error executing process > 'NFCORE_ATACSEQ:ATACSEQ:MERGED_REPLICATE_BAM_TO_BIGWIG:BEDTOOLS_GENOMECOV (CONTROL)'
Caused by:
Process `NFCORE_ATACSEQ:ATACSEQ:MERGED_REPLICATE_BAM_TO_BIGWIG:BEDTOOLS_GENOMECOV (CONTROL)` terminated with an error exit status (137)
Command executed:
SCALE_FACTOR=$(grep '[0-9] mapped (' CONTROL.mRp.clN.sorted.bam.flagstat | awk '{print 1000000/$1}')
echo $SCALE_FACTOR > CONTROL.mRp.clN.scale_factor.txt
bedtools \
genomecov \
-ibam CONTROL.mRp.clN.sorted.bam \
-bg \
-scale $SCALE_FACTOR \
-pc \
\
> tmp.bg
bedtools sort -i tmp.bg > CONTROL.mRp.clN.bedGraph
cat <<-END_VERSIONS > versions.yml
"NFCORE_ATACSEQ:ATACSEQ:MERGED_REPLICATE_BAM_TO_BIGWIG:BEDTOOLS_GENOMECOV":
bedtools: $(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
Command exit status:
137
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
WARNING: Skipping mount /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/apptainer/1.2.4/var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
.command.sh: line 14: 8361 Killed bedtools sort -i tmp.bg > CONTROL.mRp.clN.bedGraph
Work dir:
/lustre06/project/6067517/work/27/234fbc9a842535b0adae076077714f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Have you tried increasing the memory for this process with an additional config file? Not just increasing the max memory overall.
I did not, and I'm not sure how to do that. Should I add it to the custom config already in [./nextflow], or make a new one and copy it somewhere else?
Add an additional config file and pass it to nextflow with -c
process {
    withName: "BEDTOOLS_GENOMECOV" {
        memory = 240.GB
        time   = 24.h
    }
}
(You may run out of time too, so I upped that as well.) From what I recall (I've moved jobs since), by default the process starts at only 20 GB, then retries twice if it fails, increasing the memory each time, but only from 20 GB to 40 GB and then 60 GB.
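The ramp-up on retry described here is typically implemented with a dynamic memory directive. A sketch of that pattern (the values and selector are assumed, not necessarily the pipeline's actual config):

```
process {
    withName: "BEDTOOLS_GENOMECOV" {
        errorStrategy = 'retry'
        maxRetries    = 2
        memory        = { 20.GB * task.attempt }
    }
}
```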
Thank you so much! It worked.
Closed in #369
Description of feature
The `BEDTOOLS_GENOMECOV` step can require a lot of memory when working on merged replicates. In my case (GRCh38), it takes 80-120 GB in total. The module runs `bedtools genomecov`, followed by sorting. This sorting step doesn't actually change anything for me, as the `bedtools genomecov` output is already sorted. I suspect this is always the case, as the input for genomecov must be sorted. We could also potentially change the tag here, but the tool is not multithreaded, so additional CPUs are not helpful, and `process_high_memory` defaults to excessively large amounts of memory.
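Whether the genomecov output really is already sorted can be checked cheaply before deciding to drop the sort. A minimal sketch with a made-up bedGraph, using `sort -c`, which exits 0 only when the input is already in order:

```shell
# Check sortedness without re-sorting; the bedGraph content is illustrative.
printf 'chr1\t0\t100\t2.0\nchr1\t100\t200\t1.0\n' > check.bg
if LC_COLLATE=C sort -c -k1,1 -k2,2n check.bg 2>/dev/null; then
    echo "already sorted"
else
    echo "needs sorting"
fi
```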