nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)
MIT License

Bowtie2 Mapping Alignment exceeded running time limit error #158

Closed koushik20 closed 1 year ago

koushik20 commented 1 year ago

Description of the bug


Thanks for the detailed documentation! I am running nf-core/hic version 2.0.0 with the GRCh38 reference genome, but the run always fails with "Process exceeded running time limit (16h)".

Below is the terminal output:

executor >  local (2)
[e4/33766a] process > NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (input_file.csv)   [100%] 1 of 1, cached: 1 ✔
[9f/54196b] process > NFCORE_HIC:HIC:PREPARE_GENOME:CUSTOM_GETCHROMSIZES (genome.fa)  [100%] 1 of 1, cached: 1 ✔
[d3/12afd7] process > NFCORE_HIC:HIC:PREPARE_GENOME:GET_RESTRICTION_FRAGMENTS (^GATC) [100%] 1 of 1, cached: 1 ✔
[c6/5b0dc7] process > NFCORE_HIC:HIC:FASTQC (BT549_Rep2)                              [100%] 2 of 2, cached: 2 ✔
[0d/50c45a] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN (BT549_Rep2) [ 25%] 1 of 4, failed: 1
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:TRIM_READS                 -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:MERGE_BOWTIE2              -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:COMBINE_MATES              -
[-        ] process > NFCORE_HIC:HIC:HICPRO:GET_VALID_INTERACTION                     -
[-        ] process > NFCORE_HIC:HIC:HICPRO:MERGE_VALID_INTERACTION                   -
[-        ] process > NFCORE_HIC:HIC:HICPRO:MERGE_STATS                               -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO2PAIRS                              -
[d5/1dd856] process > NFCORE_HIC:HIC:COOLER:COOLER_MAKEBINS (null})                   [100%] 7 of 7, cached: 7 ✔
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_CLOAD                              -
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_BALANCE                            -
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_ZOOMIFY                            -
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_DUMP                               -
[-        ] process > NFCORE_HIC:HIC:COOLER:SPLIT_COOLER_DUMP                         -
[-        ] process > NFCORE_HIC:HIC:HIC_PLOT_DIST_VS_COUNTS                          -
[-        ] process > NFCORE_HIC:HIC:COMPARTMENTS:COOLTOOLS_EIGSCIS                   -
[-        ] process > NFCORE_HIC:HIC:TADS:COOLTOOLS_INSULATION                        -
[-        ] process > NFCORE_HIC:HIC:CUSTOM_DUMPSOFTWAREVERSIONS                      -
[-        ] process > NFCORE_HIC:HIC:MULTIQC                                          -
Execution cancelled -- Finishing pending tasks before exit

Caused by:
  Process exceeded running time limit (16h)

Command executed:

  INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\.rev.1.bt2$//"`
  [ -z "$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\.rev.1.bt2l$//"`
  [ -z "$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

  bowtie2 \
      -x $INDEX \
      -U HiChIP_BT549-B_S6_R2_001.fastq.gz \
      --threads 12 \
      --un-gz BT549_Rep2_0_R2.unmapped.fastq.gz \
      --very-sensitive --end-to-end --reorder \
      2> BT549_Rep2_0_R2.bowtie2.log \
      | samtools view -F 4 --threads 12 -o BT549_Rep2_0_R2.bam -

  if [ -f BT549_Rep2_0_R2.unmapped.fastq.1.gz ]; then
      mv BT549_Rep2_0_R2.unmapped.fastq.1.gz BT549_Rep2_0_R2.unmapped_1.fastq.gz
  fi

  if [ -f BT549_Rep2_0_R2.unmapped.fastq.2.gz ]; then
      mv BT549_Rep2_0_R2.unmapped.fastq.2.gz BT549_Rep2_0_R2.unmapped_2.fastq.gz
  fi

  cat <<-END_VERSIONS > versions.yml
      bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
      pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
  END_VERSIONS

Command exit status:

Command output:

Work dir:

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

The pipeline always stops at this particular bowtie2 mapping step. I gave a separate nextflow.config file and assigned greater memory to this specific step.

process {
    memory = 80.GB
}

So my questions are: Why does the pipeline abort at the 16h mark even though I set a maximum time of 240h? When I ran some samples earlier with GRCh37, the pipeline completed successfully, so is there an issue with using GRCh38?

I tried different --max_cpus, --max_memory, and --max_time configurations, but the pipeline always aborts at this particular step (see "Command executed" above).

Thank you!

Command used and terminal output

Input script filename:

sudo nextflow run nf-core/hic -r 2.0.0 \
       --input '/mnt/hichip_results/BT549/input_file.csv' \
       -profile docker \
       -resume \
       --fastq_chunks_size 20000000 \
       --max_memory '128.GB' \
       --max_time '240.h' \
       --max_cpus 60 \
       --outdir "/mnt/hicpro_results/BT549_Apr2023" \
       --genome GRCh38 \
       --save_pairs_intermediates \
       --bwt2_opts_end2end '--very-sensitive --end-to-end --reorder' \
       --bwt2_opts_trimmed '--very-sensitive --end-to-end --reorder' \
       --digestion 'dpnii' \
       --ligation_site 'GATCGATC' \
       --restriction_site '^GATC' \
       --min_cis_dist 1000 \
       --min_mapq 20 \
       --bin_size '5000,20000,40000,150000,500000,1000000'

Input command: sudo bash

Relevant files


System information

Nextflow version: 22.10.7
Hardware: Desktop
Executor: local
Container engine: Docker
OS: Ubuntu 20.04.5 Linux
Version: nf-core/hic 2.0.0

ninashenker commented 1 year ago

@koushik20 I'm having this same issue - were you able to fix it?

koushik20 commented 1 year ago

I supplied a separate custom Nextflow config file, and the pipeline completed without any errors.

process {
  withLabel:process_high {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
  withLabel:process_medium {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
  withLabel:process_low {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
    memory = 64.GB
    cpus = 52
    time = 36.h
}

process {
    memory = 64.GB
    cpus = 52
    time = 36.h
}

memory = { check_max( 64.GB * task.attempt, 'memory' ) }

// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
  if (type == 'memory') {
    try {
      if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
        return params.max_memory as nextflow.util.MemoryUnit
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
      return obj
    }
  } else if (type == 'time') {
    try {
      if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
        return params.max_time as nextflow.util.Duration
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
      return obj
    }
  } else if (type == 'cpus') {
    try {
      return Math.min( obj, params.max_cpus as int )
    } catch (all) {
      println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
      return obj
    }
  }
}
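
A note on why the 16h limit applied despite --max_time '240.h': check_max only caps a request. It returns the requested value unless that value exceeds params.max_*, so raising --max_time lifts the ceiling without changing what a task actually asks for. The 16h in the error was presumably the time requested for that process by the pipeline's own defaults. A narrower alternative to overriding every label could look like the sketch below. The file name and the 72.h value are illustrative assumptions, and the selector is taken from the process name in the log above.

```groovy
// custom.config (hypothetical file name); a sketch, not a tested setup.
process {
    // Selector taken from the process name shown in the terminal output.
    // withName can match on the simple process name.
    withName: 'BOWTIE2_ALIGN' {
        // Plain value, not wrapped in check_max(), so --max_time does not cap it.
        // 72.h is an illustrative guess, not a measured requirement.
        time = 72.h
    }
}
```

Such a file would be passed by adding -c custom.config to the nextflow run command. Only the alignment step gets the longer limit, so other processes keep the pipeline's default requests.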

ninashenker commented 1 year ago

Thank you so much! This worked nicely, though for some samples the Bowtie2 alignment step is taking over 48 hours, which seems too long.
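
One possible cause of the long alignment time, offered as a guess and not a diagnosis: in the "Command executed" output above, bowtie2 reads the whole HiChIP_BT549-B_S6_R2_001.fastq.gz file in a single task, even though --fastq_chunks_size was set. If chunking only takes effect when the pipeline's split_fastq parameter is enabled (an assumption to check against the nf-core/hic 2.0.0 parameter documentation), a params block along these lines might spread the alignment over several shorter tasks:

```groovy
// Params sketch; verify both names against the nf-core/hic 2.0.0 parameter docs.
params {
    // Assumed switch that enables splitting reads into chunks before mapping.
    split_fastq       = true
    // Same chunk size as in the original command line.
    fastq_chunks_size = 20000000
}
```

Separately, --very-sensitive is the slowest of the Bowtie2 presets, so a faster preset would shorten mapping at some cost in sensitivity. Whether that trade-off is acceptable for Hi-C or HiChIP data is not something this thread establishes.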