nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)
https://nf-co.re/hic
MIT License

Remove duplicates execution process error #111

Closed. koushik20 closed this issue 2 years ago.

koushik20 commented 2 years ago

Hello,

I am running the nf-core/hic pipeline on breast samples (6 samples in total, including replicates) with 500 million reads each. I am getting the following error:

executor >  local (4)
[1f/560b97] process > get_software_versions                                         [100%] 1 of 1, cached: 1 ✔
[e6/dd369f] process > makeChromSize (genome.fa)                                     [100%] 1 of 1, cached: 1 ✔
[3e/5eccce] process > getRestrictionFragments (genome.fa [^GATC])                   [100%] 1 of 1, cached: 1 ✔
[5c/13bee9] process > bowtie2_end_to_end (HiChIP_MCF10A-B_S8_R2_001.35)             [100%] 234 of 234, cache...
[20/529041] process > trim_reads (HiChIP_MCF10A-B_S8_R2_001.35)                     [100%] 234 of 234, cache...
[c0/19592a] process > bowtie2_on_trimmed_reads (HiChIP_MCF10A-B_S8_R2_001.27)       [100%] 234 of 234, cache...
[37/3b2c66] process > merge_mapping_steps (HiChIP_MCF10A-B_S8_001.34 = HiChIP_MC... [100%] 234 of 234, cache...
[63/d5a7a2] process > combine_mapped_files (HiChIP_MCF10A-B_S8_001.26 = HiChIP_M... [100%] 117 of 117, cache...
[5e/368d6e] process > get_valid_interaction (HiChIP_MCF10A-B_S8_001)                [100%] 117 of 117, cache...
[84/34a0f2] process > remove_duplicates (HiChIP_MCF10A-A_S7_001)                    [100%] 3 of 3, failed: 3...
[a9/485599] process > merge_sample (mRSstat)                                        [100%] 8 of 8, cached: 8 ✔
[-        ] process > build_contact_maps                                            -
[-        ] process > run_ice                                                       -
[-        ] process > generate_cool                                                 -
[-        ] process > multiqc                                                       -
[b7/f6bd1d] process > output_documentation (1)                                      [100%] 1 of 1, cached: 1 ✔

Error executing process > 'remove_duplicates (HiChIP_MCF10A-B_S8_001)'

Caused by:
  Process `remove_duplicates (HiChIP_MCF10A-B_S8_001)` terminated with an error exit status (137)

Command exit status:
  137

Command output:
  (empty)

Command error:
  .command.sh: line 5:    31 Killed                  sort -T /tmp/ -S 50% -k2,2V -k3,3n -k5,5V -k6,6n -m HiChIP_MCF10A-B_S8_001.3_bwt2pairs.validPairs HiChIP_MCF10A-B_S8_001.21_bwt2pairs.validPairs 
          32 Done                    | awk -F"\t" 'BEGIN{c1=0;c2=0;s1=0;s2=0}(c1!=$2 || c2!=$5 || s1!=$3 || s2!=$6){print;c1=$2;c2=$5;s1=$3;s2=$6}' > HiChIP_MCF10A-B_S8_001.allValidPairs

Work dir:
  /mnt/hichip_fastq/work/fe/551ceae621f99884fd3f0b8c061957

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

This is the command I ran:

sudo nextflow run nf-core/hic -r 1.0.0  \
       --reads '/mnt/hichip_fastq/MCF10A_2021/HiChIP_MCF10A-{A,B}_S{7,8}_R{1,2}_001.fastq.gz' \
       -profile docker \
       -resume \
       --splitFastq 10000000 \
       --max_memory '80.GB' \
       --max_time '10.h' \
       --max_cpus 20 \
       --outdir "/mnt/hicpro_results/MCF10A_2021" \
       --genome GRCh37 \
       --bwt2_opts_end2end '--very-sensitive --end-to-end --reorder' \
       --bwt2_opts_trimmed '--very-sensitive --end-to-end --reorder' \
       --ligation_site 'GATCGATC' \
       --restriction_site '^GATC' \
       --min_cis_dist 1000 \
       --min_mapq 20 \
       --bin_size '5000,20000,40000,150000,500000,1000000' \
       --saveReference 

I tried running version 1.3.0, but the pipeline could not complete the bowtie2 end-to-end process, so I am running version 1.0.0 instead.

Any thoughts?

nservant commented 2 years ago

Hi, this is a memory issue. You should increase the resources for this job. By default, it runs with 12 GB of RAM, which does not seem to be enough in your case.
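
For readers wondering how the exit status in the log points to memory: a shell reports 128 plus the signal number when a command is killed by a signal, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel or container runtime sends when a memory limit is exceeded. The snippet below is only an illustrative sketch that reproduces the status with a self-inflicted SIGKILL; it does not involve the pipeline itself.

```shell
# Start a child shell that sends SIGKILL (signal 9) to itself,
# mimicking a process being killed from outside (e.g. by the OOM killer).
status=0
sh -c 'kill -9 $$' || status=$?

# A command killed by a signal is reported as 128 + signal number.
echo "exit status: $status"
echo "signal number: $((status - 128))"
```

The same "Killed" message and status 137 appear in the `sort` line of the error above, which is why the memory limit of the `remove_duplicates` process is the first thing to check.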

I would suggest creating your own config file and changing the RAM for this process:

process {
  // Raise the memory limit for the deduplication step only
  withName: remove_duplicates {
    memory = 40.GB
  }
}

see https://nf-co.re/usage/configuration#custom-configuration-files
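
As a minimal sketch of how such an override is typically applied: the snippet below writes the setting to a file and shows, as a comment, how it could be passed to Nextflow with the `-c` option. The file name `custom.config` is an arbitrary choice for illustration, and the remaining arguments are the ones from the original command. In nf-core pipelines the requested memory is normally capped by `--max_memory`, so the 40 GB requested here is assumed to stay below the `80.GB` set in this run.

```shell
# Write the process-specific override to a config file.
# The file name custom.config is hypothetical; any name works.
cat > custom.config <<'EOF'
process {
  withName: remove_duplicates {
    memory = 40.GB
  }
}
EOF

# Then re-run the original command with the extra -c option, keeping -resume
# so that cached steps are not recomputed, for example:
#   nextflow run nf-core/hic -r 1.0.0 -profile docker -resume -c custom.config <other options as before>
```

Because `-resume` reuses the cached results of the earlier steps, only `remove_duplicates` and the processes downstream of it should need to run again.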

nservant commented 2 years ago

By the way, what was the issue with v1.3.0? Was it a conda issue?

koushik20 commented 2 years ago

Thank you for your suggestion. The pipeline completed successfully.

When running version 1.3.0, I get the following error:

Execution cancelled -- Finishing pending tasks before exit
- Ignore this warning: params.schema_ignore_params = "saveReference,splitFastq" 
WARN: Found unexpected parameters:
* --saveReference: true
* --splitFastq: 10000000
Error executing process > 'bowtie2_end_to_end (HiChIP_MCF10A_R1)'

Caused by:
  Process exceeded running time limit (8h)

Command executed:

  INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'`
    bowtie2 --rg-id BMG --rg SM:HiChIP_MCF10A-A_S7_R1_001 \
  --very-sensitive --end-to-end --reorder \
  -p 4 \
  -x ${INDEX} \
  --un HiChIP_MCF10A-A_S7_R1_001_unmap.fastq \
    -U HiChIP_MCF10A-A_S7_R1_001.fastq.gz | samtools view -F 4 -bS - > HiChIP_MCF10A-A_S7_R1_001.bam

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /mnt/hichip_fastq/work/f9/f391d404fdf5768da48979b926096d

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`