nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)
https://nf-co.re/hic
MIT License

Difference in the detection of valid pair interactions between various versions of hic #178

Closed: koushik20 closed this issue 5 months ago

koushik20 commented 9 months ago

Description of the bug

I ran a breast HiChIP dataset with nf-core/hic version 1.0.0 last year and got 177M - 190M valid pair interactions from a read depth of 300M - 350M. When I recently ran the same data with nf-core/hic version 2.0.0, I got only 1.5M - 4M valid pair interactions, which is a huge difference. Could you please help me resolve this discrepancy between the versions? The only difference between the two runs is the reference genome: the nf-core/hic 1.0.0 run used Hg19 and the recent nf-core/hic 2.0.0 run used Hg38. All other parameters were the same.

Command used and terminal output

sudo nextflow run nf-core/hic -r 2.0.0 -c /mnt/ovarian_hichip/Kura/custom_nextflow.conf \
       --input '/mnt/ovarian_hichip/Kura/input_file.csv' \
       -profile docker \
       -resume \
       --fastq_chunks_size 20000000 \
       --max_memory '64.GB' \
       --max_time '36.h' \
       --max_cpus 52 \
       --outdir "/mnt/hicpro_results/Kura_Sep2023" \
       --genome GRCh38 \
       --save_pairs_intermediates \
       --bwt2_opts_end2end '--very-sensitive --end-to-end --reorder' \
       --bwt2_opts_trimmed '--very-sensitive --end-to-end --reorder' \
       --digestion 'dpnii' \
       --ligation_site 'GATCGATC' \
       --restriction_site '^GATC' \
       --min_cis_dist 1000 \
       --min_mapq 20 \
       --bin_size '5000,20000,40000,150000,500000,1000000' \
       --save_reference

Relevant files

.nextflow.log

System information

Nextflow version - 22.10.7
Hardware - Desktop
Executor - local
Container engine: Docker
OS Ubuntu - 20.04.5 Linux
Version - nf-core/hic 2.0.0

koushik20 commented 9 months ago

HiC-Pro contact statistics generated by version 1.0.0: hicpro_contact_Kura-1

HiC-Pro contact statistics generated by version 2.0.0: hicpro_contact_Kura-2

nservant commented 5 months ago

I think this issue has been solved. To keep track of the solution: v1.0.0 did not remove duplicates by default. This changed in v2.0.0, where duplicates are now removed by default.
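
To illustrate why duplicate removal can shrink the valid pair count so much, here is a minimal Python sketch. It is not the pipeline's actual code: the record layout `(chrom1, pos1, chrom2, pos2)`, the function names, and the toy data are all assumptions made for illustration, and it assumes a duplicate is a pair that shares both chromosomes and both positions with an earlier pair.

```python
from typing import Iterable, List, Tuple

# Hypothetical, simplified valid-pair record: (chrom1, pos1, chrom2, pos2).
# Real valid-pair files carry more columns (read name, strands, fragments, ...).
Pair = Tuple[str, int, str, int]


def remove_duplicates(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep only the first occurrence of each (chrom1, pos1, chrom2, pos2) key."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def duplication_rate(pairs: Iterable[Pair]) -> float:
    """Fraction of pairs that are dropped when duplicates are removed."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return 1.0 - len(remove_duplicates(pairs)) / len(pairs)


# Toy data (made up): three copies of one contact plus one distinct contact.
valid_pairs = [
    ("chr1", 1000, "chr1", 5000),
    ("chr1", 1000, "chr1", 5000),  # duplicate of the first pair
    ("chr1", 1000, "chr1", 5000),  # duplicate of the first pair
    ("chr2", 200, "chr3", 900),
]

deduplicated = remove_duplicates(valid_pairs)
print(len(valid_pairs), "pairs before,", len(deduplicated), "after")
print("duplication rate:", duplication_rate(valid_pairs))
```

In a library with a high duplication rate, counting pairs before duplicate removal (the v1.0.0 default described above) and after it (the v2.0.0 default) gives very different totals from the same reads. To reproduce the older behaviour for a side-by-side comparison, check the parameter documentation of the pipeline version in use; I believe there is a `--keep_dups` option for this, but please verify it against the docs before relying on it.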