zengxiaofei / HapHiC

HapHiC: a fast, reference-independent, allele-aware scaffolding tool based on Hi-C data
https://www.nature.com/articles/s41477-024-01755-3
BSD 3-Clause "New" or "Revised" License

How to perform reassignment correctly? #24

Closed: zjygoo closed this issue 5 months ago

zjygoo commented 6 months ago

Hello, Xiaofei. I'm having a small issue with HapHiC. My species is an autotetraploid (2n=4x=44), and I used the p_utg assembly generated by hifiasm as the input file. For clustering, I ran the following command:

haphic cluster genome.final.fasta HiC.filtered.bam 44 --threads 80

The log reported:

2024-04-29 09:48:41 [recommend_inflation] You could try inflation from 1.5 (length ratio = 0.75)

The folder for the recommended inflation factor 1.5 contains:

-rw-rw-r-- 1 ps ps 152 Apr 29 09:48 group10_32091997bp.txt
-rw-rw-r-- 1 ps ps 368 Apr 29 09:48 group11_31248394bp.txt
-rw-rw-r-- 1 ps ps 153 Apr 29 09:48 group12_29837997bp.txt
-rw-rw-r-- 1 ps ps 215 Apr 29 09:48 group13_29756760bp.txt
-rw-rw-r-- 1 ps ps 185 Apr 29 09:48 group1_41105304bp.txt
-rw-rw-r-- 1 ps ps 214 Apr 29 09:48 group14_29336476bp.txt
-rw-rw-r-- 1 ps ps 119 Apr 29 09:48 group15_27944639bp.txt
-rw-rw-r-- 1 ps ps 89 Apr 29 09:48 group16_26507170bp.txt
-rw-rw-r-- 1 ps ps 182 Apr 29 09:48 group17_26402146bp.txt
-rw-rw-r-- 1 ps ps 243 Apr 29 09:48 group18_26392657bp.txt
-rw-rw-r-- 1 ps ps 151 Apr 29 09:48 group19_26370085bp.txt
-rw-rw-r-- 1 ps ps 308 Apr 29 09:48 group20_26103341bp.txt
-rw-rw-r-- 1 ps ps 213 Apr 29 09:48 group21_25795553bp.txt
-rw-rw-r-- 1 ps ps 152 Apr 29 09:48 group22_23983702bp.txt
-rw-rw-r-- 1 ps ps 119 Apr 29 09:48 group23_23911665bp.txt
-rw-rw-r-- 1 ps ps 184 Apr 29 09:48 group2_39247364bp.txt
-rw-rw-r-- 1 ps ps 121 Apr 29 09:48 group24_23558558bp.txt
-rw-rw-r-- 1 ps ps 150 Apr 29 09:48 group25_23164941bp.txt
-rw-rw-r-- 1 ps ps 208 Apr 29 09:48 group26_22711677bp.txt
-rw-rw-r-- 1 ps ps 151 Apr 29 09:48 group27_22662324bp.txt
-rw-rw-r-- 1 ps ps 244 Apr 29 09:48 group28_22502537bp.txt
-rw-rw-r-- 1 ps ps 120 Apr 29 09:48 group29_21592425bp.txt
-rw-rw-r-- 1 ps ps 211 Apr 29 09:48 group30_19931306bp.txt
-rw-rw-r-- 1 ps ps 151 Apr 29 09:48 group31_18740276bp.txt
-rw-rw-r-- 1 ps ps 119 Apr 29 09:48 group32_18509058bp.txt
-rw-rw-r-- 1 ps ps 180 Apr 29 09:48 group33_17780926bp.txt
-rw-rw-r-- 1 ps ps 152 Apr 29 09:48 group3_38952077bp.txt
-rw-rw-r-- 1 ps ps 243 Apr 29 09:48 group34_17501202bp.txt
-rw-rw-r-- 1 ps ps 152 Apr 29 09:48 group35_17396856bp.txt
-rw-rw-r-- 1 ps ps 208 Apr 29 09:48 group36_17321984bp.txt
-rw-rw-r-- 1 ps ps 88 Apr 29 09:48 group37_16929067bp.txt
-rw-rw-r-- 1 ps ps 119 Apr 29 09:48 group38_16284770bp.txt
-rw-rw-r-- 1 ps ps 119 Apr 29 09:48 group39_14358708bp.txt
-rw-rw-r-- 1 ps ps 88 Apr 29 09:48 group40_13554138bp.txt
-rw-rw-r-- 1 ps ps 270 Apr 29 09:48 group41_12117333bp.txt
-rw-rw-r-- 1 ps ps 148 Apr 29 09:48 group42_9552907bp.txt
-rw-rw-r-- 1 ps ps 247 Apr 29 09:48 group4_37334910bp.txt
-rw-rw-r-- 1 ps ps 88 Apr 29 09:48 group43_9118406bp.txt
-rw-rw-r-- 1 ps ps 88 Apr 29 09:48 group44_8370283bp.txt
-rw-rw-r-- 1 ps ps 245 Apr 29 09:48 group5_36935828bp.txt
-rw-rw-r-- 1 ps ps 216 Apr 29 09:48 group6_34040223bp.txt
-rw-rw-r-- 1 ps ps 216 Apr 29 09:48 group7_33466005bp.txt
-rw-rw-r-- 1 ps ps 152 Apr 29 09:48 group8_33337447bp.txt
-rw-rw-r-- 1 ps ps 369 Apr 29 09:48 group9_32236358bp.txt

The contigs were clustered into 44 groups. However, after the second step (reassignment), final_groups contains only 39 groups instead of the expected 44. How can I solve this problem?
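The group files above carry each group's total length in the file name, so the clustering and reassignment outputs can be compared without opening any file. The following is a minimal sketch, assuming the files in both folders follow the group<N>_<length>bp.txt pattern seen in the listing; the function name group_sizes is hypothetical and is not part of HapHiC.

```shell
#!/bin/sh
# group_sizes DIR: print "<group><TAB><length in bp>" for every
# group<N>_<length>bp.txt file in DIR, longest group first.
# Assumption: the file naming matches the listing in this thread.
group_sizes() {
    for f in "$1"/group*_*bp.txt; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern if DIR has no group files
        name=${f##*/}                # e.g. group10_32091997bp.txt
        len=${name##*_}              # e.g. 32091997bp.txt
        len=${len%bp.txt}            # e.g. 32091997
        printf '%s\t%s\n' "${name%%_*}" "$len"
    done | sort -k2,2nr
}
```

Running group_sizes on the inflation folder and on final_groups, then piping each result through wc -l, gives the two group counts. A group in final_groups that is much longer than any group in the inflation folder may be a candidate for a merge, although that still needs to be confirmed visually.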

zengxiaofei commented 6 months ago

Hi @zjygoo,

Sorry for the delay. It seems that some groups were clustered together during reassignment. This could happen when some contigs from different chromosomes were not correctly separated. You could try the clustering results with higher inflation values. Or you can have a look at the current result in Juicebox first and then decide on what to do.
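To choose among the higher inflation values, it can help to see how many groups each one produced. The sketch below assumes that the clustering step writes one inflation_<value> folder per inflation value (as in the ./inflation_5.0/ path used later in this thread) and that each folder holds group<N>_<length>bp.txt files; the function name list_inflation_groups is hypothetical.

```shell
#!/bin/sh
# list_inflation_groups WORKDIR: for every inflation_* folder in WORKDIR,
# print "<inflation value><TAB><number of group files>".
# Assumption: folder and file names follow the patterns shown in this thread.
list_inflation_groups() {
    for dir in "$1"/inflation_*; do
        [ -d "$dir" ] || continue    # ignore non-directories and an unmatched pattern
        n=$(find "$dir" -maxdepth 1 -type f -name 'group*_*bp.txt' | wc -l | tr -d ' ')
        printf '%s\t%s\n' "${dir##*/inflation_}" "$n"
    done
}
```

Folders are listed in the shell's glob order, which is lexicographic rather than numeric, so inflation_10.0 appears before inflation_2.0.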

Best regards, Xiaofei

zjygoo commented 6 months ago

Thank you for your advice. I will try it your way.

YanChunL commented 4 months ago

Hi Xiaofei, I ran into the same issue, but when I try the clustering results with higher inflation values, the number of groups decreases. My genome is triploid with 36 chromosomes.

Inflation value 2.1: 7acebd3694c88d60d81ab32943bfaa0f
Inflation value 5.0: 1db509d8ee6edbf7e0b29db5f849b4a6

My commands:

bwa index ../00.data/asm.fa
bwa mem -5SP -t 32 ../00.data/asm.fa ../00.data/mockTriploid_r1.fq.gz ../00.data/mockTriploid_r2.fq.gz | samblaster | samtools view - -@ 32 -S -h -b -F 3340 -o HiC.bam

(2) Filter the alignments with MAPQ 1 (mapping quality ≥ 1) and NM 3 (edit distance < 3)

/HapHiC/utils/filter_bam HiC.bam 1 --nm 3 --threads 32 | samtools view - -b -@ 32 -o HiC.filtered.bam

/HapHiC/haphic cluster ../../00.data/asm.fa ../HiC.filtered.bam 36 --max_inflation 10.0 --remove_allelic_links 3
/HapHiC/haphic reassign ../00.data/asm.fa full_links.pkl ./inflation_5.0/mcl_inflation_5.0.clusters.txt paired_links.clm --nclusters 36
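The -F 3340 option in the samtools view command above excludes every alignment that has any of the listed SAM flag bits set. The sketch below breaks such a mask into its individual bits using the flag definitions from the SAM specification; the function name explain_flag_mask is hypothetical and only illustrates the arithmetic.

```shell
#!/bin/sh
# explain_flag_mask MASK: print "<bit><TAB><meaning>" for each SAM flag bit set in MASK.
# Bit meanings follow the SAM specification.
explain_flag_mask() {
    mask=$1
    for entry in \
        "1:read paired" "2:proper pair" "4:read unmapped" "8:mate unmapped" \
        "16:read reverse strand" "32:mate reverse strand" \
        "64:first in pair" "128:second in pair" \
        "256:secondary alignment" "512:failed quality checks" \
        "1024:PCR or optical duplicate" "2048:supplementary alignment"
    do
        bit=${entry%%:*}             # numeric value before the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            printf '%s\t%s\n' "$bit" "${entry#*:}"
        fi
    done
}
```

Calling explain_flag_mask 3340 reports bits 4, 8, 256, 1024 and 2048, so the mask removes unmapped reads, reads with an unmapped mate, secondary alignments, duplicates and supplementary alignments.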

zengxiaofei commented 4 months ago

Hi @YanChunL,

Could you please create a new issue with as much of this information as possible (refer to: https://github.com/zengxiaofei/HapHiC/issues/32)?

Best wishes, Xiaofei