zengxiaofei / HapHiC

HapHiC: a fast, reference-independent, allele-aware scaffolding tool based on Hi-C data
BSD 3-Clause "New" or "Revised" License

Haplotype inference on polyploids #68

Open Rhia15 opened 3 hours ago

Rhia15 commented 3 hours ago

Hi, I am currently manually curating a tetraploid genome in PretextView, which I assembled with HapHiC, and I am wondering how you would infer the haplotypes from the Hi-C contact map. I understand there's a picture from refsort of a tetraploid:

image

I assume that within each larger square, scaf1, scaf2, scaf3, scaf4 would correspond to Hap1, Hap2, Hap3, Hap4, and so on? Or is this not the case?

For example, this is a snippet of my current assembly in PretextView, and I have just assumed that these are the haplotypes:

image

Any advice on haplotype inference from the HiC contact map would be great!

Thank you very much!

zengxiaofei commented 2 hours ago

Typically, diagonally distributed Hi-C signals can be found between homologous chromosomes. These Hi-C links result from base-level switch errors during genome assembly and from Hi-C read mapping errors. Although they do not represent true chromatin interactions, this information can still be used to infer haplotypes. In your case, these diagonally distributed Hi-C links are also present. However, please note that this pattern may not be observed when the heterozygosity level is too high. In that situation, haplotypes can be inferred through self-alignment or by aligning the assembly to a reference genome.

In your case, I'm not sure how you performed Hi-C read mapping and filtering, or how these Hi-C links were normalized and visualized. However, it is clear that there are many errors in the assembly.

Rhia15 commented 2 hours ago

Hi, thank you so much for the quick response. Would you be able to point out the errors? I can't seem to find many resources on polyploid Hi-C maps, and I'd love to learn more. Are the errors present in the way I have arranged the Hi-C map, or do they stem from preprocessing? How did you spot these errors?

I mapped them using samtools and then started rearranging in PretextView. Do you recommend using HapHiC assemblies with Juicebox instead of other tools such as PretextView?

image

Thank you so much!!