tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

Majority of contigs in the first partition #23

Closed rpwang closed 4 years ago

rpwang commented 4 years ago

Hi @tangerzhang ,

I am working on a highly heterozygous diploid genome. We used long reads for the assembly, and the assembly has been corrected for misassemblies. We mapped the Hi-C reads, and the ALLHiC results showed that more than 40% of all contigs were placed in the first partition. Is there anywhere in the log or intermediate files where I can see why this is happening? Do you have any suggestions for how to fix it?

tangerzhang commented 4 years ago

Hi @rpwang , This problem is quite difficult to fix, especially when assembling a highly heterozygous genome. The reason is that the contig-level assembly contains a large proportion of chimeric contigs and collapsed regions (see our paper in Nature Plants, Supplementary Figure 30). One possible solution is to perform a synteny analysis between the first partition and a reference genome. Chimeric scaffolds should be visible in that comparison, and you can then manually correct the large group. If you have parental DNA sequences, the best way is to phase the PacBio reads using Canu trio-binning and assemble the two genomes separately. Assembling a heterozygous diploid genome is still a big challenge. We are developing new phasing methods that combine mapping-based and assembly-based strategies to phase diploid genomes.
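
For the synteny check, one rough way to shortlist candidate chimeric sequences is to align the contigs (or scaffolds) of the first partition to a reference and flag any sequence whose alignments fall on two or more reference chromosomes. The sketch below is not part of ALLHiC. It assumes the alignments are in PAF format (as written by minimap2, for example), and the function name `flag_chimeric_contigs` and the `min_frac` threshold of 0.2 are hypothetical choices for illustration only.

```python
from collections import defaultdict


def flag_chimeric_contigs(paf_lines, min_frac=0.2):
    """Return {contig: {ref_chrom: fraction}} for contigs that align to two
    or more reference chromosomes, each covering at least min_frac of the
    contig length. min_frac is an arbitrary illustrative threshold.
    """
    aligned = defaultdict(lambda: defaultdict(int))  # contig -> chrom -> bp
    lengths = {}
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # PAF columns: 1 query name, 2 query length, 3 query start,
        # 4 query end, 6 target name (1-based column numbers).
        contig, qlen = fields[0], int(fields[1])
        qstart, qend, target = int(fields[2]), int(fields[3]), fields[5]
        lengths[contig] = qlen
        # Overlapping alignments are counted twice; acceptable for a
        # coarse screen, not for precise coverage estimates.
        aligned[contig][target] += qend - qstart

    flagged = {}
    for contig, per_target in aligned.items():
        major = {
            chrom: bp / lengths[contig]
            for chrom, bp in per_target.items()
            if bp / lengths[contig] >= min_frac
        }
        if len(major) >= 2:
            flagged[contig] = major
    return flagged


# Made-up example records: ctg1 is split between chr1 and chr2.
demo = [
    "ctg1\t1000\t0\t600\t+\tchr1\t5000\t100\t700\t580\t600\t60",
    "ctg1\t1000\t600\t1000\t+\tchr2\t4000\t0\t400\t390\t400\t60",
    "ctg2\t800\t0\t800\t-\tchr1\t5000\t1000\t1800\t790\t800\t60",
]
result = flag_chimeric_contigs(demo)
```

Sequences flagged this way are only candidates. Real rearrangements relative to the reference, or repeats, can produce the same signal, so each one should still be inspected (for example in a dot plot) before it is split manually.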

rpwang commented 4 years ago

Hi @tangerzhang ,

Thank you for your reply! I will take your suggestions into account in my next attempt.