Closed Redmar-van-den-Berg closed 2 years ago
wtdbg2 aims to assemble long noisy reads, so it has no phasing module. K-bin is better at tolerating sequencing errors and processes long reads very fast, but it leads to collapsed haplotypes. In your case, first polish the contigs following README.md, then use another phasing tool (e.g. longshot).
Jue
Thank you for confirming that K-bins can lead to collapsed haplotypes. I'll try other phasing tools as you suggested.
First of all, thanks for creating wtdbg2. It runs very fast, and I like how thoroughly the algorithm is described in the publication.
I have a targeted HiFi dataset of pharmacogenetic genes, and I want to reconstruct phased haplotypes from this data (for each gene). The reason is that different alleles have different activities in metabolising drugs, which means that the collapsed consensus does not provide enough information.
I've run wtdbg2 -x ccs and wtpoa-cns on this dataset, and the contigs match the phasing of the HiFi reads (from whatshap) very well. However, as expected, the different haplotypes are collapsed in the final dbg.raw.fa.
If I understand correctly, there are two causes for this:
wtpoa-cns explicitly collapses the two haplotypes to generate the consensus.
What would be the best way to get phased haplotypes from wtdbg2? The information that I need should be in dbg.ctg.lay.gz, but I'm not sure how to process that file to non-consensus contigs. Any pointers would be appreciated.
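As a possible starting point for inspecting dbg.ctg.lay.gz, here is a minimal Python sketch that lists which reads were laid out into each contig. It assumes the layout format is whitespace-separated text in which a line starting with ">" opens a contig, "E" lines describe edges, and "S" lines carry the read name in the second column; that is an assumption to check against your own file. The function name reads_per_contig is made up for this example and is not part of wtdbg2.

```python
import gzip
from collections import defaultdict


def reads_per_contig(layout_path):
    """Map each contig name to the read names listed in its layout.

    Assumed format (verify against your own file): a '>' line starts a
    contig, and each 'S' line has the read name as its second field.
    """
    contig_reads = defaultdict(dict)  # dict keys keep first-seen order
    current = None
    with gzip.open(layout_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            if fields[0].startswith(">"):
                # Header such as '>ctg1 nodes=60 len=40000'; keep the name only.
                current = fields[0][1:]
            elif fields[0] == "S" and current is not None and len(fields) > 1:
                # A read can appear on several edges; record it once.
                contig_reads[current][fields[1]] = None
    return {ctg: list(reads) for ctg, reads in contig_reads.items()}
```

The per-contig read names could then be compared with the haplotype assignments from whatshap, for example to split the reads of one gene by haplotype before building a consensus for each group. This only groups reads; it does not produce phased contigs by itself.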