cyr20040123 closed this issue 3 years ago.
1. You are right.
2.1. The k-mer size is much smaller than the bin size.
2.2. In most cases one bin matches two adjacent subject bins, unless the bin boundaries happen to be perfectly aligned.
2.3. That is why the fuzzy Bruijn graph (FBG) is a sparse graph: when it is sparse, there are no artificial loops in the graph caused by misaligned bins.
3. wtdbg2 stores the sequences as bases and builds a bin data structure that refers to those bases.
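Answers 2.2 and 3 can be illustrated with a small Python sketch. It assumes error-free reads, 8-bp bins as in the question, and that a trailing partial bin is simply dropped; the names `Bin`, `make_bins`, `bin_bases` and `overlapped_subject_bins` are hypothetical and are not wtdbg2's actual data structures. A bin here only records a read id and base offsets, so its bases can always be recovered from the stored read, and a query bin covers two subject bins unless the offset between the reads is a multiple of the bin size.

```python
from dataclasses import dataclass

BIN_SIZE = 8  # toy bin size from the question; the paper uses 256 bp


@dataclass(frozen=True)
class Bin:
    """Hypothetical bin: stores no bases of its own, only points into a stored read."""
    read_id: int
    start: int  # 0-based offset into the read's bases
    end: int    # exclusive end offset


def make_bins(read_id: int, read_len: int, bin_size: int = BIN_SIZE) -> list[Bin]:
    """Cut a read into consecutive full-length bins; a trailing partial bin is dropped (assumption)."""
    return [Bin(read_id, s, s + bin_size)
            for s in range(0, read_len - bin_size + 1, bin_size)]


def bin_bases(reads: dict[int, str], b: Bin) -> str:
    """Recover a bin's bases by slicing the stored read sequence."""
    return reads[b.read_id][b.start:b.end]


def overlapped_subject_bins(offset: int, query_bin: int, bin_size: int = BIN_SIZE) -> list[int]:
    """Subject-bin indices covered by bin `query_bin` of a query read that starts
    `offset` >= 0 bases after the subject read on the genome (error-free reads)."""
    g_start = offset + query_bin * bin_size  # query bin start, in subject coordinates
    g_end = g_start + bin_size               # exclusive end
    return list(range(g_start // bin_size, (g_end - 1) // bin_size + 1))


genome = "ACGTTGCAAGGCTTACCGATGCATGCAAGT"
reads = {0: genome[0:24], 1: genome[3:27]}  # read 1 starts 3 bp later
bins_1 = make_bins(1, len(reads[1]))
print(bin_bases(reads, bins_1[0]))    # TTGCAAGG
print(overlapped_subject_bins(3, 0))  # [0, 1]: misaligned bounds, two subject bins
print(overlapped_subject_bins(8, 0))  # [1]: bounds happen to line up
```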
Dr. Ruan, thank you so much for your reply! It really helps me understand.
Q3 supplement: How is the base sequence restored from the binned nodes in the FBG? According to your answer 2.2, the aligned bins in an FBG node are not exactly the same, because they come from different reads. So when converting the binned data structure back to base sequences, how do you decide which base sequence to use? (Equivalent k-bins may come from different reads; which one is taken as the representative?)
Actually, the FBG is an Eulerian path graph, so when we obtain sequences we are talking about edges. The misalignment problem still exists in both nodes and edges, but there is no need to worry: wtdbg2 constructs a partial-order multiple sequence alignment (PO-MSA) for each edge, and then joins the consensus sequences of two adjacent linear edges by pairwise sequence alignment. So the sequence of each final unitig is composed of edge consensus sequences joined in an overlapping way.
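The joining step can be sketched roughly as follows. This is not wtdbg2's code: the PO-MSA consensus step is omitted, the two edge consensus sequences are taken as given, and a plain overlap (semi-global) alignment with toy scores (match +1, mismatch -1, gap -1) stands in for whatever pairwise aligner and scoring wtdbg2 really uses. The function names and the `min_score` threshold are hypothetical.

```python
def overlap_score_end(left: str, right: str, match: int = 1,
                      mismatch: int = -1, gap: int = -1) -> tuple[int, int]:
    """Overlap (semi-global) alignment of a suffix of `left` against a prefix of `right`.
    Returns (best_score, j): right[:j] is the part of `right` aligned to left's tail."""
    m = len(right)
    # Row i = 0: no bases of left used yet, so any prefix of right costs gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(left) + 1):
        cur = [0] * (m + 1)  # cur[0] = 0: the overlap may start anywhere in left
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if left[i - 1] == right[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # The alignment must reach the end of left but may stop anywhere in right.
    best_j = max(range(m + 1), key=lambda j: prev[j])
    return prev[best_j], best_j


def join_edge_consensus(left: str, right: str, min_score: int = 3) -> str:
    """Join two consecutive edge consensus sequences through their overlap."""
    score, j = overlap_score_end(left, right)
    if score < min_score:
        raise ValueError("no convincing overlap between the two consensus sequences")
    return left + right[j:]  # keep left's bases, append right's non-overlapping tail


print(join_edge_consensus("ACGTACGGTCA", "GGTCATTAG"))  # ACGTACGGTCATTAG
```

Keeping the left sequence whole and appending only the unaligned tail of the right one is just one simple choice for a sketch; taking the overlap region from the right sequence would be equally valid.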
Got it! Thank you so much, Dr. Ruan!
Hi Dr. Ruan,
Thank you for your highly efficient assembly tool. I have read your paper, and I am still curious about how wtdbg2 converts a read sequence into a binned one.
For example, assume a bin is 8 bp rather than 256 bp, and take the two error-free reads shown below.
If we bin them directly, like this:
We may then have the problem that the binned reads can hardly be aligned pairwise on the basis of the k-mers they share.
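To make answers 2.1 and 2.2 concrete, here is a minimal sketch of counting shared k-mers per bin pair between two reads. Assumptions: exact k-mer matching, each k-mer is assigned to the bin containing its first base, 256-bp bins as in the paper, and k = 21 as an illustrative choice (the 8-bp toy bins cannot hold k-mers that are both much shorter than the bin and unique). The function `kmer_bin_votes` is hypothetical and is not wtdbg2's implementation. Because k is much smaller than the bin size, the shared k-mers of one query bin still pile up on two adjacent subject bins even when the bin boundaries do not line up.

```python
import random
from collections import Counter


def kmer_bin_votes(read_a: str, read_b: str, k: int, bin_size: int) -> Counter:
    """Count shared k-mers between two reads, grouped by (bin in B, bin in A).
    Each k-mer is assigned to the bin containing its first base (an assumption)."""
    assert k < bin_size, "k-mers should be much shorter than bins"
    index: dict[str, list[int]] = {}
    for q in range(len(read_a) - k + 1):
        index.setdefault(read_a[q:q + k], []).append(q // bin_size)
    votes: Counter = Counter()
    for p in range(len(read_b) - k + 1):
        for bin_a in index.get(read_b[p:p + k], []):
            votes[(p // bin_size, bin_a)] += 1
    return votes


# Toy data: two error-free reads from a random genome; read B starts 100 bp later,
# which is not a multiple of the bin size, so the bin boundaries do not line up.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(3000))
read_a, read_b = genome[0:2000], genome[100:2100]
votes = kmer_bin_votes(read_a, read_b, k=21, bin_size=256)
# Bin 0 of read B should split its votes between bins 0 and 1 of read A
# (156 and 100 k-mers when no 21-mer is repeated in this random genome).
print(votes[(0, 0)], votes[(0, 1)])
```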
I would like to know:
Thank you!