GavinHuttley opened 1 year ago
This is a proof-of-principle implementation. It does improve alignment quality over a simpler align-to-reference approach on the small sample I ran it against, but I think that's due to some design decisions I made to simplify this first implementation (Knuth's dictum and all that).
Works off the `Cogent3/tests/data/brca1.fasta` and `Cogent3/tests/data/murphy.tree` files.
The limitation of `dbga` is that it requires a k large enough that nearly all k-mers within a sequence are unique. This is opposed by the requirement that the number of k-mers shared between sequences is maximised, because a single difference between two sequences disrupts up to k of their shared k-mers. The more sequences being aligned, the more this will (ultimately) increase the number of bubbles. My proposed "hack" is to: