xingjianleng / DBGA

The repository for the genome sequence alignment research project
BSD 3-Clause "New" or "Revised" License

implement a progressive alignment integration algorithm #23

Open GavinHuttley opened 1 year ago

GavinHuttley commented 1 year ago

The limitation of dbga is that it requires a k approaching the threshold at which all k-mers within a sequence are unique, which is opposed by the requirement that the number of shared k-mers between sequences is maximised.
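To make the trade-off concrete, here is a minimal sketch in plain Python. It is not taken from the dbga code base: the helper names (`kmers`, `all_unique`, `num_shared`, `summarise`) and the toy sequences are assumptions made purely for illustration. For each k it reports whether every k-mer is unique within each sequence, and how many distinct k-mers are shared by all sequences.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return every overlapping k-mer in seq, in order of occurrence."""
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def all_unique(seq: str, k: int) -> bool:
    """True if no k-mer occurs more than once within seq."""
    counts = Counter(kmers(seq, k))
    return all(n == 1 for n in counts.values())


def num_shared(seqs: list[str], k: int) -> int:
    """Number of distinct k-mers present in every sequence."""
    sets = [set(kmers(s, k)) for s in seqs]
    if not sets:
        return 0
    return len(set.intersection(*sets))


def summarise(seqs: list[str], k_values: list[int]) -> dict[int, tuple[bool, int]]:
    """Map each k to (unique within every sequence?, number of shared k-mers)."""
    return {
        k: (all(all_unique(s, k) for s in seqs), num_shared(seqs, k))
        for k in k_values
    }


# Hypothetical toy sequences, chosen only to illustrate the tension.
toy = ["ACGTACGA", "ACGTTCGA"]
for k, (unique, shared) in summarise(toy, [2, 3, 4, 5]).items():
    print(f"k={k}: unique within each sequence={unique}, shared k-mers={shared}")
```

On these toy sequences, k=4 is the smallest k at which every k-mer is unique within each sequence, yet only one 4-mer is still shared; at k=5 nothing is shared. Adding more sequences can only shrink the shared set, which is the pressure described above.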

The more sequences being aligned, the more this will (ultimately) increase the number of bubbles. My proposed "hack" is to:

GavinHuttley commented 1 year ago

This is a proof-of-principle implementation. It does improve alignment quality over a simpler align-to-reference approach in the small sample I ran it against, but I think that's due to some design decisions I made to simplify this first implementation (Knuth's dictum and all that).

It works off the Cogent3/tests/data/brca1.fasta and Cogent3/tests/data/murphy.tree files.

working.py.zip