vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other

Fix spurious deletions introduced by dozeu's seeding heuristic #4223

Closed: jeizenga closed this 4 months ago

jeizenga commented 4 months ago

Changelog Entry

To be copied to the draft changelog by merger:

Description

The dozeu pair rescue algorithm in vg giraffe uses a simple, error-prone heuristic to choose a position on which to anchor the dozeu alignment whenever there are no local minimizer hits. The heuristic can initially miss the correct alignment and then recover it in the subsequent alignment steps. When this happens, the result is essentially the correct alignment, except that it is anchored at a nearby, incorrect position, which can introduce spurious deletions that appear to lead nowhere. The dozeu seeding heuristic can't easily be fixed to avoid these cases, so instead I implemented a post-processing algorithm that removes the erroneous deletions.

Resolves https://github.com/vgteam/vg/issues/4204