vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

precollapsing order length hits can introduce false positive MEMs #1566

Open ekg opened 6 years ago

ekg commented 6 years ago

I've found a few cases where the order length merging introduces false MEM hits. They can be detected by turning the collapsing on and then instrumenting the code in cluster.cpp so that it walks each MEM through the graph and reports when the MEM does not match the graph.

I'm not certain of this, but I don't know of any other source besides the collapsing. It is now causing problems for the cluster subgraph extraction, which breaks when a MEM does not match the graph.

I'm running tests that should show whether the collapsing is causing problems for mapping. If they suggest it is not the problem, I will have to dig deeper.

jeizenga commented 6 years ago

Okay, let me know what you find. If you get me a reproducing example I can take a look at it.

ekg commented 6 years ago

It doesn't seem to be causing the performance problem for me. However, I can find you an example.

On Sat, Mar 24, 2018, 03:12 Jordan Eizenga notifications@github.com wrote:

Okay, let me know what you find. If you get me a reproducing example I can take a look at it.


jeizenga commented 6 years ago

Have an example handy yet?

jeizenga commented 6 years ago

I'm still interested in this, but our move to 256-mer indexes circumvents this problem for the time being. I'm guessing the reads would be hard to track down at this point? Maybe we should close this issue?