vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Better traversal size caps in vg call #4252

Closed glennhickey closed 3 months ago

glennhickey commented 3 months ago

Changelog Entry

To be copied to the draft changelog by merger:

Description

It has come up a number of times that people run vg call on complex graphs and it runs forever. The reason is that it gets lost trying to find traversals through enormous snarls, and there is not enough signal in the read mappings to narrow the search down to something manageable. The min/max traversal cutoffs -c/-C were added to address this, but since they only filtered on reference allele length, they only helped sometimes: a single giant insertion is enough to get around them. This PR changes these options to take alt alleles into account as well. So if you run -c 50 -C 1000, it will only try to genotype sites where at least one traversal is >=50bp, and it will give up on any site as soon as a single traversal >1000bp is found.
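To make the new cutoff semantics concrete, here is a minimal Python sketch of the rule described above. It is not the vg implementation: the function name `should_genotype_site` and its parameters are hypothetical, and it assumes a site's traversals have already been reduced to a list of lengths in bp (reference and alt alike).

```python
from typing import Iterable


def should_genotype_site(traversal_lengths: Iterable[int],
                         min_len: int,
                         max_len: int) -> bool:
    """Hypothetical illustration of the -c (min_len) / -C (max_len) rule.

    A site is skipped as soon as any traversal exceeds max_len, and is
    otherwise genotyped only if at least one traversal is >= min_len.
    """
    saw_long_enough = False
    for length in traversal_lengths:
        if length > max_len:
            # Give up on the site immediately; no need to look further.
            return False
        if length >= min_len:
            saw_long_enough = True
    return saw_long_enough


# Equivalent of -c 50 -C 1000 on a few made-up sites.
print(should_genotype_site([1, 60], 50, 1000))     # one traversal >= 50bp
print(should_genotype_site([1, 10], 50, 1000))     # nothing reaches 50bp
print(should_genotype_site([60, 5000], 50, 1000))  # one traversal > 1000bp
```

Under the old behavior, only the reference traversal would have been checked, so the third site would have slipped through whenever its 5000bp traversal was an alt allele rather than the reference.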