vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg index -g should warn that a graph is too complex #2438

Open ekg opened 5 years ago

ekg commented 5 years ago

When building the GCSA2 index, new users are caught 100% of the time by the space explosion of kmer enumeration. vg prune is basically a requirement. At the very least, we should have a pass that checks the graph complexity and estimates good pruning parameters. Second best, and simpler for users, we should try to do the pruning automatically so that the indexing doesn't blow up.
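To illustrate the space explosion described above, here is a minimal Python sketch that counts kmer walks in a toy graph. It is not vg's or GCSA2's algorithm: the graph model (one base per node, a chain of two-allele bubbles), the function names, and the `limit` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def count_kmer_walks(nodes, edges, k):
    """Count walks that visit exactly k nodes (one base per node in this toy model)."""
    # walks[v] = number of walks of the current length that end at node v
    walks = {v: 1 for v in nodes}
    for _ in range(k - 1):
        extended = defaultdict(int)
        for u, count in walks.items():
            for v in edges.get(u, ()):
                extended[v] += count
        walks = extended
    return sum(walks.values())


def bubble_chain(n_sites):
    """Toy graph: n_sites consecutive variable sites, each with two alleles.

    Every allele at site i connects to both alleles at site i + 1.
    """
    nodes, edges, previous = [], {}, []
    for site in range(n_sites):
        alleles = [(site, 0), (site, 1)]
        nodes.extend(alleles)
        for p in previous:
            edges[p] = alleles
        previous = alleles
    return nodes, edges


def is_too_complex(nodes, edges, k, limit):
    """Hypothetical pre-indexing check: True if the walk count exceeds limit."""
    return count_kmer_walks(nodes, edges, k) > limit


if __name__ == "__main__":
    nodes, edges = bubble_chain(20)
    for k in (4, 8, 16):
        print(k, count_kmer_walks(nodes, edges, k))
```

In this toy graph the count is (n_sites - k + 1) * 2^k, so it doubles with every extra base of kmer length, whereas a linear sequence of the same length would only have n_sites - k + 1 kmers. A check along these lines could be run before indexing to decide whether to warn.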

edawson commented 5 years ago

Not just new users - I think I've run into this with some of the structural variant graphs I'm building.

My vote would be for the default pruning level. API-wise, it'd be nice to be able to turn it off with a single flag or modify it with a single delimited arg (like we do for the insert size distribution in map).
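One possible shape for that interface, sketched in Python with argparse. Everything here is hypothetical: the `--prune` and `--no-prune` flag names, the colon-delimited `K:E` format, and the `24:3` default are placeholders for illustration, not existing vg options or recommended settings.

```python
import argparse


def parse_prune_spec(spec):
    """Parse a hypothetical 'KMER_LENGTH:EDGE_MAX' string into a dict of ints."""
    parts = spec.split(":")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(
            "expected KMER_LENGTH:EDGE_MAX, got %r" % spec)
    try:
        kmer_length, edge_max = (int(p) for p in parts)
    except ValueError:
        raise argparse.ArgumentTypeError(
            "both fields must be integers, got %r" % spec)
    if kmer_length <= 0 or edge_max <= 0:
        raise argparse.ArgumentTypeError("both fields must be positive")
    return {"kmer_length": kmer_length, "edge_max": edge_max}


def build_parser():
    """Build a parser with one flag to adjust pruning and one to disable it."""
    parser = argparse.ArgumentParser(prog="index-sketch")
    parser.add_argument(
        "--prune", type=parse_prune_spec, metavar="K:E",
        default=parse_prune_spec("24:3"),  # placeholder default
        help="pruning parameters as KMER_LENGTH:EDGE_MAX")
    parser.add_argument(
        "--no-prune", action="store_true",
        help="skip pruning entirely")
    return parser
```

With this layout, `build_parser().parse_args(["--prune", "16:4"])` yields the two parameters in `args.prune`, and passing `--no-prune` sets `args.no_prune` so the caller can skip the pruning step.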

ekg commented 5 years ago

The default pruning levels will introduce a new kind of problem: over-pruning. And it will be hard for users and people evaluating the model to understand what's going on. This is hard to get right.