vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.1k stars 194 forks source link

haplotype scoring is unstable on complex graphs #1590

Open ekg opened 6 years ago

ekg commented 6 years ago

I'd like to use GBWT to generate my pruned graphs. They are complex, non-linear graphs generated by vg msga. It seems like I can't convince the mapper to not use the GBWT to try to do its haplotype scoring thing, and this is causing vg map to crash with:

vg: src/haplotypes.cpp:870: haplo::haploMath::RRMemo::RRMemo(double, std::size_t): Assertion `exp_rho < 1' failed.

I'm running vg map -d to give it the base name, but it's picking up the GBWT even if I name it .gbwt.1, which seems strange. Anyway the temporary solution in this case seems to be to specifically give only the .xg and .gcsa indexes.

To reproduce the error, download http://hypervolu.me/~erik/vg/haploscore-fail.tar.xz, unpack with tar xf and run:

./run.sh cerevisiae.pangenome.vg S288c.genome.fa.fai S288c SK1

Until we've got it working on all the kinds of graphs that vg can map against we should probably enable the haplotype scoring only optionally. Otherwise everyone who makes GBWT in the process of indexing is going to run into these kinds of problems.

Also, I'm concerned that on many graphs the assumptions in the Li and Stephens model may break. For instance does it handle the concept of copy number variation gracefully?

adamnovak commented 6 years ago

It really should not be picking up the GBWT by magic regardless of its name. Are you sure it's not trying to read the haplotypes out of the xg file?

I think you can set --hap-exp 0 and it will actually skip the whole haplotype-aware scoring step, even if the file is specified.