vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0
21 stars 14 forks source link

Account for vg producing multiple types of HandleGraphs #786

Closed adamnovak closed 4 years ago

adamnovak commented 4 years ago

This updates toil-vg to work with current master vg and its multiple handle graph output types. It makes sure to use vg combine where needed.

glennhickey commented 4 years ago

The genotyping output seems to have changed for the worse on one test. The test itself isn't super meaningful, but I'm not sure why it would have changed. Will first confirm if it's the de-protofication pr that's at blame. If it's just a small random perturbation in the mapping output then it's no big deal.

glennhickey commented 4 years ago

The tests run with the current toil-vg master and this vg. Moving to the next vg commit and updating toil-vg to this branch brings the error. Looks like a graph merging issue, as one chromosome is completely absent from the output VCF here, leading to the score drop.

adamnovak commented 4 years ago

Hm, That's not good at all. I'll have to see if I can replicate this locally, and if vg combine is somehow dropping files or if something else is going on.

Unfortunately I'm supposed to be doing Toil things tomorrow, so I'm not sure how much time I will have for this until Thursday.

On 1/14/20, Glenn Hickey notifications@github.com wrote:

The tests run with the current toil-vg master and this vg. Moving to the next vg commit and updating toil-vg to this branch brings the error. Looks like a graph merging issue, as one chromosome is completely absent from the output VCF here, leading to the score drop.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/vgteam/toil-vg/pull/786#issuecomment-574396419

glennhickey commented 4 years ago

I'm taking a look now. It's pretty easy to reproduce. After downloading the input files

toil-vg construct ./js ./os --vcf ./HGSVC_regions.vcf.gz --fasta hg38_chr21_22.fa --regions chr21 chr22 --out_name test --alt_paths --flat_alts --realTimeLogging --realTimeStderr --pangenome --container None --xg_index

gives

os/0.vg
os/1.vg
os/test.xg

Looking for a chr21 variant in 0.vg looks reasonable

vg find -x os/0.vg -p chr21:5322212 -c 2 | vg view - | grep S
S   166476  ATTCCAGATTCTACAAAAAGAGTGTTGCAAAA
S   166480  GTGCTCTTTTTATTTAAAGATATTTCCTTTTC
S   166475  CACCATAGGCCTCAAAGCCCTCCAAATATCCA
S   166472  CATATATAACAAAGAAGTTTCTCAGAATGTTT
S   166474  GTTCAGTTTTTATGTGAAGATATTTCTTTTTC
S   166473  CT

But in the xg it's all Ns

vg find -x os/test.xg -p chr21:5322212 -c 2 | vg view -
S   166464  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
S   166463  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
S   166461  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
S   166460  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
S   166462  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

The first issue seems to be that vg ids -j is not working. os/0.vg and os/1.vg have the same node ids. combine should be throwing an error for this.

In looking at this it also seems that vg stats -r isn't working either

vg stats -r os/0.vg 
node-id-range   1:0
glennhickey commented 4 years ago

These are pretty fundamental problems. I'm wondering if I (and toil-vg gitlab) are somehow using a vg with out-of-sync dependencies for libbdsg and libhandlegraph.

adamnovak commented 4 years ago

Combine doesn't have any memory, and if fed a Protobuf vg file it doesn't even parse it, just copies it to standard out.

On 1/15/20, Glenn Hickey notifications@github.com wrote:

Closed #786.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/vgteam/toil-vg/pull/786#event-2951430832

glennhickey commented 4 years ago

Hmm, never meant to close this.