vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Autoindex failing #3399

Closed: cademirch closed this issue 3 years ago

cademirch commented 3 years ago

1. What were you trying to do? Use autoindex to produce indexes for mapping.

2. What did you want to happen? Produce indexes for mapping.

3. What actually happened? Many warnings about the unsupported allele "*", then an error saying the VCF contains no phasings, even though it does contain phasing information. Stderr from autoindex:

warning:[vg::Constructor] Unsupported allele "*" found in variant, skipping variant:
NT_033779.5 23513402    .   C   *,G 192.68  .   .
warning:[vg::Constructor] Unsupported allele "*" found in variant, skipping variant:
NT_033779.5 23513542    .   G   *,C 960.27  .   .
[IndexRegistry]: Constructing GBWT from VG graph and phased VCF input.
error: [HaplotypeIndexer::parse_vcf] variant file './tmp/vg-AXjbU2/dir-SpoY02/a461b67fd342ea89b110b7b38b447e6c26d5ccef.7.phased.chunked.vcf.gz' does not contain phasings
error[VPKG::load_one]: Correct input type not found while loading handlegraph::PathHandleGraph

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

No stacktrace.

5. What data and command can the vg dev team use to make the problem happen? The VCF is rather large, but I can supply a small version. Command:

vg autoindex -w map -r ref.fa -v vars.vcf.gz -T ./tmp -M 120000

6. What does running vg version say?

vg version v1.34.0 "Arguello"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by anovak@octagon

jeizenga commented 3 years ago

Is it possible to share the data you were using when you ran into this bug?

cademirch commented 3 years ago

> Is it possible to share the data you were using when you ran into this bug?

I made a small version of my data to share, but could not recreate the error with this data.

I am running the command on my full dataset again to see if I can recreate the error, and I will report back. Either way, I can share the full VCF (33 GB) and the reference if you'd like.

Also, for the unsupported allele warning, is there any way to include those variants in construction?
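
For anyone hitting the same two symptoms, a quick check of the VCF itself can help narrow things down before sharing data. The following is a minimal sketch, not part of vg: `open_vcf` and `summarize_vcf` are hypothetical helper names, and the check only assumes standard VCF conventions (tab-separated columns, "*" in ALT marking an allele removed by an overlapping deletion, "|" separating phased genotype alleles and "/" separating unphased ones). It does not reproduce how vg itself decides that a file lacks phasings.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def summarize_vcf(lines):
    """Count records with a '*' ALT allele and phased/unphased GT calls.

    `lines` is any iterable of VCF text lines (an open file or a list).
    """
    stats = Counter()
    for line in lines:
        # Skip meta-information, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        stats["records"] += 1

        # Column 5 (index 4) is ALT, a comma-separated list of alleles.
        if "*" in fields[4].split(","):
            stats["star_allele_records"] += 1

        # Sites-only records have no FORMAT or sample columns.
        if len(fields) < 10:
            continue
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_index = keys.index("GT")

        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            if "|" in gt:
                stats["phased_calls"] += 1
            elif "/" in gt:
                stats["unphased_calls"] += 1
    return stats
```

Running `summarize_vcf(open_vcf("vars.vcf.gz"))` on the full file and on the reduced file would show whether the two differ in the number of "*" records or in how many genotype calls are phased, which may explain why the small version did not reproduce the error.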

cademirch commented 3 years ago

@jeizenga Autoindex failed again with the same error on my full dataset. I can share it with you; could you give me your email address? I'll email you a link to where the data is hosted.

jeizenga commented 3 years ago

Sure. You can reach me at joeizeng@gmail.com.

jeizenga commented 3 years ago

@cademirch This should be fixed now in the main branch. Let me know if you have any more issues with it.