vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

`vg call` strips path/contig info from vcf #4445

Open CormacKinsella opened 1 week ago

CormacKinsella commented 1 week ago

1. What were you trying to do?

vg call graph.gbz --pack graph.pack --snarls graph.snarls --genotype-snarls --all-snarls --gbz-translation --gbz > example.vcf

2. What did you want to happen?

#CHROM
simChimp#0#simChimp.chr6

3. What actually happened?

#CHROM 
simChimp.chr6

5. What data and command can the vg dev team use to make the problem happen?

I did this using the simChimp example from Minigraph-Cactus, but I assume any GBZ with PanSN contig naming would reproduce it.

6. What does running vg version say?

v1.61.0 "Plodio"

CormacKinsella commented 1 week ago

I think this may be the same issue as #4442. I assumed I could run vg call without specifying a reference sample with -S (for a graph with only one reference sample), since according to the help text for -p it should default to all reference paths:

-p, --ref-path NAME Reference path to call on (multipile allowed. defaults to all paths)

Cheers for any advice!

glennhickey commented 3 days ago

Yeah, it looks like vg call will only add the PanSN prefix if it thinks there could be ambiguity between different samples in the VCF. It's probably a good idea to add an option (like deconstruct has) to let the user force the issue, but in the meantime you'll have to use sed or something similar to add it yourself.
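
For anyone landing here, a minimal sketch of the sed workaround suggested above. The helper name `add_pansn_prefix` and the tiny demo VCF are made up for illustration, and the prefix `simChimp#0#` is taken from the expected output in the original report. It assumes an uncompressed VCF and a prefix that contains no `/`, `&`, or `\` characters, since those are special in a sed replacement.

```shell
#!/bin/sh
# Hypothetical helper (not part of vg): prepend a PanSN "sample#haplotype#"
# prefix to contig names in a VCF read from stdin.
add_pansn_prefix() {
    prefix=$1
    # 1st expression: rename contigs declared in ##contig=<ID=...> header lines.
    # 2nd expression: prefix every line that does not start with '#',
    #                 i.e. the CHROM column of each record.
    sed -e "s/^##contig=<ID=/&${prefix}/" \
        -e "s/^[^#]/${prefix}&/"
}

# Made-up four-line VCF standing in for the real example.vcf.
demo_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=simChimp.chr6,length=1000>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'simChimp.chr6\t10\t.\tA\tG\t60\tPASS\t.\n'
}

demo_vcf | add_pansn_prefix 'simChimp#0#'
```

On a real file this would be run as `add_pansn_prefix 'simChimp#0#' < example.vcf > example.pansn.vcf`. A bgzipped VCF would need to be decompressed first and re-compressed and re-indexed afterwards.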