vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

graph path name #4068

Closed: tanger-code closed this issue 1 year ago

tanger-code commented 1 year ago

I use vg paths -L -x chr21.xg to list all the paths in the graph. But when I try to genotype the graph with vg call graph.xg -k graph.pack -p GRCh38#chr21 > graph.vcf, it fails. When I use vg call graph.xg -k graph.pack -p GRCh38#0#chr21 > graph.vcf instead, it works. Why are the path names reported by vg paths incompatible with vg call? Should I update vg? (Current version, installed via conda: vg: variation graph tool, version v1.50.1 "Monopoli".)

tanger-code commented 1 year ago

Also, are there any naming conventions for these paths?

glennhickey commented 1 year ago

There is a naming convention:

https://github.com/vgteam/vg/wiki/Path-Metadata-Model

vg also supports path names of the form GRCh38#chr1 (except in version 1.50.0).

But I cannot reproduce your issue. Are you able to share your input xg?

tanger-code commented 1 year ago

I made a mistake in my statement above. I am actually using the .gfa file, not the .xg file, both to build graph.pack and to run vg call. That is, the genotyping command is vg call graph.gfa -k graph.pack -p GRCh38#0#chr21 > graph.vcf.

This is the .vg file. I used vg index to build the .xg file from it and vg view to get the .gfa file: https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.chroms/chr21.vg

glennhickey commented 1 year ago

That makes sense. There can indeed be a bit of an inconsistency between the path names in GFA files (GRCh38#0#chr21) and in .vg files (GRCh38#chr21), especially after the graph has gone through vg conversion. The simplest thing is probably to use vg call -S GRCh38 to specify the reference sample name (instead of -p) and go from there. Otherwise, if you run vg paths on the actual file you are passing to vg call, you should be able to figure out which path name to use.
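To illustrate picking the right name from a path listing, here is a minimal Python sketch. It assumes path names take only the two forms mentioned in this thread, sample#contig or sample#haplotype#contig. The helper names (split_path_name, matching_paths) and the example listing are hypothetical, and the sketch does not cover the rest of the Path Metadata Model.

```python
def split_path_name(name):
    """Split 'sample#contig' or 'sample#haplotype#contig'.

    Returns (sample, haplotype, contig), with haplotype set to None
    when the name has no haplotype field. Returns None for any other
    shape (assumption: only these two forms are handled here).
    """
    fields = name.split("#")
    if len(fields) == 2:
        sample, contig = fields
        return sample, None, contig
    if len(fields) == 3:
        sample, haplotype, contig = fields
        return sample, haplotype, contig
    return None


def matching_paths(path_names, sample, contig):
    """Return names whose sample and contig match, ignoring the haplotype."""
    matches = []
    for name in path_names:
        parsed = split_path_name(name)
        if parsed is not None and parsed[0] == sample and parsed[2] == contig:
            matches.append(name)
    return matches


# Hypothetical listing, standing in for the output of vg paths -L
# run on the same file that is given to vg call.
listed = ["GRCh38#0#chr21", "sampleA#1#chr21", "sampleB#0#chr21"]
print(matching_paths(listed, "GRCh38", "chr21"))
```

Whatever name this returns for the file in question is the one to pass to vg call -p, since it is taken from that file's own listing rather than from a different format of the same graph.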

I think I will change minigraph-cactus to always include the haplotype number in path names to avoid future confusion with vg, but that won't change the graphs that have already been released.

tanger-code commented 1 year ago

Ok, thanks! I'll try that later.