vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

`vg add` doesn't make alt paths for variants #2677

Open adamnovak opened 4 years ago

adamnovak commented 4 years ago

@jmonlong says we use vg add for some structural variant graph stuff.

Looking at the code, vg add does not appear to create paths for the alleles of the variants it adds.

This means we can't find them for GBWT indexing, but I don't think we ever want to make GBWT indexes for variants we pull in with add anyway, since they aren't phased with variants from the original VCF. But it also means that genotyping modes that use the alt paths won't work.
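One quick way to see whether a graph carries alt paths is to export it to GFA and look at the path names. The sketch below is a minimal illustration, not part of vg: it assumes alt paths appear as GFA `P` lines and are named `_alt_<variant id or hash>_<allele index>`, and the helper names and the example graph are made up for the illustration.

```python
from collections import defaultdict

# Assumption: vg names alt-allele paths "_alt_<variant id or hash>_<allele index>".
ALT_PREFIX = "_alt_"


def gfa_path_names(lines):
    """Yield the name field of every GFA P (path) line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "P":
            yield fields[1]


def alt_alleles_by_variant(lines):
    """Map each variant identifier to the sorted allele indexes that have alt paths."""
    alleles = defaultdict(list)
    for name in gfa_path_names(lines):
        if not name.startswith(ALT_PREFIX):
            continue
        # Split on the last underscore: everything before it identifies the variant.
        variant_id, _, allele = name[len(ALT_PREFIX):].rpartition("_")
        if variant_id and allele.isdigit():
            alleles[variant_id].append(int(allele))
    return {variant: sorted(indexes) for variant, indexes in alleles.items()}


# Hypothetical toy graph with one SNP and alt paths for both of its alleles.
example_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tCCA",
    "P\tref\t1+,2+,4+\t*",
    "P\t_alt_abc123_0\t2+\t*",
    "P\t_alt_abc123_1\t3+\t*",
]

print(alt_alleles_by_variant(example_gfa))  # {'abc123': [0, 1]}
```

An empty result for a graph that had variants added to it would be consistent with the behavior described above, where the added alleles get no paths of their own.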

@glennhickey we do use the alt paths for some kinds of genotyping, right? Is it worth adding in support for them in vg add?

glennhickey commented 4 years ago

Yes, alt paths are needed to genotype VCFs with vg call. For any graph derived from VCF, it's a nice feature to be able to express genotypes exactly in terms of the input VCF(s).

That said, I'm not sure how important it is to @jmonlong right now. Without alt paths, you can get more-or-less equivalent genotypes, it's just they'll be shifted around and merged and whatnot when compared to the VCFs used to create the graph.

nuno-agostinho commented 6 months ago

Hi there!

I would like to know if there is a way to incorporate VCF alternate paths into an existing pangenome GFA, VG or XG file. Given that vg add doesn't support this, I was thinking of:

  1. Using vg construct -A with FASTA and VCF files
  2. Combining that output with the pangenome GFA via vg combine

Is there a more efficient way to do this? Also, is there a reason why vg add is now deprecated?

Thanks for your time!

Best regards, Nuno

adamnovak commented 6 months ago

@nuno-agostinho I don't think that will work; vg combine just puts both graphs floating next to each other as if they were separate sets of chromosomes; it doesn't weld the graphs together along shared paths. And in fact it might fail if the graphs have paths with the same names in them.
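Since shared path names are the case flagged as a possible failure, it can be worth checking for them before attempting to combine two graphs. The following is a rough sketch under stated assumptions: both graphs have been exported to GFA, paths are stored as `P` lines (GFA 1.1 `W` walk lines are not examined), and the function names and example graphs are hypothetical.

```python
def gfa_path_names(lines):
    """Collect the names of all GFA P (path) lines into a set."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "P":
            names.add(fields[1])
    return names


def shared_path_names(lines_a, lines_b):
    """Return the sorted path names that occur in both graphs."""
    return sorted(gfa_path_names(lines_a) & gfa_path_names(lines_b))


# Hypothetical inputs: an existing pangenome and a graph built from FASTA + VCF.
pangenome = [
    "S\t1\tACGT",
    "P\tchr1\t1+\t*",
    "P\tsampleA#1#chr1\t1+\t*",
]
constructed = [
    "S\t1\tACGT",
    "S\t2\tG",
    "P\tchr1\t1+\t*",
    "P\t_alt_abc123_1\t2+\t*",
]

print(shared_path_names(pangenome, constructed))  # ['chr1']
```

A non-empty list only signals the name clash. Even with distinct names, the two graphs would still sit side by side as separate components rather than being merged along the shared reference.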

@glennhickey might have something in the Cactus universe that could do the required welding operation. It's well within what the pinchesAndCacti library could be used to efficiently compute, but I don't know if any command-line tools exist to do it.

We marked vg add as deprecated not because we have a better way to do what it does, but because it doesn't do what it does very well, and because we don't need to do it often enough to justify filling in the gaps, like the missing alt paths.

nuno-agostinho commented 6 months ago

Hi @adamnovak, thanks for the heads up about vg combine.

I am looking into this to explore how to efficiently support sets of genetic variants from VCF files with pangenome graphs in Ensembl Variant Effect Predictor (VEP) and the Ensembl website.

I like how vg construct -A integrates variants from a VCF, including the alternative alleles as alt paths. Something like the vg add command with support for adding alt paths for VCFs seemed to have potential for our use case, but I can continue looking for alternatives.

Thanks for your input!