vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

how to set the backbone reference genome when doing SV calling and genotyping #3625

Open biozzq opened 2 years ago

biozzq commented 2 years ago

Dear all,

While learning how to detect SVs with vg giraffe on a graph genome constructed by pggb, I found the results a bit strange. This may be due to the absence of a backbone reference genome when running vg. I wonder whether we can detect SVs on specified paths. If we could, for example on the full backbone genome, I think the SV type could also be annotated in the final VCF file. Thank you in advance.

vg autoindex -R XG -g pggb.prune.gfa -w giraffe -t 4 -T ./ -p pggb.prune
vg giraffe -Z pggb.prune.giraffe.gbz -m pggb.prune.min -d prune.dist -t 4 -f R1.fq.gz -f R2.fq.gz > map.gam
vg pack -x pggb.prune.xg -g map.gam -Q 10 -s 5 -o map.pack -t 4
vg call pggb.prune.xg -k map.pack -s demo -t 4 > demo.graph.vcf

Sincerely, Zheng zhuqing

glennhickey commented 2 years ago

vg call can use any paths in the graph as references via the -p option. But cycles in the reference path (which PGGB can produce) will be collapsed in the VCF as well, which makes the affected records in vg call's output hard to interpret (i.e. wrong).
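As a rough sketch of the -p option applied to the vg call command from the question: the path name chr1 below is a placeholder, not a name taken from the actual graph (vg paths -x pggb.prune.xg -L should list the real ones). The script only prints the command when vg or the input files are not available.

```shell
# Hypothetical path name; list the real ones with: vg paths -x pggb.prune.xg -L
REF_PATH="chr1"

# The vg call command from the question, with -p added to select the reference path.
CALL_CMD="vg call pggb.prune.xg -k map.pack -p $REF_PATH -s demo -t 4"

if command -v vg >/dev/null 2>&1 && [ -f pggb.prune.xg ] && [ -f map.pack ]; then
    # Left unquoted on purpose so the string splits into separate arguments.
    $CALL_CMD > demo.graph.vcf
else
    # Dry run: vg or the input files are missing, so just show the command.
    echo "dry run: $CALL_CMD"
fi
```

The cycle caveat still applies to whichever path is chosen as the reference.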

For example, if there are two variants on path chr1

chr1 10 A T
chr1 20 A G

but positions 10 and 20 are on the same node in the graph (i.e. a cycle), then vg call will just collapse them into the first position found:

chr1 10 A T,G
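The collapse can be mimicked with a toy model. This is not vg call's implementation, only an illustration of the behaviour described above. The fifth column is a hypothetical node ID added for the example (real VCF records carry no such column), and records that share a path and a node are folded into the first position seen.

```shell
# Toy records: path, position, REF, ALT, and a hypothetical node ID.
records='chr1 10 A T 7
chr1 20 A G 7'

collapsed=$(printf '%s\n' "$records" | awk '
    {
        key = $1 SUBSEP $5              # same path and same graph node
        if (!(key in pos)) {            # first time this node is seen
            order[++n] = key
            path[key] = $1; pos[key] = $2; ref[key] = $3; alt[key] = $4
        } else {
            alt[key] = alt[key] "," $4  # later positions are folded into the first
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            print path[k], pos[k], ref[k], alt[k]
        }
    }')

echo "$collapsed"
```

With the two input records above, this prints the single merged record, and the information that G was actually observed at position 20 is lost.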

I guess in theory, if you're calling with a GBWT (vg call -g), it would be possible to port over the unfolding logic from vg deconstruct to resolve cycles, but we haven't done that yet.

One thing you might have more luck with right now is odgi untangle.