vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Genotyping SVs in a minigraph-cactus graph yields many similar alleles in output vcf #4281

Open henrivkgt opened 1 month ago

henrivkgt commented 1 month ago

Hello,

I have been trying to genotype structural variants from a graph made with minigraph-cactus by mapping short reads with vg giraffe, then using vg pack and vg call to get a VCF. This runs without error, but the output is sometimes hard to interpret for longer variants. The reason is that small variants nested within larger structural variants each get their own allele in the VCF, leading to variants with ~10 alleles (depending on the number of input genomes), most of which are more than 95% similar to each other. Ideally, to prevent this, I would like to remove the small nested bubbles from the graph before calling, so that only the large ones are called. vg simplify sounds like it does what I want, but it gives me a segmentation fault. Do you know of a strategy to deal with this?

Thanks for any help with this,

Henri

glennhickey commented 1 month ago

Yeah, this is a known issue that we're actively working on. Until we get it sorted out, I think your best bet is to merge the SVs together in the VCF output using something like truvari.
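
To make the idea of merging near-identical alleles concrete, here is a minimal Python sketch that groups the ALT sequences of one multi-allelic record by pairwise similarity. It is only an illustration and not truvari's algorithm: the function names `similarity` and `collapse_alleles`, the greedy grouping strategy, and the 0.95 default (taken from the "more than 95% similar" figure in the question) are all assumptions made for this example. It also does not rewrite genotypes, which a real tool would need to do.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two allele sequences."""
    # autojunk=False stops difflib from discarding frequent bases
    # as "junk" when sequences are 200 characters or longer.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def collapse_alleles(alts, threshold=0.95):
    """Greedily group ALT alleles that are near-identical.

    Each allele joins the first existing group whose representative
    (the group's first member) is at least `threshold` similar to it;
    otherwise it starts a new group. Returns lists of indices into `alts`.
    """
    groups = []
    for idx, seq in enumerate(alts):
        for group in groups:
            representative = alts[group[0]]
            if similarity(representative, seq) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


if __name__ == "__main__":
    # Hypothetical ALT alleles: two differ by a single base, one is unrelated.
    alts = ["A" * 100, "A" * 99 + "C", "G" * 100]
    print(collapse_alleles(alts))
```

Note that difflib's comparison can be slow for very long alleles, so this is meant for inspecting a handful of records rather than processing a whole VCF.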