Yeah, this is a known issue that we're actively working on. I think your best bet until we get it sorted out is to merge the SVs together in the VCF output using something like Truvari.
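To make the idea of merging concrete, here is a toy sketch of collapsing near-identical alleles at a single site. It is not Truvari's algorithm and does not read or write VCF; the function names `similarity` and `collapse_alleles` and the 0.95 threshold (taken from the "more than 95% similar" figure in the question) are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two allele sequences."""
    # autojunk=False so repetitive sequence is not discarded as "junk".
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def collapse_alleles(alleles, min_similarity=0.95):
    """Greedily group alleles that are at least min_similarity alike.

    Returns a list of groups; the first allele in each group acts as
    the representative that later alleles are compared against.
    """
    groups = []
    for allele in alleles:
        for group in groups:
            if similarity(group[0], allele) >= min_similarity:
                group.append(allele)
                break
        else:
            # No existing representative was close enough: start a new group.
            groups.append([allele])
    return groups


if __name__ == "__main__":
    # Hypothetical alleles: one insertion, a copy with a single nested SNP,
    # and an unrelated sequence.
    base = "ACGT" * 25
    near = base[:50] + "T" + base[51:]
    other = "G" * 100
    for group in collapse_alleles([base, near, other]):
        print(len(group), "allele(s) share one representative")
```

The grouping is greedy and order-dependent, which is fine for a sketch but is one reason to prefer a dedicated tool on real call sets.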
Hello,
I have been trying to genotype structural variants from a graph made with minigraph-cactus, by mapping short reads with vg giraffe, then using vg pack and vg call to get a VCF. This runs without error, but the output is sometimes hard to interpret for longer variants, because small variants nested within larger structural variants each get their own allele in the VCF. That leads to sites with ~10 alleles (depending on the number of input genomes), most of which are more than 95% similar to each other. Ideally, I would like to remove the small nested bubbles from the graph first and then call only the large ones. vg simplify sounds like it does what I want, but it gives me a segmentation fault. Do you know of a strategy to deal with this?
Thanks for any help with this,
Henri