langmead-lab / vargas

MIT License

How to use VCF files that contain both overlapping and multiallelic variants? #5

Open adamnovak opened 3 years ago

adamnovak commented 3 years ago

I'm trying to evaluate the mapping results from vg on a particular graph against Vargas's alignments. This means I need to be able to feed both of them the same graph.

My graph is built from VCF files that have overlapping variants (in this case, multiple indels at a particular point expressed as different records). From the README, if I want to use these files with Vargas, I need to run my VCFs through the preprocessor script. But my VCFs also have multiallelic SNPs in them, and so they don't meet the preprocessor script's requirements:

https://github.com/langmead-lab/vargas/blob/b1ad5d9dda5d00dd54cdc86576bead68122ac1c0/vargas_preprocess_VCF.py#L7-L8

How do I use these VCFs? Is there another tool I should use to split up the multiallelic SNPs into multiple records? Or is it possible to get the graph into Vargas via a graph format like GFA?

BenLangmead commented 3 years ago

I am going to ping @cdarby, as I'm not sure -- after a quick glance at the script -- whether that proviso at the top is accurate. It seems like the script might handle these fine. I'll get back to you.

BenLangmead commented 3 years ago

Charlotte has a potential fix above. It is designed to work for multiallelic SNVs (not other types of variants), but it sounds like that might be sufficient for your case? Let us know how it goes.
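
For readers wondering what "splitting a multiallelic SNV into multiple records" means at the VCF level, here is a minimal illustrative sketch. It is not the fix referenced above and is not part of Vargas; the function name `split_multiallelic_snv` is hypothetical. It assumes tab-separated, sites-only VCF lines (the eight fixed columns only). Lines with sample columns are passed through unchanged, because splitting them would also require rewriting genotype allele indices. INFO is copied as-is, so any per-allele INFO values would still need adjusting.

```python
def split_multiallelic_snv(line):
    """Split one sites-only VCF line with a multiallelic SNV into biallelic lines.

    Returns a list of lines. Header lines, indels, biallelic records, and
    records carrying sample columns are returned unchanged (assumption of
    this sketch, not documented Vargas behavior).
    """
    text = line.rstrip("\n")
    fields = text.split("\t")

    # Pass through headers and anything that is not exactly 8 fixed columns.
    if text.startswith("#") or len(fields) != 8:
        return [text]

    ref = fields[3]
    alts = fields[4].split(",")

    # An SNV here means a single-base REF and single-base ALTs only.
    is_snv = len(ref) == 1 and all(len(a) == 1 and a in "ACGTN" for a in alts)
    if len(alts) < 2 or not is_snv:
        return [text]

    # Emit one record per ALT allele, keeping all other columns identical.
    out = []
    for alt in alts:
        rec = list(fields)
        rec[4] = alt
        out.append("\t".join(rec))
    return out


if __name__ == "__main__":
    example = "chr1\t100\trs1\tA\tC,G\t.\tPASS\t."
    for rec in split_multiallelic_snv(example):
        print(rec)
```

Note that the resulting records share the same position, so they still count as overlapping variants and would presumably still need to go through the preprocessor script afterwards.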