Open alinehugo opened 2 years ago
> Analyse the VCF file: first retrieve which kind of variant is present at each position from the INFO field, as in a "usual" VCF (e.g. SVTYPE in this example)

I have the same question. How do I get the SVTYPE from the output VCF file of vg?
From the VCF spec:
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
Value should be one of DEL, INS, DUP, INV, CNV, BND. This key can be derived from the REF/ALT fields but is
useful for filtering
The two issues for vg are:
`Number=1` means there can only be one `SVTYPE` per site. But a vg graph can often contain many different types of SVs at the same site.
But that said, I think you raise a fair point: we should at the very least provide scripts or suggestions of best practices for cleaning the VCFs and categorizing the SV calls, as we end up doing this ourselves too when analyzing them.
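As an illustration of the kind of categorizing script mentioned above, here is a minimal Python sketch that derives a label from the REF/ALT fields, as the spec says is possible. It is not part of vg: the function names `classify_allele` and `classify_record` are hypothetical, and the rule used (a pure insertion or deletion is one where the shorter allele is a prefix of the longer one) is an assumption chosen for simplicity. It reports one label per ALT allele, which avoids the `Number=1` limitation at multi-allelic sites.

```python
def classify_allele(ref, alt):
    """Label one REF/ALT pair as INS, DEL, or UNK, with a signed length change."""
    # Symbolic alleles (e.g. <DEL>), breakends, '*' and '.' carry no
    # sequence to compare, so they are left unclassified here.
    if not alt or alt[0] in "<*." or "[" in alt or "]" in alt:
        return "UNK", 0
    svlen = len(alt) - len(ref)
    # Assumption: a "straightforward" insertion keeps REF as a prefix of ALT.
    if svlen > 0 and alt.startswith(ref):
        return "INS", svlen
    # Assumption: a "straightforward" deletion keeps ALT as a prefix of REF.
    if svlen < 0 and ref.startswith(alt):
        return "DEL", svlen
    # Anything else (substitutions, complex replacements) stays unknown.
    return "UNK", svlen


def classify_record(line):
    """Classify every ALT allele of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    return [classify_allele(ref, alt) for alt in alts]
```

Calling `classify_record` on a multi-allelic line returns a list with one `(label, length)` pair per ALT allele, which could then be written out as per-allele annotations or used to split the site before filtering.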
Has there been any progress regarding best practices when it comes to populating the `SVTYPE` and possibly the `SVLEN` fields in a VCF generated from a pangenome graph? It would be useful to be able to make comparisons to VCFs produced by sniffles, etc. Thanks!
Hi, @glennhickey. It is hard to understand the INFO field in the `vg call` output. Users care most about variant information such as SV position, SV type, and SV ID, as given in the input VCF file for the `autoindex-giraffe-pack-call` workflow, so it would be helpful for `vg call` to output the raw SV information. I sincerely hope the vg team can address this problem.
For anyone coming across this issue with the same problem, I have found the 'truvari' tool useful for populating the INFO field of pangenome-derived VCFs. Running 'truvari anno' (https://github.com/acenglish/truvari/wiki/anno) allows you to include the SVLEN and SVTYPE tags. However, it can only accurately label straightforward insertions and deletions; everything else is tagged as 'UNK', so this isn't a perfect solution. It would be great to be able to compare the output with a VCF derived from a tool such as sniffles.
Hi, @evcurran. I solved the problem in a similar way. But the key problem is comparing the VCF generated by `vg call` with the original input VCF: I find it hard to compare these two VCF files since the variant coordinates are different.
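One possible contributor to mismatched coordinates, assuming the two files describe the same event with differently padded alleles, is the amount of shared flanking sequence kept in REF and ALT. The sketch below trims shared bases so that both records use a minimal representation; `trim_allele` is a hypothetical helper, not a vg or truvari function. It does not perform reference-aware left alignment (which tools such as `bcftools norm` do with a FASTA reference), so it will not reconcile events that are shifted within a repeat.

```python
def trim_allele(pos, ref, alt):
    """Trim bases shared by REF and ALT, returning (pos, ref, alt).

    Each allele always keeps at least one base, as VCF requires.
    """
    # Trim shared trailing bases first; the position does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, advancing the position each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

Applying the same trimming to records from both files before matching them on `(chromosome, pos, ref, alt)` gives exact comparisons a better chance of succeeding; fuzzy matching on position and size would still be needed for events that differ in more than padding.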
Hello everyone!
Has anyone here found a good solution to the issue?
1. What were you trying to do? Understand the output VCF of `vg call`
2. What did you want to happen? Analyse the VCF file: first retrieve which kind of variant is present at each position from the INFO field, as in a "usual" VCF (e.g. SVTYPE in this example)
3. What actually happened? There is no such info in the output VCF. Example of output:
4. If you got a line like `Stack trace path: /somewhere/on/your/computer/stacktrace.txt`, please copy-paste the contents of that file here: NONE
5. What data and command can the vg dev team use to make the problem happen? I used the usual vg command pipeline: construct > giraffe > augment > snarls-pack > call
6. What does running `vg version` say?