Open jessmewald opened 7 months ago
Hi @jessmewald
I looked into this. However, based on the errors, there is probably little I can do, because I think the VCF does not follow the VCF 4.2 specification.
There seem to be some issues with the VCF: some SVLEN fields seem to be wrong. For instance, based on the output line
Invalid variant `chr1-10991221:(DUP00000246)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:10991222-10994549 -><DUP>
I expect the VCF to contain a symbolic duplication with the identifier DUP00000246 that has SVLEN=0 in the INFO field. This looks odd, since the coordinates 1:10,991,222-10,994,549 indicate the presence of a ~3.3 kb duplication. Therefore, the field should be something like SVLEN=3328 (10,994,549 - 10,991,222 + 1).
This is because the definition of the SVLEN INFO field includes the following:
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
One value for each ALT allele. Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.
So, again, SVLEN should be positive for a duplication, where the ALT allele is longer.
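To make the sign convention concrete, here is a minimal sketch of a check, assuming we only look at DEL, DUP and INS records. The function name `svlen_sign_ok` is made up for illustration and is not part of SvAnna or Delly.

```python
def svlen_sign_ok(svtype: str, svlen: int) -> bool:
    """Return True if SVLEN has the sign expected for the given SVTYPE."""
    if svtype in ("DUP", "INS"):
        # ALT allele is longer than REF, so SVLEN must be positive.
        return svlen > 0
    if svtype == "DEL":
        # ALT allele is shorter than REF, so SVLEN must be negative.
        return svlen < 0
    # Other types (e.g. INV, BND) are not checked in this sketch.
    return True


# The duplication from the error message above, reported with SVLEN=0:
print(svlen_sign_ok("DUP", 0))  # False
```

A check like this could be used to count how many records in a file are affected before deciding how to fix them.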
Moreover, Delly seems to use the SVLEN field for another purpose, namely to store only the length of an insertion:
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Insertion length for SVTYPE=INS.">
However, SVLEN is a reserved VCF field, so it should be used for its intended purpose: to store the length difference for all symbolic variants, not only for insertions while holding arbitrary values for other variant types.
I am not sure that the SvAnna code base is the place to fix these errors. Hopefully, the Delly authors will fix this bug and produce valid VCF files.
So, to fix this in the short term, you'll probably need to write a Python script that sets the SVLEN field to a correct value calculated from the coordinates, and run the script as part of your pipeline, right after Delly variant calling. It should be possible to calculate the length from the POS and END fields for all symbolic variants except for INS.
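A rough sketch of what such a script could look like is below. It uses only the Python standard library and works on one VCF line at a time. The assumptions are mine: only DEL and DUP records that carry both SVTYPE and END in INFO are touched, the length is taken as END - POS, and header lines (including Delly's own SVLEN definition) are passed through unchanged. The function name `fix_svlen` and the sample record are hypothetical; the record is reconstructed from the error message, not copied from your file.

```python
def fix_svlen(line: str) -> str:
    """Return the VCF line with SVLEN recomputed for symbolic DEL/DUP records."""
    if line.startswith("#"):
        return line  # header lines are left as they are
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a complete VCF record
    pos = int(fields[1])
    info = fields[7].split(";")
    # Key/value view of INFO; flags without '=' (e.g. PRECISE) are skipped here
    # but are still kept in the `info` list.
    kv = dict(item.split("=", 1) for item in info if "=" in item)
    svtype = kv.get("SVTYPE")
    if svtype not in ("DEL", "DUP") or "END" not in kv:
        return line  # INS and other types are not modified
    length = int(kv["END"]) - pos
    # Negative for deletions, positive for duplications.
    svlen = -length if svtype == "DEL" else length
    info = [item for item in info if not item.startswith("SVLEN=")]
    info.append(f"SVLEN={svlen}")
    fields[7] = ";".join(info)
    return "\t".join(fields) + "\n"


# Hypothetical record modelled on the DUP00000246 error message:
record = "chr1\t10991221\tDUP00000246\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=10994549;SVLEN=0\n"
print(fix_svlen(record), end="")
```

Applied to every line of an uncompressed Delly VCF, this would leave everything else untouched and only rewrite SVLEN. If SvAnna also validates the header, the `##INFO=<ID=SVLEN,...>` line may need to be replaced as well, which this sketch does not do.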
I can help with checking the script; I've been staring at variant coordinates long enough to develop some skills.
Please let me know if I can help.
Hi there,
We would like to process VCF output from the Delly caller with SvAnna, if possible. Below is a subset of the errors we encounter:
And the header + a few lines of calls from Delly are below:
Let us know if this would be possible, and what additional information you need from us. Thanks!