seqan / iGenVar

The official repository for the iGenVar project.
BSD 3-Clause "New" or "Revised" License

[BUG] The commit 91ec8dc "Add iGenVar_SVLEN" brought a bug. #215

Status: Closed (Irallia closed this issue 2 years ago)

Irallia commented 2 years ago
After running some benchmarks, I found that our score was broken.
[Figures: "old" vs. "new" benchmark results, each with the panels iGenVar_only-results, DUP_as_INS, and all]

After looking at the truth set and the changes, I think the error is that a DEL must have a negative SVLEN. Also, an INS should not have an SVLEN of 0; its SVLEN should be the distance on the read.
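
To make the sign convention described above concrete, here is a minimal Python sketch. It is not iGenVar's code (iGenVar is written in C++). The function name `signed_svlen` and its `length` parameter are hypothetical, and leaving other SV types unsigned is an assumption made only for this illustration.

```python
def signed_svlen(sv_type: str, length: int) -> int:
    """Return an SVLEN value following the sign convention seen in the truth set.

    `length` is the unsigned size of the event in bases (hypothetical input).
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if sv_type == "DEL":
        # Deletions get a negative SVLEN.
        return -length
    if sv_type == "INS":
        if length == 0:
            raise ValueError("an insertion needs a non-zero length")
        # Insertions get the (positive) length measured on the read.
        return length
    # Assumption for this sketch: other SV types are passed through unsigned.
    return length


print(signed_svlen("DEL", 100))  # -100
print(signed_svlen("INS", 42))   # 42
```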

$ less data/truth_set/HG002_SVs_Tier1_v0.6.vcf | grep "SVLEN=-" | grep "SVTYPE=DEL" | wc -l
37412
$ less data/truth_set/HG002_SVs_Tier1_v0.6.vcf | grep "SVLEN=" | grep "SVTYPE=DEL" | wc -l
37412
$ less data/truth_set/HG002_SVs_Tier1_v0.6.vcf | grep "SVLEN=0" | grep "SVTYPE=INS" | wc -l
0
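
The same check can be sketched in Python. Unlike the grep pipeline, which matches the patterns anywhere on a line, this version only looks at the INFO column. The helper names `parse_info` and `count_svlen_signs` are hypothetical, and the sample records are made up for illustration, not taken from the truth set.

```python
from collections import Counter


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to '')."""
    fields = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def count_svlen_signs(lines):
    """Count (SVTYPE, sign of SVLEN) pairs over VCF record lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the 8th VCF column
        if "SVTYPE" not in info or "SVLEN" not in info:
            continue
        # SVLEN may hold one value per alt allele, separated by commas.
        for value in info["SVLEN"].split(","):
            if value in ("", "."):
                continue  # missing value
            length = int(value)
            if length < 0:
                sign = "negative"
            elif length == 0:
                sign = "zero"
            else:
                sign = "positive"
            counts[(info["SVTYPE"], sign)] += 1
    return counts


# Made-up records for illustration only.
sample = [
    "##fileformat=VCFv4.2",
    "1\t100\t.\tACGT\tA\t.\tPASS\tSVTYPE=DEL;SVLEN=-3",
    "1\t200\t.\tA\tACCC\t.\tPASS\tSVTYPE=INS;SVLEN=3",
]
counts = count_svlen_signs(sample)
print(counts[("DEL", "negative")], counts[("INS", "zero")])  # 1 0
```

To run it on a real file, pass an open file handle instead of `sample`, since the function only needs an iterable of lines.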

VCF 4.3 Specification: "The following INFO keys are reserved for encoding structural variants. In general, when these keys are used by imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each allele (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles)."
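
As a small illustration of the multi-allele case quoted above, the following sketch renders one SVLEN value per alt allele as a single INFO entry. The function name `format_svlen` is hypothetical and not part of iGenVar or any VCF library.

```python
def format_svlen(svlens):
    """Render one SVLEN value per alt allele as a VCF INFO entry."""
    if not svlens:
        raise ValueError("need at least one SVLEN value")
    # Values are comma-separated, in the same order as the alt alleles.
    return "SVLEN=" + ",".join(str(value) for value in svlens)


print(format_svlen([-100, -110]))  # SVLEN=-100,-110
```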