projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0

schema for VEP positional INFO fields incorrect for indels #389

Closed: williambrandler closed this issue 2 years ago

williambrandler commented 3 years ago

The following VEP fields are observed as IntegerType for SNPs, but for indels they can be StringType:

- "cDNA_position" -> IntegerType
- "CDS_position" -> IntegerType
- "Protein_position" -> IntegerType

This is because indels are specified as a range of positions, for example "48-49", which cannot be parsed as an integer. As a consequence, any VCF rows with indels have their VEP INFO fields read in as null.
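To illustrate why these rows come back as null, here is a minimal pure-Python sketch that is independent of Glow's actual parsing code. `strict_int` roughly stands in for casting a value to IntegerType (an unparseable string becomes null), and `parse_position` is a hypothetical range-aware parser that returns a `(start, end)` pair. Both helper names are illustrative assumptions, not Glow APIs.

```python
from typing import Optional, Tuple


def strict_int(value: str) -> Optional[int]:
    """Roughly mimic a cast to IntegerType: None if the string is not a whole integer."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_position(value: str) -> Optional[Tuple[int, int]]:
    """Hypothetical parser for VEP positional values such as "48" or "48-49".

    A single position is returned as (pos, pos); a range as (start, end).
    Returns None if either part is not an integer.
    """
    start_str, sep, end_str = value.partition("-")
    start = strict_int(start_str)
    # Without a "-" separator the value is a single position (e.g. an SNP).
    end = strict_int(end_str) if sep else start
    if start is None or end is None:
        return None
    return (start, end)


print(strict_int("48"))          # 48: an SNP-style value parses fine
print(strict_int("48-49"))       # None: an indel range is lost by an integer cast
print(parse_position("48-49"))   # (48, 49): the range is preserved
```

Keeping the raw value as a string, or splitting it into start and end positions as above, avoids silently dropping the indel annotations.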

https://github.com/projectglow/glow/blob/8b0bcd6b2f7320c3a5bd186bdcfa4707af303b47/core/src/main/scala/io/projectglow/vcf/AnnotationUtils.scala#L58