samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Extended Attributes in Genotype not parsed #1502

Open BoscoSuen opened 3 years ago

BoscoSuen commented 3 years ago

When parsing a VCF record with extended genotype attributes, e.g.:

FORMAT  NA00001 NA00002 NA00003
GT:GQ:DP:HQ 1|2:21:6:23,27  2|1:2:0:18,2

The HQ field is an extended attribute, and its decoded value is "23,27": a single string containing two values separated by a comma. As a result, we cannot call the functions that return attributes as Int/Float/Boolean. Do we need to parse these extended attributes manually?

lbergelson commented 3 years ago

@BoscoSuen Sorry for the slow response. This is a weird deficiency in the API which we should fix. I don't think there is any good reason why there aren't getters for genotype attributes as lists of int/float/etc., as there are for VariantContext attributes. It has been requested before, but no one has contributed it yet and I haven't had the bandwidth to do it myself.
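
Until such getters exist, manual parsing seems to be the practical workaround. The following is a minimal sketch, not part of htsjdk: the class name `GenotypeAttributeParsing` and the method `asIntList` are hypothetical. It assumes the raw value comes from `Genotype.getExtendedAttribute("HQ")` and arrives either as a comma-separated string such as "23,27" (as described above) or as a list, and that a "." entry should be kept as `null` so positions stay aligned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not an htsjdk class.
final class GenotypeAttributeParsing {
    private GenotypeAttributeParsing() {}

    // Converts a raw extended-attribute value into a list of Integers.
    // Assumption: raw is null, a List, or a comma-separated String.
    static List<Integer> asIntList(Object raw) {
        if (raw == null) {
            return Collections.emptyList();
        }
        List<Integer> out = new ArrayList<>();
        if (raw instanceof List) {
            for (Object element : (List<?>) raw) {
                addInt(out, String.valueOf(element));
            }
            return out;
        }
        String text = String.valueOf(raw).trim();
        if (text.isEmpty()) {
            return out;
        }
        for (String token : text.split(",")) {
            addInt(out, token);
        }
        return out;
    }

    // Adds one parsed token; a "." (VCF missing value) or blank token becomes null.
    // Throws NumberFormatException if the token is not an integer.
    private static void addInt(List<Integer> out, String token) {
        String t = token.trim();
        out.add(t.isEmpty() || t.equals(".") ? null : Integer.valueOf(t));
    }
}
```

With the record above, `GenotypeAttributeParsing.asIntList(genotype.getExtendedAttribute("HQ"))` would give a list holding 23 and 27 for the first sample. A float variant could follow the same shape with `Double.valueOf`.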