kipoi / kipoiseq

Standard set of data-loaders for training and making predictions for DNA sequence-based models.
https://kipoi.org/kipoiseq/
MIT License

gt_types vs. genotypes-attribute for variant (extractor-MultiSampleVCF) #25

Closed JulianRein closed 5 years ago

JulianRein commented 5 years ago

MultiSampleVCF uses the gt_types attribute of variants to determine the genotype:

return variant.gt_types[self.sample_mapping[sample_id]] != 0

However, gt_types seems to have the value 2 for the ./. "genotype" in VCF files, e.g. for SNP 1_10240_C_CT_b37, so such calls pass this check. Do we want to regard "./." as a valid variant?
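The behaviour described above can be reproduced without a VCF file. The sketch below uses a plain list in place of variant.gt_types and a dict in place of self.sample_mapping. The sample names and values are made up for illustration; only the `!= 0` comparison and the observed value 2 for "./." come from the report.

```python
# Hypothetical stand-ins for variant.gt_types and self.sample_mapping.
# The value 2 for the "./." call mirrors what the report observed.
gt_types = [0, 1, 2, 3]
sample_mapping = {
    "ref_sample": 0,      # homozygous reference
    "het_sample": 1,      # heterozygous
    "missing_sample": 2,  # "./." call, reported as gt_types == 2
    "alt_sample": 3,
}


def has_variant_current(sample_id):
    # Same comparison as quoted in the report: anything non-zero counts.
    return gt_types[sample_mapping[sample_id]] != 0


flags = {sample: has_variant_current(sample) for sample in sample_mapping}
print(flags)
```

In this mock-up the "./." sample is flagged as carrying the variant, which is the behaviour the question is about.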

MuhammedHasan commented 5 years ago

Thanks for reporting the bug. UNKNOWN should not be treated as a valid variant.

gts012 (bool) – if True, then gt_types will be 0=HOM_REF, 1=HET, 2=HOM_ALT, 3=UNKNOWN. If False, 3, 2 are flipped.

http://brentp.github.io/cyvcf2/docstrings.html
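One way to avoid counting UNKNOWN is to compare against the codes that denote a called non-reference genotype instead of testing `!= 0`. The sketch below follows the encoding quoted above (with gts012 False, 2=UNKNOWN and 3=HOM_ALT). The helper names variant_codes and has_variant are hypothetical, and this is not necessarily the fix that went into kipoiseq.

```python
def variant_codes(gts012=False):
    """Return the gt_types codes for called non-reference genotypes.

    Encoding as quoted from the cyvcf2 docs:
      gts012=True:  0=HOM_REF, 1=HET, 2=HOM_ALT, 3=UNKNOWN
      gts012=False: 0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT
    """
    het = 1
    hom_alt = 2 if gts012 else 3
    return {het, hom_alt}


def has_variant(gt_type, gts012=False):
    # Hypothetical helper: True only for HET or HOM_ALT, never for UNKNOWN.
    return gt_type in variant_codes(gts012)


# With gts012=False, a "./." call is encoded as 2 and is rejected here.
unknown_is_variant = has_variant(2, gts012=False)
print(unknown_is_variant)
```

Checking membership in an explicit set keeps the intent readable and makes the dependence on the gts012 setting visible at the call site.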