hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

Hail does not properly support Haploid genotype calls #13149

Open danking opened 1 year ago

danking commented 1 year ago

What happened?

Laura Gauthier: I'm struggling with some DRAGEN data that probably doesn't quite meet the VCF spec. I got the import working, but once I go to split multi-allelics, one of the annotations seems to be the wrong length because I get an array index out of bounds exception. Is there any way to get more info on the variant that's causing the problem? VCFtool validator found a bunch of issues with FORMAT annotations and I've turned them all into count=1 strings, but there must be something else.

...

Tim Poterba (he/him): yeah, the answer is that this isn't a parse failure, it's a failure of the split_multi_hts method to support haploid sex chromosome calls

Tim Poterba (he/him): the right plan is to support sex chromosomes The Right Way™ and update all of Hail to infer, track, and use appropriate ploidy, but that's not at all what the system looks like right now

Version

0.2.117

Relevant log output

No response

chrisvittal commented 1 year ago

In the short term, we should fix how split_multi_hts handles PL so that it supports at least ploidy 1 and 2.
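To make the ploidy 1 vs. 2 difference concrete, here is a minimal pure-Python sketch of ploidy-aware PL downcoding. It is not Hail's implementation: `num_genotypes`, `diploid_index`, and `split_pl` are hypothetical helper names, and it assumes PL follows the VCF genotype ordering (one entry per allele for haploid calls, triangular ordering for diploid calls). Under those assumptions, a haploid PL for n alleles has n entries rather than n(n+1)/2, so code that indexes it as if it were diploid can run past the end of the array.

```python
from math import comb
from typing import List


def num_genotypes(n_alleles: int, ploidy: int) -> int:
    """Expected PL length: number of unordered genotypes for this ploidy."""
    return comb(n_alleles + ploidy - 1, ploidy)


def diploid_index(j: int, k: int) -> int:
    """Position of unphased diploid genotype j/k in VCF genotype ordering."""
    j, k = min(j, k), max(j, k)
    return k * (k + 1) // 2 + j


def split_pl(pl: List[int], n_alleles: int, a_index: int, ploidy: int) -> List[int]:
    """Downcode a multi-allelic PL to a biallelic PL for alternate allele a_index.

    Every allele other than a_index is treated as reference, and each
    biallelic genotype takes the minimum PL of the genotypes that map to it.
    Only ploidy 1 and 2 are handled in this sketch.
    """
    if ploidy not in (1, 2):
        raise ValueError(f"unsupported ploidy: {ploidy}")
    expected = num_genotypes(n_alleles, ploidy)
    if len(pl) != expected:
        raise ValueError(f"PL has {len(pl)} entries, expected {expected}")

    def down(allele: int) -> int:
        # 1 if this allele is the one being kept, else 0 (reference).
        return 1 if allele == a_index else 0

    # Biallelic output has ploidy + 1 entries: 2 for haploid, 3 for diploid.
    buckets: List[List[int]] = [[] for _ in range(ploidy + 1)]
    if ploidy == 1:
        for a in range(n_alleles):
            buckets[down(a)].append(pl[a])
    else:
        for k in range(n_alleles):
            for j in range(k + 1):
                # Biallelic index equals the count of kept alleles (0, 1, or 2).
                buckets[down(j) + down(k)].append(pl[diploid_index(j, k)])
    return [min(b) for b in buckets]


# Three alleles (REF plus two ALTs), splitting out the first ALT.
haploid = split_pl([0, 30, 60], n_alleles=3, a_index=1, ploidy=1)
diploid = split_pl([10, 0, 20, 30, 40, 50], n_alleles=3, a_index=1, ploidy=2)
```

Passing the ploidy explicitly, rather than inferring it from the PL length, avoids ambiguity and lets a length mismatch surface as a clear error instead of an out-of-bounds access.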