Open jeromekelleher opened 2 years ago
This would be useful to add.
Should it be required to specify this to indicate default phased-ness or would it be permitted to have an array full of true values (to indicate phased) or false (unphased)? In terms of implementation, when converting a VCF file we don't in general know if it is phased or not, so we'd have to generate the phased array, and then throw it away if all entries were true or false.
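For concreteness, the "generate it, then throw it away if it is uniform" step might look roughly like the sketch below. The function name `summarise_phasing` and the nested-list input are assumptions made for illustration, not part of any existing converter, and treating an empty input as unphased simply mirrors the current default.

```python
from typing import Iterable, Optional


def summarise_phasing(phased_rows: Iterable[Iterable[bool]]) -> Optional[bool]:
    """Summarise per-call phasing flags (hypothetical helper).

    Returns True if every call is phased, False if no call is phased
    (including the empty case), and None if the flags are mixed.
    """
    seen_true = False
    seen_false = False
    for row in phased_rows:
        for value in row:
            if value:
                seen_true = True
            else:
                seen_false = True
            if seen_true and seen_false:
                # Mixed phasing: the full array would have to be kept.
                return None
    # Uniform (or empty) input: the array could be dropped in favour of a default.
    return seen_true


# Usage: decide whether the generated array needs to be stored at all.
flags = [[True, True], [True, True]]
summary = summarise_phasing(flags)
keep_array = summary is None
```

The early return means mixed data is detected as soon as both values have been seen, but uniform data still needs a full pass, which is the cost being discussed here.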
Would it be an error to specify both the attribute and the array?
Hmm, it is tricky all right. I guess in retrospect the actual amount of storage required for an array of all 0s or all 1s is going to be pretty small, so perhaps it's not worth worrying about this. If we start summarising this at the file level, then why not summarise a bunch of other things too?
Currently we assume genotypes are unphased if the `phased` marker isn't present. However, it's a pretty common case I'd imagine that all genotypes are either phased or unphased, so requiring the extra storage in the phased case seems excessive. Also, we don't want to have to go through everything to see if the data is all phased. So, how about we have a top-level field which tells us the default phased-ness?
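As a rough sketch of how a reader could use such a field: the attribute name `default_phased`, the function `is_phased`, and the plain dict/list inputs are all hypothetical. Letting an explicit array take precedence over the attribute is just one possible choice, assumed here for illustration.

```python
from typing import Mapping, Optional, Sequence


def is_phased(
    attrs: Mapping[str, object],
    phased: Optional[Sequence[Sequence[bool]]],
    variant: int,
    sample: int,
) -> bool:
    """Resolve phasing for one call (hypothetical reader logic)."""
    if phased is not None:
        # Assumption: a per-call array, when present, overrides the default.
        return bool(phased[variant][sample])
    # No array stored: fall back to the top-level default, and to
    # "unphased" when the field is missing, matching current behaviour.
    return bool(attrs.get("default_phased", False))


# Usage: a fully phased dataset stores only the top-level field.
all_phased = is_phased({"default_phased": True}, None, variant=0, sample=0)
# A mixed dataset stores the array and is looked up per call.
mixed = is_phased({}, [[True, False]], variant=0, sample=1)
```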