simojoe closed this issue 3 months ago
I agree that the current behavior is unintuitive / basically incorrect. Your first proposal seems reasonable to me. In general, would you expect an unbounded INFO field to remain unsplit?
@simojoe I opened a PR to resolve this issue. If you have a moment, let me know if it meets your needs.
Use case: I am parsing a VCF file with glow and am using the SplitMultiallelics transformer to obtain biallelic entries. One of my INFO fields is unbounded, meaning that it is defined with Number=. in the VCF header.
Unexpected behaviour: When splitting, the unbounded INFO field is sometimes left intact and sometimes separated across the different alternate alleles.
Because it is unbounded, this INFO field has a variable number of elements in each VCF entry. When that number is equal to the number of alternate alleles, the SplitMultiallelics transformer splits it into individual parts, but it keeps it intact when the numbers differ.
It seems to me that this is an edge case, because one would expect an INFO field to be either split or intact throughout a single file, not both.
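To make the inconsistency concrete, here is a minimal Python sketch of the length-based rule as I understand it from the behaviour above. The function `split_info_by_length` is a hypothetical stand-in written for illustration; it is not glow's actual code or API.

```python
def split_info_by_length(values, n_alt):
    """Hypothetical model of the observed behaviour (not glow's implementation).

    Returns one list of INFO values per alternate allele.
    """
    if len(values) == n_alt:
        # Length happens to match the ALT count: one element per allele.
        return [[v] for v in values]
    # Otherwise the whole field is copied unchanged to every split row.
    return [list(values) for _ in range(n_alt)]


# Two records from the same file, both with two alternate alleles and the
# same Number=. INFO field, end up handled differently.
print(split_info_by_length(["x", "y"], 2))       # [['x'], ['y']]
print(split_info_by_length(["x", "y", "z"], 2))  # [['x', 'y', 'z'], ['x', 'y', 'z']]
```

The first record is split per allele only because its list happens to have two elements; the second keeps the full list on both rows.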
Possible solutions:
A field can be declared with Number=A in the VCF header to indicate that the number of elements is the same as the number of alternate alleles. The transformer could allow splitting only of Number=A fields.
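A rough sketch of what this proposal could look like, again in plain Python rather than glow's real code. The function `split_info_by_header` and its `number` argument (the Number value from the header) are assumptions made for illustration, as is the choice to raise an error on a malformed Number=A field.

```python
def split_info_by_header(values, n_alt, number):
    """Hypothetical header-driven rule: split only fields declared Number=A."""
    if number == "A":
        if len(values) != n_alt:
            # Assumption: a Number=A field with the wrong length is an error.
            raise ValueError("Number=A field must have one value per ALT allele")
        return [[v] for v in values]
    # Number=. (and every other declaration) is kept intact on each split row,
    # regardless of how many elements it happens to contain.
    return [list(values) for _ in range(n_alt)]


# An unbounded field is now left intact even when its length matches n_alt.
print(split_info_by_header(["x", "y"], 2, "."))  # [['x', 'y'], ['x', 'y']]
print(split_info_by_header(["x", "y"], 2, "A"))  # [['x'], ['y']]
```

With this rule the outcome depends only on the header declaration, so a given INFO field is either split or intact throughout the file.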