sgkit-dev / sgkit-plink

NOW ARCHIVED! PLINK IO implementations for sgkit. With https://github.com/pystatgen/sgkit/issues/65, now maintained and developed in the sgkit repo.
Apache License 2.0

Standardize missing value sentinels for string arrays #16

Open eric-czech opened 4 years ago

eric-czech commented 4 years ago

Before the switch to fixed-length string dtypes for sample/variant metadata, None was an appropriate sentinel for missing values. That no longer works with fixed-length types, so read_plink should use empty strings instead (the None values are currently being coerced to the string "None").
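For illustration, the coercion described above can be reproduced with plain NumPy. This is a minimal sketch, not the read_plink code path; the sample IDs and variable names are made up for the example.

```python
import numpy as np

# Object array as it might look before conversion: None marks a missing value.
# (Illustrative data, not taken from any real PLINK file.)
raw = np.array(["S1", None, "S3"], dtype=object)

# Casting straight to a fixed-length unicode dtype calls str() on each element,
# so None silently becomes the literal string "None".
coerced = raw.astype(str)

# Replacing None with "" before the cast keeps missing values recognizable.
cleaned = np.array(["" if v is None else v for v in raw], dtype=object).astype(str)

print(coerced.tolist())  # ['S1', 'None', 'S3']
print(cleaned.tolist())  # ['S1', '', 'S3']
```

After the cast, "None" is indistinguishable from a sample that is genuinely named "None", which is why an explicit empty-string sentinel is the safer choice.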

I would rather not alter the values in the PLINK fam/bim files at all, but the string "0" is not a missing value sentinel we will use anywhere else in sgkit. It is worth coercing these to empty strings so users can expect a uniform representation for missing values in all string arrays.
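A rough sketch of what that coercion could look like, assuming the fields have already been read into a string array. The helper name `normalize_missing` and its `sentinels` parameter are hypothetical and not part of sgkit-plink.

```python
import numpy as np

def normalize_missing(values, sentinels=("0",)):
    """Hypothetical helper: map PLINK-style missing sentinels to empty strings."""
    arr = np.asarray(values, dtype=str)
    # Only exact matches are replaced, so an ID such as "P10" is left alone.
    return np.where(np.isin(arr, sentinels), "", arr)

# Example: paternal IDs from a .fam file, where PLINK writes "0" for unknown.
paternal_id = normalize_missing(["0", "P1", "0", "P10"])
print(paternal_id.tolist())  # ['', 'P1', '', 'P10']
```

This would only be applied to columns where "0" actually means missing (for example the parental ID columns), not to every field in the file.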