malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Allow fractional value for max_missing_an, e.g. 0.01 #572

Open leehart opened 1 month ago

leehart commented 1 month ago

E.g. with regard to pca().

Re: PR https://github.com/malariagen/malariagen-data-python/pull/569

This also highlights a general weakness of setting max_missing_an=0 as the default for all datasets. In real datasets, as the number of samples grows, the chance of finding variants with no missingness at all gets smaller.
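To put rough numbers on that intuition, here is a minimal sketch. It assumes each sample's genotype call at a site is missing independently with the same probability, which is a simplification and not something measured on any MalariaGEN dataset. The function name `prob_site_fully_called` and the example rate of 0.1% are made up for illustration.

```python
def prob_site_fully_called(p_missing: float, n_samples: int) -> float:
    """Probability that a site has no missing calls at all.

    Assumes (for illustration only) that calls are missing independently
    with probability p_missing per sample.
    """
    return (1.0 - p_missing) ** n_samples


if __name__ == "__main__":
    # Hypothetical per-sample missingness rate of 0.1%.
    for n in (100, 1_000, 10_000):
        print(n, prob_site_fully_called(0.001, n))
```

Under these assumptions the fraction of sites that survive a max_missing_an=0 filter shrinks geometrically with the number of samples.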

It would be better to allow a fractional value instead, e.g. max_missing_an=0.01, which would mean keeping all variants with at most 1% missing genotype calls.
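One possible way to interpret the parameter is sketched below. This is not the library's implementation: the function name `site_filter_max_missing`, the `an_called` input (number of called alleles per site), and the `ploidy` argument are all hypothetical. The sketch assumes an int means an absolute count of missing alleles (the current behaviour as described above) and a float between 0 and 1 means a fraction of the total possible allele number.

```python
import numpy as np


def site_filter_max_missing(an_called, n_samples, max_missing_an, ploidy=2):
    """Return a boolean mask of sites to keep.

    Hypothetical helper: int threshold = absolute number of missing
    alleles; float threshold in [0, 1] = fraction of missing alleles.
    """
    an_called = np.asarray(an_called)
    an_total = n_samples * ploidy          # maximum possible allele number
    an_missing = an_total - an_called      # missing alleles per site

    if isinstance(max_missing_an, float):
        if not 0.0 <= max_missing_an <= 1.0:
            raise ValueError("fractional max_missing_an must be in [0, 1]")
        # Compare fractions directly to avoid rounding a scaled threshold.
        return (an_missing / an_total) <= max_missing_an

    # Integer threshold: absolute count of missing alleles.
    return an_missing <= max_missing_an


if __name__ == "__main__":
    # 100 diploid samples, so 200 alleles per site when fully called.
    an_called = [200, 198, 197, 100]
    print(site_filter_max_missing(an_called, 100, 0))     # strict default
    print(site_filter_max_missing(an_called, 100, 0.01))  # at most 1% missing
```

A design point to settle with this approach is the ambiguity between `1` (one missing allele) and `1.0` (100% missing); dispatching on the Python type, as here, is one option, but it would need to be documented clearly.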