malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Exclude samples with unassigned taxon from advanced allele frequencies cohorts #491

Closed alimanfoo closed 7 months ago

alimanfoo commented 7 months ago

In the snp_allele_frequencies_advanced() and aa_allele_frequencies_advanced() functions, cohorts of samples with unassigned taxon can be formed if there are enough samples with "unassigned" taxon values.

This can occasionally manifest as test failures, e.g.:

[image: screenshot of the test failure]

In the failure above, the "ML-2_unas_2015" cohort contains samples with unassigned taxon.

It would be better to exclude samples with unassigned taxon before cohorts are formed in these functions.
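To illustrate the proposed behaviour, here is a minimal standalone sketch, not the library's actual implementation. It assumes each sample is a record with `sample_id`, `admin1_iso`, `taxon` and `year` fields, and that cohort labels follow the area_taxon_year pattern seen in "ML-2_unas_2015". The `build_cohorts` helper and its `min_cohort_size` and `exclude_unassigned` parameters are hypothetical names used only for this example.

```python
from collections import defaultdict


def build_cohorts(samples, min_cohort_size=2, exclude_unassigned=True):
    """Group sample records into cohorts labelled area_taxon_year.

    Hypothetical helper for illustration only; field names and the label
    format are assumptions, not the malariagen_data API.
    """
    cohorts = defaultdict(list)
    for sample in samples:
        # Proposed fix: skip samples whose taxon is "unassigned" so they
        # can never accumulate into a cohort of their own.
        if exclude_unassigned and sample["taxon"] == "unassigned":
            continue
        # Label uses the first four letters of the taxon, e.g. "gamb", "unas".
        label = f'{sample["admin1_iso"]}_{sample["taxon"][:4]}_{sample["year"]}'
        cohorts[label].append(sample["sample_id"])
    # Keep only cohorts that reach the minimum size.
    return {
        label: ids for label, ids in cohorts.items() if len(ids) >= min_cohort_size
    }


# Toy data: two assigned and two unassigned samples from the same area and year.
samples = [
    {"sample_id": "S1", "admin1_iso": "ML-2", "taxon": "gambiae", "year": 2015},
    {"sample_id": "S2", "admin1_iso": "ML-2", "taxon": "gambiae", "year": 2015},
    {"sample_id": "S3", "admin1_iso": "ML-2", "taxon": "unassigned", "year": 2015},
    {"sample_id": "S4", "admin1_iso": "ML-2", "taxon": "unassigned", "year": 2015},
]

current = build_cohorts(samples, exclude_unassigned=False)
proposed = build_cohorts(samples)
```

With the toy data, `current` includes an "ML-2_unas_2015" cohort because there are enough unassigned samples to reach the minimum size, while `proposed` contains only "ML-2_gamb_2015".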