monarch-initiative / genophenocorr

Genotype Phenotype Correlation
https://monarch-initiative.github.io/genophenocorr/stable
MIT License

Allow the user to limit the tested phenotypic features #97

Open ielis opened 7 months ago

ielis commented 7 months ago

It is better to test only a smaller number of phenotypic features to decrease the false discovery rate and mitigate the impact of multiple testing correction. Running fewer tests is better for statistics and for the environment!
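To make the multiple testing point concrete, here is a minimal sketch using the Bonferroni correction as an example. The function name and the test counts are made up for illustration, and the library may well use a different correction procedure.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical numbers: testing every annotated term vs. a curated subset.
threshold_all = bonferroni_threshold(0.05, 500)
threshold_subset = bonferroni_threshold(0.05, 20)

# The curated subset leaves a 25x more lenient per-test threshold.
print(f"{threshold_all:.6f} vs {threshold_subset:.6f}")
```

The fewer terms we test, the less each individual p value is penalized.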

I think there are 2 things that need to be done here. First, we need to present counts of phenotypic features. We have a function that does that:

cohort: Cohort = ...

cohort.list_all_phenotypes()

This is good basic functionality, and we can add more convenience if we add term labels and (maybe) even return as a pandas DataFrame:

label               term_id     count
Seizure             HP:0001250  10
Hepatosplenomegaly  HP:0001433  4
Arachnodactyly      HP:0001166  3
...                 ...         ...

The frame could possibly also break down the count into the number of direct and indirect (implied by the annotation propagation rule) annotations.
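A rough sketch of how the direct/indirect breakdown could be computed. Everything here is hypothetical: `PARENTS` is a toy child-to-parent map standing in for the real HPO hierarchy, the patients are invented, and `count_phenotypes` is not an existing function of the library.

```python
from collections import Counter

# Toy child -> parents map; a real implementation would query the HPO instead.
PARENTS = {
    "HP:0007359": ["HP:0001250"],  # Focal-onset seizure -> Seizure
    "HP:0001250": ["HP:0012638"],  # Seizure -> Abnormal nervous system physiology
    "HP:0012638": [],
}
LABELS = {
    "HP:0007359": "Focal-onset seizure",
    "HP:0001250": "Seizure",
    "HP:0012638": "Abnormal nervous system physiology",
}


def ancestors(term_id, parents):
    """Collect all strict ancestors of a term by walking the parent links."""
    seen = set()
    stack = list(parents.get(term_id, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def count_phenotypes(patients, parents, labels):
    """Count patients per term, split into direct and indirect annotations."""
    direct, indirect = Counter(), Counter()
    for terms in patients:
        observed = set(terms)
        implied = set()
        for term_id in observed:
            implied |= ancestors(term_id, parents)
        implied -= observed  # a directly annotated term is not counted twice
        direct.update(observed)
        indirect.update(implied)

    rows = [
        {
            "label": labels.get(term_id, ""),
            "term_id": term_id,
            "count": direct[term_id] + indirect[term_id],
            "direct": direct[term_id],
            "indirect": indirect[term_id],
        }
        for term_id in set(direct) | set(indirect)
    ]
    # Most frequent terms first; ties are broken by term ID.
    rows.sort(key=lambda row: (-row["count"], row["term_id"]))
    return rows


# Three invented patients, each given as a list of directly annotated terms.
PATIENTS = [["HP:0007359"], ["HP:0001250"], ["HP:0007359", "HP:0001250"]]
ROWS = count_phenotypes(PATIENTS, PARENTS, LABELS)
for row in ROWS:
    print(row)
```

The list of dicts can be passed straight to `pandas.DataFrame(ROWS)` if we decide to return a frame.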

Second, we need to filter the phenotypic features prior to running the analysis. We already do one such filtering using the min_perc_patients_w_hpo configuration option. We may want to add another filter that lets the user choose a set of HPO terms to test. Note, the ancestors of these terms will not be tested!
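One possible shape for the combined filtering, only as a starting point for discussion. `select_terms_to_test`, its parameters, and the cohort size are all made up. `min_frac` plays the role of min_perc_patients_w_hpo and is assumed here to be a fraction between 0 and 1. The count for HP:0012638 is invented to show an ancestor term being skipped.

```python
def select_terms_to_test(counts, n_patients, min_frac=0.0, terms_to_test=None):
    """Pick the HPO terms that pass both filters.

    counts        -- term ID -> number of patients annotated with the term
    n_patients    -- cohort size
    min_frac      -- minimum fraction of patients that must have the term
    terms_to_test -- optional set of user-chosen term IDs; None disables the filter
    """
    selected = []
    for term_id, count in counts.items():
        if count / n_patients < min_frac:
            continue  # too rare in the cohort
        if terms_to_test is not None and term_id not in terms_to_test:
            continue  # not requested by the user (this also drops ancestors)
        selected.append(term_id)
    return sorted(selected)


# Counts from the table above plus one invented ancestor term.
COUNTS = {
    "HP:0001250": 10,  # Seizure
    "HP:0012638": 12,  # ancestor of Seizure, count made up
    "HP:0001433": 4,   # Hepatosplenomegaly
    "HP:0001166": 3,   # Arachnodactyly
}
N_PATIENTS = 40  # hypothetical cohort size

# Frequency filter only.
frequent = select_terms_to_test(COUNTS, N_PATIENTS, min_frac=0.08)

# Frequency filter plus a user-chosen set of terms.
chosen = select_terms_to_test(
    COUNTS, N_PATIENTS, min_frac=0.08,
    terms_to_test={"HP:0001250", "HP:0001166"},
)
print(frequent)
print(chosen)
```

With an explicit term set, the ancestor is skipped even though it is frequent enough, and a requested term can still be dropped by the frequency filter. We should decide whether that second behavior is what users expect.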

We need to think how to do this.

Related to #44, #98

ielis commented 4 months ago

Related to #126