Closed ielis closed 7 months ago
@lnrekerle
I'm proposing a revamp to the CohortAnalyzer.
The CohortAnalyzer is an abstraction: a promise of what the analysis can do for the user. To get a CohortAnalyzer, we use a pattern similar to the one for configuring PhenopacketPatientCreator. There is a configuration function that gives you a CohortAnalyzer:
from genophenocorr.analysis import configure_cohort_analysis
analysis = configure_cohort_analysis(cohort, hpo)
You'll get an analysis with default options. If you want to tweak the options, build a CohortAnalysisConfiguration:
from genophenocorr.analysis import CohortAnalysisConfiguration, configure_cohort_analysis
# Parentheses let the builder chain span several lines.
configuration = (
    CohortAnalysisConfiguration.builder()
    .include_sv(True)
    .pval_correction('fdr_bh')
    .build()
)
analysis = configure_cohort_analysis(cohort, hpo, configuration)
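For readers unfamiliar with the builder pattern used above, here is a minimal standalone sketch of how such a configuration builder could work. The class names `AnalysisConfig` and `AnalysisConfigBuilder` and the default values are hypothetical and chosen for illustration; this is not the actual `CohortAnalysisConfiguration` implementation from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalysisConfig:
    """Immutable bag of options (hypothetical stand-in, not the library class)."""
    include_sv: bool = False                # assumed default
    pval_correction: Optional[str] = None   # assumed default


class AnalysisConfigBuilder:
    """Collects options step by step; each setter returns self to allow chaining."""

    def __init__(self) -> None:
        self._include_sv = False
        self._pval_correction: Optional[str] = None

    def include_sv(self, value: bool) -> "AnalysisConfigBuilder":
        self._include_sv = value
        return self

    def pval_correction(self, method: Optional[str]) -> "AnalysisConfigBuilder":
        self._pval_correction = method
        return self

    def build(self) -> AnalysisConfig:
        # Freeze the collected options into an immutable configuration.
        return AnalysisConfig(self._include_sv, self._pval_correction)


config = (
    AnalysisConfigBuilder()
    .include_sv(True)
    .pval_correction('fdr_bh')
    .build()
)
```

The point of the pattern is that the analysis only ever sees a finished, immutable configuration, while the user can set just the options they care about.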
Then we run the analysis, e.g. to compare MISSENSE vs others:
from genophenocorr.model import VariantEffect
from genophenocorr.analysis.predicate import BooleanPredicate
results = analysis.compare_by_variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_1234.5')
result_df = results.summarize(hpo, BooleanPredicate.YES)
result_df.head()
We get results, a container with a lot of data. We call summarize to prepare a data frame with phenotypes vs. genotypes, ordered by corrected p values.
Note that we provide BooleanPredicate.YES to show genotype-phenotype correlation for present HPO terms, not for not-present ones (we would use BooleanPredicate.NO to show those).
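To make "ordered by corrected p values" concrete, here is a small pure-Python sketch of the Benjamini-Hochberg procedure, which is what the name 'fdr_bh' denotes in statsmodels. Whether the PR delegates to statsmodels is an assumption on my part; the function `benjamini_hochberg`, the term labels, and the p values below are made up for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p values for three hypothetical phenotype terms.
raw = {'term_a': 0.04, 'term_b': 0.01, 'term_c': 0.03}
corrected = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Order the terms by corrected p value, smallest first.
ranked = sorted(corrected.items(), key=lambda kv: kv[1])
```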
This is what the PR adds. Thanks to the changes, we have a general framework for applying genotype and phenotype predicates and showing the results.
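As a rough illustration of what applying a genotype predicate and a phenotype predicate to a cohort means, the sketch below counts patients in each cell of a 2x2 table. `ToyPatient`, `count_groups`, and the sample data are hypothetical and are not the classes or predicates defined in this PR.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class ToyPatient:
    """Hypothetical stand-in for a cohort member."""
    variant_effects: FrozenSet[str]
    present_terms: FrozenSet[str]


Predicate = Callable[[ToyPatient], bool]


def count_groups(
    patients: Iterable[ToyPatient],
    genotype: Predicate,
    phenotype: Predicate,
) -> Counter:
    """Count patients in each (genotype, phenotype) cell of a 2x2 table."""
    return Counter((genotype(p), phenotype(p)) for p in patients)


# Made-up cohort of three patients.
cohort = [
    ToyPatient(frozenset({'MISSENSE_VARIANT'}), frozenset({'term_a'})),
    ToyPatient(frozenset({'MISSENSE_VARIANT'}), frozenset()),
    ToyPatient(frozenset({'FRAMESHIFT_VARIANT'}), frozenset({'term_a'})),
]

table = count_groups(
    cohort,
    genotype=lambda p: 'MISSENSE_VARIANT' in p.variant_effects,
    phenotype=lambda p: 'term_a' in p.present_terms,
)
```

A table like this is the usual input to a per-term statistical test, after which the p values can be corrected and sorted.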
Please check out the code and try it out, and we can discuss it in greater detail next time.
Now, with develop merged into the PR branch, we should be OK to move forward with this PR if the code looks good.
Fixes #87, #92
Depends on #94