This issue will be broken down into smaller chunks as our understanding of PhEval grows. It serves as the description of a "starter" project that I estimate will take around two months to complete.
Some initial research questions:
1. How do the uPheno 2 lattice, uPheno 2 equivalence, and uPheno 1 approaches affect semantic similarity scores? Answering this will require a clear characterisation of how semantic similarity results change over time, possibly including things like the top 100 lost and gained scores, the distribution of score differences, etc.
2. How does uPheno 2 equivalence plus ML mappings affect semantic similarity compared with uPheno 2 equivalence without additional mappings? Same approach as above, just with slightly different preprocessing.
3. Provide a cursory (very basic) analysis that allows us to measure the effect of a specific semantic similarity table on the performance of Exomiser. This is a fairly tough problem to solve, as Exomiser currently needs to be recompiled for changes to the tables to be reflected. The analysis should be able to visualise changes between different Exomiser runs.
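For questions 1 and 2, the characterisation of score changes could look something like the sketch below. It assumes a hypothetical TSV table layout (`subject`, `object`, `score` columns); the actual column names produced by the semantic similarity pipeline may differ.

```python
import csv
from statistics import mean

def load_scores(path):
    """Load a semantic similarity table.

    Assumes a TSV with 'subject', 'object' and 'score' columns; adjust
    to whatever format the pipeline actually emits.
    """
    with open(path) as f:
        return {(row["subject"], row["object"]): float(row["score"])
                for row in csv.DictReader(f, delimiter="\t")}

def compare_scores(before, after, top_n=100):
    """Characterise changes between two score tables keyed by term pair."""
    shared = before.keys() & after.keys()
    deltas = {pair: after[pair] - before[pair] for pair in shared}
    ranked = sorted(deltas, key=deltas.get)  # ascending by score change
    return {
        "lost": sorted(before.keys() - after.keys()),    # pairs that disappeared
        "gained": sorted(after.keys() - before.keys()),  # pairs that appeared
        "top_decreases": ranked[:top_n],
        "top_increases": ranked[-top_n:][::-1],
        "mean_delta": mean(deltas.values()) if deltas else 0.0,
    }
```

The same `compare_scores` function would serve both questions, since each is a pairwise comparison of one score table against another.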
Provide a simple Python CLI tool, built with click, that runs in the PhEval Docker container, wraps the pairwise comparisons for 1, 2 and 3 above, and emits the comparative analysis as a markdown document.
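The core of the markdown output could be a small rendering function like the sketch below, which a click command would then wrap. The report layout (a metric/before/after/delta table) is an assumption about what the comparative analysis should contain, not a fixed requirement.

```python
def comparison_markdown(title, rows):
    """Render pairwise comparison results as a markdown document.

    `rows` is a list of (metric, before, after) tuples. The table
    layout here is a placeholder; the real report would likely also
    include the lost/gained pairs and distribution plots.
    """
    lines = [
        f"# {title}",
        "",
        "| metric | before | after | delta |",
        "|---|---|---|---|",
    ]
    for metric, before, after in rows:
        lines.append(
            f"| {metric} | {before:.3f} | {after:.3f} | {after - before:+.3f} |"
        )
    return "\n".join(lines)
```

Keeping the rendering separate from the click entry point should make the tool easy to test outside the Docker container.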
Potential pitfalls:
I don't know how we could parametrise the Exomiser build process. It is probably not a bad idea to consult @julesjacobsen before doing anything too crazy.
The semantic similarity features in OAK are under heavy development and may be somewhat brittle throughout this project.