This issue will be broken down into smaller chunks as our understanding of PhEval grows. It serves as the description of a "starter" project that I estimate will take around two months to complete.
Some initial research questions:
1. How do the uPheno 2 lattice, uPheno 2 equivalence, and uPheno 1 approaches affect semantic similarity scores? Answering this will require a clear characterisation of how semantic similarity results change over time, possibly including things like the top 100 lost and gained scores, the distribution of score differences, etc.
2. How does uPheno 2 equivalence plus ML mappings affect semantic similarity compared with uPheno 2 equivalence without additional mappings? Same approach as above, just with slightly different preprocessing.
3. Provide a cursory (very basic) analysis that allows us to measure the effect of a specific semantic similarity table on the performance of Exomiser. This is a fairly tough problem to solve, as Exomiser currently needs to be recompiled for changes to the tables to be reflected. The analysis should be able to visualise changes between different Exomiser runs.
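For questions 1 and 2, the characterisation of score changes could look something like the sketch below. It assumes a hypothetical TSV table layout (`subject`, `object`, `score` columns); the actual column names produced by the semantic similarity pipeline may differ.

```python
import csv
from statistics import mean

def load_scores(path):
    """Load a semantic similarity table.

    Assumes a TSV with 'subject', 'object' and 'score' columns; adjust
    to whatever format the pipeline actually emits.
    """
    with open(path) as f:
        return {(row["subject"], row["object"]): float(row["score"])
                for row in csv.DictReader(f, delimiter="\t")}

def compare_scores(before, after, top_n=100):
    """Characterise changes between two score tables keyed by term pair."""
    shared = before.keys() & after.keys()
    deltas = {pair: after[pair] - before[pair] for pair in shared}
    ranked = sorted(deltas, key=deltas.get)  # ascending by score change
    return {
        "lost": sorted(before.keys() - after.keys()),    # pairs that disappeared
        "gained": sorted(after.keys() - before.keys()),  # pairs that appeared
        "top_decreases": ranked[:top_n],
        "top_increases": ranked[-top_n:][::-1],
        "mean_delta": mean(deltas.values()) if deltas else 0.0,
    }
```

The same `compare_scores` function would serve both questions, since each is a pairwise comparison of one score table against another.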
Provide a simple Python CLI tool, built with click, that runs in the PhEval Docker container, wraps the pairwise comparisons for 1, 2 and 3 above, and emits the comparative analysis as a markdown document.
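The core of the markdown output could be a small rendering function like the sketch below, which a click command would then wrap. The report layout (a metric/before/after/delta table) is an assumption about what the comparative analysis should contain, not a fixed requirement.

```python
def comparison_markdown(title, rows):
    """Render pairwise comparison results as a markdown document.

    `rows` is a list of (metric, before, after) tuples. The table
    layout here is a placeholder; the real report would likely also
    include the lost/gained pairs and distribution plots.
    """
    lines = [
        f"# {title}",
        "",
        "| metric | before | after | delta |",
        "|---|---|---|---|",
    ]
    for metric, before, after in rows:
        lines.append(
            f"| {metric} | {before:.3f} | {after:.3f} | {after - before:+.3f} |"
        )
    return "\n".join(lines)
```

Keeping the rendering separate from the click entry point should make the tool easy to test outside the Docker container.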
Potential pitfalls:
I don't know how we could parametrise the Exomiser build process. It is probably not a bad idea to consult @julesjacobsen before doing anything too crazy.
The semantic similarity features in OAK are under heavy development and may be somewhat brittle throughout this project.