Consider p value calculation for IC based sim calculations

kshefchek commented 7 years ago

When presenting results I'm often asked for a p-value to determine whether a match is significant. @drseb has proposed a way to generate p-values for similarity scores here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2756558/. Would it be feasible and useful to add this to the phenodigm algorithm? Could we also add phenomizer as a matcher?

drseb commented 7 years ago

It would be possible to add phenomizer, but I suggest using the Bayesian algorithms, as these will naturally give you a statistical statement. Happy to help with the empirical p-values if you decide to go this route.

cmungall commented 7 years ago

Agreed about the Bayesian algorithms, but just to be clear, these yield a probability, not a p-value.

For calculating p-values there is this code, added by Nicole I think:

https://github.com/monarch-initiative/owlsim-v3/blob/a4a9c23208b2924caf48d363a02700258f9b49c1/owlsim-core/src/main/java/org/monarchinitiative/owlsim/model/match/impl/MatchSetImpl.java#L211

But this is a t-test and is not meaningful here.

To get accurate p-values we can follow the methods in @drseb's paper, but I think we would have to run the simulation for all combinations of species.
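
To make the simulation idea concrete, here is a rough sketch of an empirical p-value computed by random sampling. It is only an illustration under stated assumptions, not owlsim code and not necessarily the exact procedure from the paper: `toy_ic_similarity` is a hypothetical stand-in for a real IC-based score, and `empirical_p_value`, `all_terms`, and `ic` are names made up for this example. The null distribution comes from random queries with the same number of terms as the real query, scored against the same target.

```python
import random


def toy_ic_similarity(query, target, ic):
    """Hypothetical stand-in score: summed IC of the terms shared by query and target.

    A real matcher would use the ontology structure; this is only a placeholder.
    """
    return sum(ic[term] for term in query & target)


def empirical_p_value(query, target, all_terms, ic,
                      n_samples=10000, seed=0, score=toy_ic_similarity):
    """Estimate P(score of a random query >= observed score) by Monte Carlo sampling."""
    rng = random.Random(seed)
    observed = score(query, target, ic)
    pool = sorted(all_terms)  # fixed order so results are reproducible for a given seed
    hits = 0
    for _ in range(n_samples):
        # Random query with the same number of terms as the real query.
        random_query = set(rng.sample(pool, len(query)))
        if score(random_query, target, ic) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)
```

If we went this way, the null distribution would presumably have to be sampled (or precomputed) separately for each target, query size, and species combination, which is where the cost comes from.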

monarch-initiative / owlsim-v3

Consider p value calculation for IC based sim calculations #80