monarch-initiative / owlsim-v3

Ontology Based Profile Matching

match significance stats #3

Open nlwashington opened 10 years ago

nlwashington commented 10 years ago

we need to add p-value statistics to profile matches.

generally, we need to assess the statistical significance of a match (individual x individual, set x set) against various backgrounds, so that matched results can be interpreted usefully.
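as a rough illustration of what "significance against a background" could mean here, the sketch below computes an empirical p-value: the fraction of background scores at least as high as the observed match score. the function name `empirical_p_value` and the add-one correction are assumptions for illustration, not anything that exists in owlsim-v3.

```python
from typing import Sequence


def empirical_p_value(observed: float, background: Sequence[float]) -> float:
    """Empirical p-value of an observed match score against background scores.

    Hypothetical helper, not part of owlsim-v3. Assumes higher scores mean
    better matches. The add-one correction keeps the p-value above zero when
    no background score reaches the observed one.
    """
    if not background:
        raise ValueError("background must contain at least one score")
    # Count background scores that are at least as extreme as the observed one.
    at_least_as_high = sum(1 for score in background if score >= observed)
    return (at_least_as_high + 1) / (len(background) + 1)


# Made-up scores, purely to show the call.
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(empirical_p_value(0.85, background))  # 0.2
```

the resolution of such a p-value is limited by the background size: with N background scores the smallest reportable value is 1 / (N + 1).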

the background probability distributions will generally need to be generated per individual/set that is being compared against. for instance, if we want to know how well disease D matches disease Y, the background could be the distribution of scores for all other diseases matched to Y, or for different kinds of randomly generated diseases matched to Y (with the same overall information; the same count of annotations; the same distribution of scores; etc.)
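a minimal sketch of one of these backgrounds, the "same count of annotations" variant: draw random profiles with as many terms as D and score each against Y. everything here is assumed for illustration: the function names, the made-up term ids, and Jaccard overlap standing in for whatever similarity measure is actually used.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

Profile = FrozenSet[str]


def jaccard(a: Profile, b: Profile) -> float:
    """Placeholder similarity (term-set overlap); a stand-in, not OwlSim's measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_background_scores(
    query: Profile,
    target: Profile,
    vocabulary: Sequence[str],
    score: Callable[[Profile, Profile], float] = jaccard,
    n_samples: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Score random profiles with the same annotation count as `query` against `target`.

    Hypothetical helper: terms are drawn uniformly without replacement, so
    `vocabulary` must hold at least len(query) terms.
    """
    rng = random.Random(seed)  # fixed seed so the background is reproducible
    k = len(query)
    return [
        score(frozenset(rng.sample(vocabulary, k)), target)
        for _ in range(n_samples)
    ]


# Made-up term ids and profiles, purely to show the call.
vocabulary = [f"T{i}" for i in range(50)]
disease_d = frozenset({"T1", "T2", "T3", "T4"})
disease_y = frozenset({"T2", "T3", "T4", "T5", "T6"})

scores = random_background_scores(disease_d, disease_y, vocabulary)
observed = jaccard(disease_d, disease_y)
print(observed, max(scores))
```

uniform sampling ignores term frequency and the ontology structure, so the other variants listed above (same overall information, same distribution of scores) would need a different sampler.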

the background data generation methods can be ported over from the OwlSim Analysis git repository; the stats methods need to be written fresh here.