We need to add p-value statistics to profile matches.
More generally, we need to assess the statistical significance of a match (individual x individual, set x set) against various backgrounds, so that matched results can be interpreted usefully.
The background probability distributions will generally need to be generated per individual/set being compared to. For instance, if we want to know how well disease D matches disease Y, the background could be the distribution of scores for all other diseases matched to Y, or for different kinds of randomly generated diseases matched to Y (with the same overall information, count of annotations, distribution of scores, etc.).
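As a rough illustration of the "same count of annotations" background, here is a minimal Python sketch. Everything in it is an assumption made for illustration: profiles are treated as plain sets of class labels, `jaccard` is only a stand-in for whatever similarity score is actually used, and `random_background` and its parameters are hypothetical names, not existing OwlSim methods.

```python
import random


def jaccard(a, b):
    """Stand-in similarity score (assumption): overlap of two annotation sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_background(query, target, all_classes, n_samples=1000,
                      score=jaccard, seed=None):
    """Score randomly generated profiles against `target`.

    Each random profile has the same number of annotations as `query`
    and is drawn without replacement from `all_classes`.
    """
    rng = random.Random(seed)
    pool = sorted(set(all_classes))   # fixed order so a seed is reproducible
    k = len(set(query))               # match the query's annotation count
    return [score(rng.sample(pool, k), target) for _ in range(n_samples)]
```

Passing a different `score` function swaps in another similarity measure. The other backgrounds mentioned above (all other diseases, or random profiles matched on information content) would need their own generators.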
The background data generation methods can be ported over from the OwlSim Analysis git repo; the stats methods need to be written fresh here.
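One possible shape for the stats side is an empirical p-value: the fraction of background scores at least as high as the observed score, with a +1 correction so the result is never exactly zero. The Python sketch below assumes that higher scores mean better matches; `empirical_p_value` is a hypothetical name, not an existing method.

```python
from bisect import bisect_left


def empirical_p_value(observed, background):
    """Empirical p-value of `observed` against a list of background scores.

    p = (1 + #{b >= observed}) / (1 + N), assuming higher score = better match.
    """
    if not background:
        raise ValueError("background must contain at least one score")
    scores = sorted(background)
    # Number of background scores greater than or equal to the observed score.
    n_ge = len(scores) - bisect_left(scores, observed)
    return (1 + n_ge) / (1 + len(scores))
```

With N background scores the smallest reportable p-value is 1 / (N + 1), so the background size limits the achievable resolution. Comparing one query against many targets would also call for multiple-testing correction, which is not shown here.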