monarch-initiative / helpdesk

The Monarch Initiative Helpdesk
BSD 3-Clause "New" or "Revised" License

What is the meaning of quality metrics in Monarch website? #1

Closed monicacecilia closed 5 years ago

monicacecilia commented 5 years ago

Question from Monarch user Monique van der Voet.

Hi! I have a question about some of the output of the Monarch Initiative system.

I come across the term "score", which seems to go 1–100 describing how similar two terms are (100 if identical), but I can't locate documentation on how this score is calculated or how it should be interpreted. If two seemingly distant phenotypes have a similarity score of 55, would that be high or low? Some insight in the background would be helpful.

I have a similar question for the IC score, also there I can't find documentation. Is this the negative log likelihood?

I hope you can clarify this for me and possibly include this information on the website.

Kind regards,

Monique

pnrobinson commented 5 years ago

@monicacecilia Can we add this to a new FAQ page? Do we already have FAQs anywhere?

monicacecilia commented 5 years ago

We don't actually have one. I discovered that yesterday and commented on this growing ticket about feedback mechanisms: https://github.com/monarch-initiative/monarch-ui/issues/109

I agree that starting to build an FAQ page would be very useful. I bet @iimpulse would be able to whip one up in no time. :bowtie:

Peter, I would love it if you could please help us with the response.

pnrobinson commented 5 years ago

@cmungall I think Chris' software is calculating this score...

cmungall commented 5 years ago

@matentzn sorry to pass this on, but I think you're already handling this.

matentzn commented 5 years ago

@kshefchek for the website similarity scores, do you currently use the owltools/owlsim phenodigm scores? Jaccard or IC or both? If it's phenodigm, you can point to the paper. The IC part of the question is answered in the paper (if Kent confirms): image

In any case, this is a great question which I think we can answer quickly now, but we need to answer properly in the long term. There is absolutely no evaluation that I know of that relates semantic similarity to phenotypic similarity, the latter of which is really hard to quantify given that we don't even know its components (anatomical homology, taxonomic distance, and I am sure many many more).

So I would say the quick answer is this:

The semantic similarity scores we use approximate phenotypic similarity in the following ways:
1. If the score is 1, the phenotypes are equal (disregarding taxonomic distance)
2. If the score is 0, the phenotypes have no relationship
3. Anything in between means they are in some way similar, i.e. they share more or less distant ontology parents.

The score is currently computed by a mix of semantic similarity and heuristic approaches (i.e. label matching), which are documented [here](https://github.com/obophenotype/upheno/tree/master/mappings) and the above mentioned phenodigm paper.

Anything else is up for interpretation. A score of 0.55 can in some cases mean strong phenotypic similarity, which would indicate that the area is undermodelled ontologically (important axes of classification or links are missing); it can mean little, or even extreme difference, if the area is overmodelled. Matt Brush has some cool ideas on how injecting axes of classification into an ontology severely affects semantic similarity scores, without helping us understand phenotypic distance.
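To make the "shared ontology parents" intuition concrete, here is a minimal sketch of a Jaccard-style similarity computed over ancestor sets in a tiny made-up phenotype DAG. The terms and counts are entirely hypothetical, and this is not Monarch's actual implementation; it only illustrates why identical terms score 1, unrelated terms score near 0, and everything else lands in between depending on how distant the shared ancestors are.

```python
# Hypothetical mini-ontology (made-up terms, not Monarch's data):
# child -> list of parents in a small phenotype DAG.
PARENTS = {
    "absent eye": ["abnormal eye morphology"],
    "small eye": ["abnormal eye morphology"],
    "abnormal eye morphology": ["abnormal head morphology"],
    "abnormal ear morphology": ["abnormal head morphology"],
    "abnormal head morphology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],  # root
}

def ancestors(term):
    """Return the term plus all of its ancestors (reflexive closure)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def jaccard(a, b):
    """Shared ancestors / all ancestors: 1.0 for identical terms."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

print(jaccard("absent eye", "absent eye"))               # 1.0
print(jaccard("absent eye", "small eye"))                # 0.6 (close siblings)
print(jaccard("absent eye", "abnormal ear morphology"))  # 0.4 (distant ancestors only)
```

Note how the sibling pair scores higher than the pair that only shares the more distant "head" and root ancestors.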

matentzn commented 5 years ago

Hey @monicacecilia, here is some text for your answer.


Thank you for your interest! It has definitely helped raise the priority of publishing better documentation on our phenotypic similarity data.

The scores are generated by the modified phenodigm algorithm, which is based on information content (IC), as documented here. The implementation we use is the one from the owlsim2 package that is shipped as part of owltools. We access this information through Monarch's API, example here. Some more information and links to source code can be found here.
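On the IC question from the original post: yes, information content is essentially a negative log of a probability, namely the annotation frequency of a term in a corpus. A minimal sketch with made-up annotation counts (not Monarch's corpus; the term names and numbers are purely illustrative):

```python
import math

# Hypothetical annotation counts: how many entities in a corpus are
# annotated with each term, with counts propagated up to ancestors
# so the root term covers everything. All numbers are made up.
ANNOTATION_COUNTS = {
    "phenotypic abnormality": 1000,  # root term
    "abnormal eye morphology": 120,
    "absent eye": 8,
}
TOTAL = ANNOTATION_COUNTS["phenotypic abnormality"]

def information_content(term):
    """IC(t) = -log p(t): the negative log of the term's annotation frequency."""
    return -math.log(ANNOTATION_COUNTS[term] / TOTAL)

# The root carries no information (IC = 0, since p = 1);
# rarer, more specific terms carry more.
print(information_content("phenotypic abnormality"))  # 0.0
print(information_content("absent eye"))              # highest of the three
```

So a high IC means a term is specific and rarely annotated; matching on such terms is more informative than matching on broad terms near the root.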

The semantic similarity scores we use approximate phenotypic similarity in the following ways:

  1. If the score is 1, the phenotypes are considered equal (disregarding taxonomic distance in the case of cross-species phenotypes, such as MP:absent eye and HP:absent eye)
  2. If the score is 0, the phenotypes have no relationship
  3. Anything in between means they are in some way similar, i.e. they share more or less (ontological) information.

The score is currently computed by a mix of semantic similarity and (in the case of cross-species matching) heuristic approaches (i.e. label matching), so this is how the scores should be treated: as approximations. Higher or lower scores are more an expression of the data and ontological knowledge that is currently available, so it is hard to say off-hand whether a score of 0.55 is a lot or a little; it depends on the case.
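For readers who want a feel for how the two ingredients can be combined: as we read the phenodigm paper, a per-term-pair score is roughly the geometric mean of the Jaccard similarity and the IC of the best common ancestor, normalized against the best achievable self-match to a 0-100 range. The sketch below is a simplification under those assumptions (the actual owlsim2 normalization and cross-species handling differ in detail), with illustrative numbers only:

```python
import math

def phenodigm_term_score(sim_j, ic_mica):
    """Geometric mean of Jaccard similarity and the IC of the most
    informative common ancestor (MICA). Sketch, not the owlsim2 code."""
    return math.sqrt(sim_j * ic_mica)

def normalized_score(sim_j, ic_mica, max_ic):
    """Scale to 0-100 against the best possible match:
    a perfect Jaccard of 1.0 paired with the maximum IC in the corpus."""
    best = phenodigm_term_score(1.0, max_ic)
    return 100.0 * phenodigm_term_score(sim_j, ic_mica) / best

# A perfect self-match scores 100; partial overlap on a moderately
# informative ancestor lands somewhere in between (values are made up).
print(normalized_score(1.0, 5.0, max_ic=5.0))  # 100.0
print(normalized_score(0.6, 2.0, max_ic=5.0))  # partial match, well below 100
```

This also illustrates why the same numeric score can mean different things in different parts of the ontology: both the Jaccard overlap and the available IC depend on how densely that region is modelled and annotated.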

We are currently in the process of isolating our phenotypic similarity data and offering it for download in a FAIR, well-documented fashion. This will also trigger a further range of evaluations, which might shed some more light on the question of how phenodigm scores should be interpreted, in particular for cross-species integration. Please keep an eye on Twitter, where we will announce the datasets, hopefully by late fall.

Thank you again for your interest!

monicacecilia commented 5 years ago

Responded.