obophenotype / upheno

The Unified Phenotype Ontology (uPheno) integrates multiple phenotype ontologies into a unified cross-species phenotype ontology.
https://obophenotype.github.io/upheno/
Creative Commons Zero v1.0 Universal

Publicly recommend that phenotype ontologies should all have a root phenotype class in their own namespace #287

Open matentzn opened 5 years ago

matentzn commented 5 years ago

I would like to publicly recommend that all phenotype ontologies have their own X:Phenotype class, and require that all definitions are extended to:

X:Phenotype and has_part some {...}

I would also like to recommend that this class be made a subclass of one of the UPHENO classes (either Phenotype or Abnormal phenotype) to make alignments a bit easier (i.e. X:Phenotype sub UPHENO:0001002).

The reason for that is that we need to start thinking about how to make the taxon restrictions explicit. I know this will be a fairly major change, as a lot of HP <- MP subsumptions will just disappear (though, if we play it right, fewer of the MP <- HP ones). But the way this is handled right now is just wrong IMO. We will have everywhere that Melissa's ulcerated paw in the mouse is THE SAME as a human's thickened hand skin; this will then cause all children of thickened hand skin to be children of ulcerated paw, which may have lots of weird consequences for the systems using them together. I am sure there will be many arguments against it, so let's hear them.
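To make the concern about children crossing species concrete, here is a minimal sketch. The class names are hypothetical placeholders rather than real HP or MP identifiers, and the closure below is a deliberate simplification of what an OWL reasoner does: it only treats an equivalence as a subclass edge in both directions.

```python
def superclasses(cls, subclass_of, equivalent):
    """Collect every class that `cls` is inferred to be a subclass of.

    An equivalence is treated as a subclass edge in both directions,
    which is how the two species hierarchies end up mixed.
    """
    edges = {}
    for child, parent in subclass_of:
        edges.setdefault(child, set()).add(parent)
    for a, b in equivalent:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(b, set()).add(a)

    found, stack = set(), [cls]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt != cls and nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


# Hypothetical toy classes; the real ontologies use numeric identifiers.
subclass_of = [("HP:thickened_palm_skin", "HP:thickened_hand_skin")]
merged = [("HP:thickened_hand_skin", "MP:ulcerated_paw")]

with_equivalence = superclasses("HP:thickened_palm_skin", subclass_of, merged)
without_equivalence = superclasses("HP:thickened_palm_skin", subclass_of, [])

print(sorted(with_equivalence))
print(sorted(without_equivalence))
```

With the equivalence in place, the human child class picks up the mouse class as a superclass; without it, the human class stays inside the human hierarchy.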

balhoff commented 5 years ago

+1! I have run into issues with taxon scoping before:

https://github.com/obophenotype/mammalian-phenotype-ontology/issues/2140

pnrobinson commented 5 years ago

Can you explain what the practical problem is? (It is a philosophical problem whether ulcerated paws and thickened hand skin are "the same" any less or more than thickened hand skin in two different humans -- but this is just a model.) Does this affect the calculations that we are doing for Exomiser and similar programs? Could you give an example? It seems that if we lose these subsumptions, then we will reduce the performance of our software in order to gain a better philosophical model, and so I do not think it is obvious that we want this.

matentzn commented 5 years ago

Hey Peter, thanks for chipping in. Very good questions. If you want real examples, it will have to wait until after ICBO, I will set up some experiments. But consider these scenarios:

1) I want to query uPheno and get ONLY mouse or zebrafish endocrine system phenotypes. This is currently not possible: you will always get human, worm and other phenotypes back, no matter what you do. (An alternative would be to encode the taxon restriction in UBERON, of course; that would amount to the same thing.) I can't speak for SciGraph etc., because they might pull a taxon restriction from elsewhere. I am talking only about the ontology itself.

2) For external applications: if ZP:001 and MP:002 are equivalent (for example because they have the same definition), they will show up with similarity 1, while the human "Abnormality in digit 1 growth" and "Abnormality in digit 2 growth" will have a similarity of less than 1. Maybe that is correct according to your homology theory, so maybe that's OK.

3) But imagine someone in ZP saying (which they are allowed to, even if we don't want it) that everything that exhibits phenotype ZP:001 is also a fish and therefore has fins (this is correct in their model). Asking uPheno "What do we know of individuals that exhibit MP:002?" would then result in the answer: they are fish and have fins.

I hope that answers the question for now, but I will find you some better examples later this year. I understand that some people do not think of ontologies as inputs to reasoning engines; but the fact is, they are used as such (I recently helped MGI to implement a search engine using MP, HP and reasoning). So I think it's important to consider.
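As a rough illustration of scenario 1, the sketch below shows how a taxon-scoped query could look once every class sits under a species-specific root. All class names are hypothetical, and the hierarchy is a hand-written toy rather than anything extracted from uPheno.

```python
def descendants(cls, subclass_of):
    """Return all direct and indirect subclasses of `cls`."""
    children = {}
    for child, parent in subclass_of:
        children.setdefault(parent, set()).add(child)

    found, stack = set(), [cls]
    while stack:
        for nxt in children.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


# Hypothetical toy hierarchy: each species class has an anatomical
# parent and, as proposed in this issue, a species-specific root.
subclass_of = [
    ("MP:Phenotype", "UPHENO:Phenotype"),
    ("HP:Phenotype", "UPHENO:Phenotype"),
    ("UPHENO:endocrine_phenotype", "UPHENO:Phenotype"),
    ("MP:abnormal_thyroid", "UPHENO:endocrine_phenotype"),
    ("MP:abnormal_thyroid", "MP:Phenotype"),
    ("HP:abnormal_thyroid", "UPHENO:endocrine_phenotype"),
    ("HP:abnormal_thyroid", "HP:Phenotype"),
]

endocrine = descendants("UPHENO:endocrine_phenotype", subclass_of)
mouse_endocrine = endocrine & descendants("MP:Phenotype", subclass_of)

print(sorted(endocrine))
print(sorted(mouse_endocrine))
```

Without the species roots, only the first query is available, and it returns the human and mouse classes together.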

pnrobinson commented 5 years ago

Hi Nico -- I realize that the above are issues, but I am saying that software can take care of them and that changing the logical structure of the definitions has other disadvantages and we need to weigh them. In any case, one of our main use cases is Exomiser and so we should not reduce this performance in exchange for something that is less concrete.

drseb commented 5 years ago

Not sure I totally understood all issues in this thread, but I think you will have to ensure that the species-specific classes that are now equivalent have a common superclass that is only subsumed by those classes (and not more). Otherwise this will introduce noise. For example there should be a species-agnostic "ulcerated autopod" that is subsumed by all the ulcerated paws/hands/fins. If I am not totally wrong the result for IC-based scores will then stay the same!? Does that make sense?
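A toy calculation can illustrate why the scores might stay the same. It assumes annotation-based information content and Resnik similarity (the IC of the most informative common ancestor); the thread does not say which measure the downstream tools actually use, and all class and entity names are made up.

```python
import math


def ancestors(cls, parents):
    """Return `cls` plus all of its superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(parents, annotations):
    """IC(c) = -log(share of annotated entities that reach class c)."""
    reached = {}
    for cls, entities in annotations.items():
        for anc in ancestors(cls, parents):
            reached.setdefault(anc, set()).update(entities)
    total = len(set().union(*annotations.values()))
    return {c: -math.log(len(e) / total) for c, e in reached.items()}


def resnik(a, b, parents, annotations):
    """IC of the most informative common ancestor of `a` and `b`."""
    ic = information_content(parents, annotations)
    common = ancestors(a, parents) & ancestors(b, parents)
    return max(ic[c] for c in common)


# Current situation: mouse and human classes are equivalent, so they
# behave like a single node carrying both sets of annotations.
merged_parents = {"merged_ulcerated_autopod": ["phenotype"], "other": ["phenotype"]}
merged_annotations = {
    "merged_ulcerated_autopod": {"mouse_gene_1", "human_disease_1"},
    "other": {"mouse_gene_2", "human_disease_2"},
}

# Proposed situation: species classes are siblings under a
# species-agnostic parent that subsumes only those classes.
split_parents = {
    "mp_ulcerated_paw": ["ulcerated_autopod"],
    "hp_ulcerated_hand": ["ulcerated_autopod"],
    "ulcerated_autopod": ["phenotype"],
    "other": ["phenotype"],
}
split_annotations = {
    "mp_ulcerated_paw": {"mouse_gene_1"},
    "hp_ulcerated_hand": {"human_disease_1"},
    "other": {"mouse_gene_2", "human_disease_2"},
}

before = resnik("merged_ulcerated_autopod", "merged_ulcerated_autopod",
                merged_parents, merged_annotations)
after = resnik("mp_ulcerated_paw", "hp_ulcerated_hand",
               split_parents, split_annotations)
print(before, after)
```

In this toy case the species-agnostic parent collects exactly the annotations the merged class had, so the two scores agree. If the parent also subsumed unrelated classes, its IC would drop, which is the noise mentioned above.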

matentzn commented 5 years ago

I appreciate your concern @pnrobinson! We will put this on ice until I can guarantee you that the dependent toolchains do not drop in performance. That, of course, has priority!

@drseb I will look into this issue as well. Strictly speaking, it is not necessary to have species-independent NAMED classes; they could be generated on the fly prior to determining the semantic similarity, and then, yes, one would think the results should be the same, or at least very similar.
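One way such on-the-fly grouping could work is sketched below, under the assumption that each logical definition can be treated as a set of conjuncts, one of which is the species root. The definitions are opaque placeholder strings and the class names are hypothetical; a real implementation would work on OWL expressions with a reasoner.

```python
from collections import defaultdict


def grouping_classes(definitions, species_roots):
    """Group classes whose definitions match once the species root is dropped.

    Each returned key is the species-neutral definition, standing in for
    an unnamed grouping class; each value is the set of classes under it.
    """
    groups = defaultdict(set)
    for cls, conjuncts in definitions.items():
        groups[frozenset(conjuncts - species_roots)].add(cls)
    # Only definitions shared by more than one class need a grouping parent.
    return {neutral: members for neutral, members in groups.items()
            if len(members) > 1}


# Hypothetical roots and definitions, written as opaque strings.
species_roots = {"MP:Phenotype", "HP:Phenotype", "ZP:Phenotype"}
definitions = {
    "MP:ulcerated_paw": {"MP:Phenotype", "has_part some (ulcerated in autopod)"},
    "HP:ulcerated_hand": {"HP:Phenotype", "has_part some (ulcerated in autopod)"},
    "ZP:curved_fin": {"ZP:Phenotype", "has_part some (curved in fin)"},
}

groups = grouping_classes(definitions, species_roots)
for neutral, members in groups.items():
    print(sorted(neutral), sorted(members))
```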

Are there any other concerns apart from failing tools that depend on the ontology?

cmungall commented 5 years ago

See also #171

obviously this change needs careful coordination

matentzn commented 5 years ago

@balhoff @dosumis @pnrobinson @drseb @cmungall @sbello

For the next steps in the development of uPheno, I would like to move this issue forward. Could you guys please list the stakeholders (i.e. tools and people) that should be consulted before we can make this decision? I suggest we call a meeting, and I will prepare a little presentation outlining what I have in mind.

As far as I can see, we can introduce a gradual change that should not affect any current tooling, like this:

1) Every phenotype ontology has its own "Phenotype" class, for example "Worm Phenotype", "Zebrafish Phenotype", "Human Phenotype", "Mammalian Phenotype".

2) In uPheno, all of the above are subclasses of "Upheno Phenotype".

3) All phenotypes are defined as, for example, "Worm Phenotype" and has part some (Q in E).

Now, this will change the results of semantic similarity, which we don't want until we are sure the change is a change for the better.

4) So, until we can be sure of the above, we can simply assert all species-specific phenotype root classes (i.e. the ones added in step 1) to be equivalent (in uPheno). This should create the EXACT situation that we have at the moment (i.e., all subclasses stay the same).
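The effect of steps 1 to 4 can be sketched with a toy model. It assumes a logical definition can be compared as a set of conjuncts, which is a purely structural check and not a substitute for a reasoner; the class names and the `shared_root` representative are hypothetical.

```python
def canonical(definition, root_equivalences):
    """Replace each species root by its asserted equivalent, if any."""
    return frozenset(root_equivalences.get(c, c) for c in definition)


def equivalent(a, b, definitions, root_equivalences):
    """Two classes count as equivalent if their canonical definitions match."""
    return (canonical(definitions[a], root_equivalences)
            == canonical(definitions[b], root_equivalences))


# Steps 1 and 3: every definition carries its species root as a conjunct.
definitions = {
    "MP:example": {"MP:Phenotype", "has_part some (Q in E)"},
    "HP:example": {"HP:Phenotype", "has_part some (Q in E)"},
}

# Step 4 (interim): all species roots are asserted equivalent, modelled
# here by mapping each of them to one shared representative.
interim = {"MP:Phenotype": "shared_root", "HP:Phenotype": "shared_root"}

# Long run: the equivalence between roots is dropped.
long_run = {}

same_in_interim = equivalent("MP:example", "HP:example", definitions, interim)
same_in_long_run = equivalent("MP:example", "HP:example", definitions, long_run)
print(same_in_interim, same_in_long_run)
```

While the roots are equivalent, the two classes still collapse as they do today; once the equivalence is removed, they become distinct classes that differ only in their species root.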

The idea is then, in the long run, to move to a situation where phenotypes are siblings under a common uPheno class.

I suggest we discuss everything further in person. I know some of you are worried about effects on downstream tooling, but I strongly (!) believe this is going to be worth it.