monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

ROBOT report - need to check that we have no clashes with HPO #1499

Open nicolevasilevsky opened 4 years ago

nicolevasilevsky commented 4 years ago

Per our Mondo call on 05/08, could we add a check to the ROBOT report that verifies Mondo terms do not clash with HPO terms?

For example: MONDO_0001627 'dementia (disease)' and MONDO_0002280 'anemia (disease)'.

joeflack4 commented 3 years ago

I've moved the main discussion for this issue here: https://github.com/INCATools/ontology-development-kit/issues/482

I've put the 'needs clarification' label on this for now, until some of the questions in that issue are addressed.

matentzn commented 3 years ago

This problem actually needs to be solved here in the Mondo repo, as this is very specific to Mondo, and ROBOT/ODK extensions apply to all ontologies! You are thinking in the right direction though.

What needs to happen here is:

1) Inputs: tmp/mondo-labels.tsv and tmp/hpo-labels.tsv, generated using ROBOT query and a SPARQL query that obtains a list of IDs with their labels (that query already exists in the repo). These two make goals do not exist yet and need to be created in mondo.Makefile.

2) A Python script that reads the two tables, does some preprocessing on the labels, and then spits out an error if two labels clash.
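The second step could be sketched roughly as follows. This is a minimal sketch, not the script that ended up in the repo: the function names (`read_labels`, `normalize`, `find_clashes`, `main`) are hypothetical, and it assumes each TSV has a header row followed by two columns (ID, then label). The only preprocessing shown is trimming and lowercasing.

```python
import csv
import sys
from collections import defaultdict


def read_labels(path):
    """Read a TSV of (ID, label) rows, skipping the first row as a header.

    Assumption: the table has a header line and at least two columns.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the assumed header row
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def normalize(label):
    """Minimal preprocessing: trim whitespace and lowercase."""
    return label.strip().lower()


def find_clashes(mondo_rows, hpo_rows):
    """Return (mondo_id, mondo_label, hpo_id, hpo_label) tuples whose
    labels are identical after normalization."""
    hpo_index = defaultdict(list)
    for hpo_id, hpo_label in hpo_rows:
        hpo_index[normalize(hpo_label)].append((hpo_id, hpo_label))

    clashes = []
    for mondo_id, mondo_label in mondo_rows:
        for hpo_id, hpo_label in hpo_index.get(normalize(mondo_label), []):
            clashes.append((mondo_id, mondo_label, hpo_id, hpo_label))
    return clashes


def main(mondo_path, hpo_path):
    """Print every clash as a TSV row and return a non-zero code if any exist."""
    clashes = find_clashes(read_labels(mondo_path), read_labels(hpo_path))
    for clash in clashes:
        print("\t".join(clash))
    return 1 if clashes else 0
```

A make goal could then call something like `sys.exit(main("tmp/mondo-labels.tsv", "tmp/hpo-labels.tsv"))`, so that the build fails whenever a clash is found.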

A more amazing alternative solution (I don't know if and how it would work, but it could be a challenge):

1) Merge mondo.owl and hp.owl.

2) Write a single SPARQL query that checks for duplicate labels after preprocessing (this requires some pretty advanced SPARQL skills, a hand-width beyond my own though).

joeflack4 commented 3 years ago

Just copying over my questions from here. I refined my questions based on Nico's last comment. Maybe another one to add to the qc-call.

@matentzn

nicolevasilevsky commented 3 years ago

The issue here is that we have terms in Mondo that are diseases, and sometimes there are terms in the Human Phenotype Ontology (HPO) that have the same label. Examples:

MONDO_0001627 dementia and HP:0000726 Dementia

MONDO_0002280 anemia and HP:0001903 Anemia

You can view the entire HPO here: https://hpo.jax.org/app/ (or download the owl file here

I can't remember what our ultimate goal is though - do we not want to have overlapping terms in Mondo and HPO?

I'll add this to the QC call agenda.

matentzn commented 3 years ago

For now, the action item should be:

As I describe in my last comment above, you can generate the tables using ROBOT report - @sabrinatoro is also now an expert and can help you do it :D

nicolevasilevsky commented 3 years ago

2) Best directory: the make goals should go into the mondo.Makefile, temp files should go into the tmp directory, and scripts into src/scripts.

3) ROBOT report is a function inside the ROBOT tool that generates a standardized QC report, which is the same across all OBO ontologies.

The point of this ticket is that some diseases match phenotypes - we need an exact match table. Then we'll discuss what to do with these on the Mondo call.

5) Where the label (after stemming, lowercasing, and removing "abnormal") matches. We could start by looking at exact matches. Stemming is a preprocessing step (you can google what this means); it may not be necessary.

Start with case-insensitive matches.
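As a rough illustration of the preprocessing mentioned above, here is a minimal sketch that lowercases a label and removes the word "abnormal". The function name `preprocess` and the optional stripping of a trailing "(disease)" suffix are assumptions made for this example, not something agreed in this thread. Stemming is left out, since it may not be necessary.

```python
import re


def preprocess(label, strip_disease_suffix=True):
    """Normalize a label for case-insensitive comparison.

    Steps: trim and lowercase, optionally drop a trailing "(disease)"
    (an assumption for this sketch), remove the standalone word
    "abnormal", and collapse repeated whitespace.
    """
    text = label.strip().lower()
    if strip_disease_suffix:
        text = re.sub(r"\s*\(disease\)$", "", text)
    # \b keeps longer words such as "abnormality" untouched.
    text = re.sub(r"\babnormal\b", " ", text)
    return " ".join(text.split())


def labels_match(mondo_label, hpo_label):
    """True if the two labels are identical after preprocessing."""
    return preprocess(mondo_label) == preprocess(hpo_label)
```

With this, `labels_match("dementia (disease)", "Dementia")` evaluates to `True`. Passing `strip_disease_suffix=False` to `preprocess` gives the plain case-insensitive comparison suggested as a starting point.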

psiotwo commented 2 years ago

Not sure whether it is related to this issue, but e.g. the following MONDO concepts share a definition with HPO concepts: MONDO:0005260, MONDO:0001014.

matentzn commented 2 years ago

It fits in here! But it's a slightly adjacent issue. Thank you for reporting!