nicolevasilevsky opened this issue 4 years ago
I've moved the main discussion for this issue here: https://github.com/INCATools/ontology-development-kit/issues/482
I've put the label 'needs clarification' on this for now, until some questions in that issue are addressed.
This problem actually needs to be solved here in the Mondo repo, because it is very specific to Mondo, whereas ROBOT/ODK extensions apply to all ontologies! You are thinking in the right direction, though.
What needs to happen here is:
1) Inputs: tmp/mondo-labels.tsv and tmp/hpo-labels.tsv, generated using ROBOT query and a SPARQL query that obtains a list of IDs with their labels (that query already exists in the repo). These two make goals do not exist yet and need to be created in mondo.Makefile.
2) A Python script that reads the two tables, does some preprocessing on the labels, and then spits out an error if two labels clash.
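A minimal Python sketch of step 2, assuming each table has a header row followed by an ID column and a label column (the real output of the existing SPARQL query may be laid out differently). The file paths come from step 1; the helper names `clean_cell`, `read_labels`, `find_clashes` and `main` are hypothetical. It compares labels case-insensitively, prints a TSV table of clashing pairs, and exits non-zero so that a make goal running it would fail.

```python
import csv
import sys
from collections import defaultdict


def clean_cell(value: str) -> str:
    """Strip RDF serialisation noise that a SPARQL TSV export may contain.
    Assumption: IRIs in <angle brackets>, "quoted" literals, @lang tags."""
    value = value.strip()
    if value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    if value.startswith('"') and value.rfind('"') > 0:
        return value[1:value.rfind('"')]
    return value


def read_labels(path: str) -> list[tuple[str, str]]:
    """Read an (id, label) TSV file, skipping its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters literal so clean_cell can strip them.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)
        return [(clean_cell(row[0]), clean_cell(row[1])) for row in reader if len(row) >= 2]


def find_clashes(mondo, hpo, normalise=str.casefold):
    """Return (mondo_id, hpo_id, mondo_label) for every pair whose normalised labels match."""
    hpo_index = defaultdict(list)
    for hpo_id, label in hpo:
        hpo_index[normalise(label)].append(hpo_id)
    return [
        (mondo_id, hpo_id, label)
        for mondo_id, label in mondo
        for hpo_id in hpo_index.get(normalise(label), [])
    ]


def main(mondo_path="tmp/mondo-labels.tsv", hpo_path="tmp/hpo-labels.tsv") -> int:
    clashes = find_clashes(read_labels(mondo_path), read_labels(hpo_path))
    # Emit the clashes as a TSV table on stdout.
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(["mondo_id", "hpo_id", "label"])
    writer.writerows(clashes)
    if clashes:
        print(f"ERROR: {len(clashes)} Mondo label(s) clash with HPO labels", file=sys.stderr)
    return 1 if clashes else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:3]))
```

From src/ontology this could be run as `python ../scripts/check_label_clashes.py tmp/mondo-labels.tsv tmp/hpo-labels.tsv` (the script name is hypothetical). Indexing HPO labels in a dictionary keeps the comparison linear in the size of both tables rather than comparing every pair.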
A more amazing alternative solution (I don't know if or how it would work, but it could be a challenge): 1) merge mondo.owl and hp.owl; 2) write a single SPARQL query that checks for duplicate labels after preprocessing (this requires some pretty advanced SPARQL skills, a hand's width beyond my own, though).
Just copying over my questions from here. I refined my questions based on Nico's last comment. Maybe another one to add to the QC call.
@matentzn
src/ontology/tmp/ directory, but those files are .gitignored. I might be misunderstanding, but it sounds like you are saying that I need to create some make targets to create these files first and then use them as inputs? And I think the Python scripts should go in src/scripts.
The issue here is that we have terms in Mondo that are diseases, and sometimes there are terms in the Human Phenotype Ontology (HPO) that have the same label. Examples: MONDO_0001627 dementia and HP:0000726 Dementia; MONDO_0002280 anemia and HP:0001903 Anemia.
You can view the entire HPO here: https://hpo.jax.org/app/ (or download the OWL file here).
I can't remember what our ultimate goal is, though: do we not want to have overlapping terms in Mondo and HPO?
I'll add this to the QC call agenda.
For now, the action item should be:
As I described in my last comment above, you can generate the tables using ROBOT report; @sabrinatoro is now also an expert and can help you do it :D
2) Best directory: the make goals should go into mondo.Makefile, temp files should go into the tmp directory, and scripts should go into src/scripts.
3) ROBOT report is a command in the ROBOT tool that generates a standardized QC report that is the same across all OBO ontologies.
The point of this ticket is that some diseases match phenotypes, so we need a table of exact matches. Then we'll discuss what to do with these on the Mondo call.
5) Where the label matches after preprocessing (stemming, lowercasing, removing "abnormal"). We could start by looking at exact matches. Stemming is a preprocessing step (you can google what this means); it may not be necessary.
Start with case-insensitive matches.
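One possible shape for that preprocessing, as a hedged Python sketch: lowercase the label, drop the word "abnormal", and optionally apply a stemming step. The names `normalise_label`, `crude_stem` and `DROP_WORDS` are hypothetical, and `crude_stem` only strips a plural "s"; it is a stand-in, not a real stemmer such as Porter. Stemming is off by default, in line with starting from case-insensitive matches.

```python
import re

# Assumption: words to discard before comparing labels; extend after inspecting HPO labels.
DROP_WORDS = {"abnormal"}


def crude_stem(token: str) -> str:
    """Very rough stand-in for stemming: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalise_label(label: str, stem: bool = False) -> str:
    """Lowercase, split into alphanumeric tokens, drop DROP_WORDS, optionally stem."""
    tokens = re.findall(r"[a-z0-9]+", label.casefold())
    tokens = [t for t in tokens if t not in DROP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return " ".join(tokens)
```

Because it maps a string to a string, this function can be used directly as the grouping key when looking for Mondo and HPO labels that match; turning `stem` on widens the net at the cost of more false positives.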
Not sure whether it is related to this issue, but e.g. the following MONDO concepts share a definition with HPO concepts: MONDO:0005260, MONDO:0001014.
It fits in here! But it's a slightly adjacent issue. Thank you for reporting!
Per our Mondo call on 05/08, could we add a check to the ROBOT report that verifies Mondo terms do not have any label clashes with HPO terms?
For example: MONDO_0001627 'dementia (disease)', MONDO_0002280 'anemia (disease)'.