obophenotype / uberon

An ontology of gross anatomy covering metazoa. Works in concert with https://github.com/obophenotype/cell-ontology
http://obophenotype.github.io/uberon/

Identify all proxy merges (mappings) and make issues on external ontology trackers #2834

Open matentzn opened 1 year ago

matentzn commented 1 year ago

We have a bunch of cases like:

Uberon:nerve --[semapv:crossSpeciesExactMatch]--> Xenopus:nerve
Uberon:nerve --[semapv:crossSpeciesExactMatch]--> Xenopus:peripheral_nerve

This means that an external ontology treats as distinct two concepts that Uberon considers to be the same.

As @cmungall commented in #2833, we should probably review these proxy merges and make issue tracker items for all of these, encouraging the external ontologies (like in this case XAO) to merge the terms.

gouttegd commented 1 year ago

Uberon:nerve --[semapv:crossSpeciesExactMatch]--> Xenopus:nerve
Uberon:nerve --[semapv:crossSpeciesExactMatch]--> Xenopus:peripheral_nerve

Where do those semapv:crossSpeciesExactMatch come from?

As far as I know the Uberon/XAO mappings are still expressed as oboInOwl:hasDbXref, they have not switched to SSSOM and SEMAPV properties yet.

Anyway, while it could be that merging the two XAO terms upstream is the appropriate solution in this particular case, I want to point out that SSSOM and SEMAPV will offer greater flexibility in cases of this kind.

Currently, the meaning of cross-species mappings (and so the type of bridging axioms) is decided once and for all for every taxon-specific ontology. In the case of XAO for example, we have this declaration:

treat-xrefs-as-reverse-genus-differentia: XAO part_of NCBITaxon:8353

which means that any XAO term mapped to an Uberon term is equivalent to the Uberon term (with the added restriction on taxon). Subsequently, any two XAO terms that are mapped to the same Uberon term (as in the example above) would end up being equivalent.
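
To make the consequence concrete, here is a minimal Python sketch of why two foreign terms sharing one Uberon cross-reference end up equivalent. It is not how the bridge pipeline is implemented: the function names and the tuple representation of a definition are assumptions made for illustration, and the identifiers are a small sample taken from the sssom-cli output further down in this thread.

```python
from collections import defaultdict

def reverse_genus_differentia(xrefs, relation, taxon):
    """Expand each (uberon_id, foreign_id) xref into an equivalence definition.

    Illustrative only: a definition is represented here as a plain
    (genus, relation, taxon) tuple rather than as an OWL axiom.
    """
    return {foreign: (uberon, relation, taxon) for uberon, foreign in xrefs}

def implied_equivalences(definitions):
    """Group foreign terms that received exactly the same definition."""
    by_definition = defaultdict(list)
    for foreign, definition in definitions.items():
        by_definition[definition].append(foreign)
    return [sorted(terms) for terms in by_definition.values() if len(terms) > 1]

# Sample xrefs: two XAO terms point to the same Uberon term, one does not.
xrefs = [
    ("UBERON:0001021", "XAO:0000204"),
    ("UBERON:0001021", "XAO:0003047"),
    ("UBERON:0002100", "XAO:0000054"),
]
definitions = reverse_genus_differentia(xrefs, "part_of", "NCBITaxon:8353")
merged = implied_equivalences(definitions)
print(merged)  # [['XAO:0000204', 'XAO:0003047']]
```

Both nerve terms receive the identical definition, so a reasoner would infer them to be equivalent to each other, which is the proxy merge discussed here.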

But with SSSOM and SEMAPV, we no longer need to treat all mappings with a given foreign ontology as if they were all of the same type. We can use semapv:crossSpeciesExactMatch for mappings that are indeed supposed to be between equivalent terms (again modulo the taxon restriction), and other mapping properties for mappings between terms that should not be equivalent.

For example, we could have

Uberon:nerve --[semapv:crossSpeciesCloseMatch]--> Xenopus:nerve
Uberon:nerve --[semapv:crossSpeciesCloseMatch]--> Xenopus:peripheral_nerve

which could be translated as Xenopus:nerve and Xenopus:peripheral_nerve being subclasses of Uberon:nerve, rather than equivalent classes (thereby not leading to an undesired equivalence between the two XAO terms).
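
A rough Python sketch of what predicate-dependent bridging could look like. The translation table is an assumption drawn from the discussion above (exact match gives an equivalence with a taxon restriction, close match gives a plain subclass axiom); it does not describe the behaviour of any existing tool, and the output is Manchester-like text for illustration only.

```python
# Assumed translation table, taken from the discussion above and not from any tool.
AXIOM_TEMPLATES = {
    "semapv:crossSpeciesExactMatch": "{obj} EquivalentTo: {subj} and (part_of some {taxon})",
    "semapv:crossSpeciesCloseMatch": "{obj} SubClassOf: {subj}",
}

def bridge_axiom(subj, predicate, obj, taxon):
    """Return an illustrative bridging axiom for one mapping."""
    template = AXIOM_TEMPLATES.get(predicate)
    if template is None:
        raise ValueError(f"no bridging rule for predicate {predicate!r}")
    return template.format(subj=subj, obj=obj, taxon=taxon)

for xao_term in ("XAO:0000204", "XAO:0003047"):
    print(bridge_axiom("UBERON:0001021", "semapv:crossSpeciesCloseMatch",
                       xao_term, "NCBITaxon:8353"))
```

With the close-match rule, each XAO term only becomes a subclass of UBERON:0001021, so nothing forces the two XAO terms to be equivalent to each other.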

Not saying this would be the right thing to do in this particular case (maybe XAO should revise their concepts of nerves, I don’t know), but let’s not be too fast in telling foreign ontologies that they should start merging terms just to keep Uberon happy. SSSOM gives us other solutions that may in some cases be better.

matentzn commented 1 year ago

Where do those semapv:crossSpeciesExactMatch come from?

It was just me projecting into the future. 🔥 They are indeed hasDbXrefs.

Fully agreed with the rest. This is not how bridge generation works right now, but it is exactly where I think we should be heading (I would most likely use broadMatch rather than closeMatch, but we can settle that when we get there).

We will have to separate genuine proxy merges from false ones using manual curation, but I am assuming there won't be too many.

github-actions[bot] commented 1 year ago

This issue has not seen any activity in the past 6 months; it will be closed automatically one year from now if no action is taken.

gouttegd commented 7 months ago

but I am assuming there won't be too many.

With sssom-cli (included in the latest ODK) you can quickly find out all the cases for a given foreign ontology.

First make sure you have the tmp/uberon-mappings.sssom.tsv file (containing all the mappings extracted from cross-references in Uberon); that file is normally automatically generated as part of the bridge pipeline, but if needed:

sh run.sh make tmp/uberon-mappings.sssom.tsv

Then to get all the XAO terms that are mapped to more than one Uberon/CL term:

sh run.sh sssom-cli --prefix-map-from-input \
   -i tmp/uberon-mappings.sssom.tsv -i mappings/cl-mappings.sssom.tsv \
   -R '!object==XAO:* -> stop()' -R 'cardinality==1:n -> include()'

This will (1) read the Uberon mappings (re-generated above) and the CL mappings (already stored in the repository in mappings/cl-mappings.sssom.tsv), (2) filter out any mapping whose object is not an XAO term (!object==XAO:* -> stop()), and (3) out of the remaining mappings, select only those with a cardinality of 1:n, meaning that several objects are mapped to the same subject (cardinality==1:n -> include()).

Output:

#curie_map:
#  UBERON: "http://purl.obolibrary.org/obo/UBERON_"
#  XAO: "http://purl.obolibrary.org/obo/XAO_"
#mapping_set_id: "http://purl.obolibrary.org/obo/uberon/core/mappings.sssom.tsv"
#license: "http://creativecommons.org/licenses/by/3.0/"
subject_id      subject_label   predicate_id    object_id       mapping_justification
UBERON:0001021  nerve   semapv:crossSpeciesExactMatch   XAO:0000204     semapv:UnspecifiedMatching
UBERON:0001021  nerve   semapv:crossSpeciesExactMatch   XAO:0003047     semapv:UnspecifiedMatching
UBERON:0001675  trigeminal ganglion     semapv:crossSpeciesExactMatch   XAO:0000427     semapv:UnspecifiedMatching
UBERON:0001675  trigeminal ganglion     semapv:crossSpeciesExactMatch   XAO:0000428     semapv:UnspecifiedMatching
UBERON:0001785  cranial nerve   semapv:crossSpeciesExactMatch   XAO:0000429     semapv:UnspecifiedMatching
UBERON:0001785  cranial nerve   semapv:crossSpeciesExactMatch   XAO:0003089     semapv:UnspecifiedMatching
UBERON:0002100  trunk   semapv:crossSpeciesExactMatch   XAO:0000054     semapv:UnspecifiedMatching
UBERON:0002100  trunk   semapv:crossSpeciesExactMatch   XAO:0003025     semapv:UnspecifiedMatching
UBERON:0003071  eye primordium  semapv:crossSpeciesExactMatch   XAO:0000227     semapv:UnspecifiedMatching
UBERON:0003071  eye primordium  semapv:crossSpeciesExactMatch   XAO:0004090     semapv:UnspecifiedMatching
UBERON:0004535  cardiovascular system   semapv:crossSpeciesExactMatch   XAO:0000100     semapv:UnspecifiedMatching
UBERON:0004535  cardiovascular system   semapv:crossSpeciesExactMatch   XAO:0001010     semapv:UnspecifiedMatching
UBERON:0005487  vitelline vein  semapv:crossSpeciesExactMatch   XAO:0000376     semapv:UnspecifiedMatching
UBERON:0005487  vitelline vein  semapv:crossSpeciesExactMatch   XAO:0004147     semapv:UnspecifiedMatching
UBERON:0005870  olfactory pit   semapv:crossSpeciesExactMatch   XAO:0000275     semapv:UnspecifiedMatching
UBERON:0005870  olfactory pit   semapv:crossSpeciesExactMatch   XAO:0004073     semapv:UnspecifiedMatching

Here you are. Eight cases in XAO.

Likewise for ZFA:

sh run.sh sssom-cli --prefix-map-from-input \
   -i tmp/uberon-mappings.sssom.tsv -i mappings/cl-mappings.sssom.tsv \
   -R '!object==ZFA:* -> stop()' -R 'cardinality==1:n -> include()'

Output:

#curie_map:
#  UBERON: "http://purl.obolibrary.org/obo/UBERON_"
#  ZFA: "http://purl.obolibrary.org/obo/ZFA_"
#mapping_set_id: "http://purl.obolibrary.org/obo/uberon/core/mappings.sssom.tsv"
#license: "http://creativecommons.org/licenses/by/3.0/"
subject_id      subject_label   predicate_id    object_id       mapping_justification
UBERON:0000165  mouth   semapv:crossSpeciesExactMatch   ZFA:0000547     semapv:UnspecifiedMatching
UBERON:0000165  mouth   semapv:crossSpeciesExactMatch   ZFA:0000590     semapv:UnspecifiedMatching
UBERON:0001230  glomerular capsule      semapv:crossSpeciesExactMatch   ZFA:0005254     semapv:UnspecifiedMatching
UBERON:0001230  glomerular capsule      semapv:crossSpeciesExactMatch   ZFA:0005310     semapv:UnspecifiedMatching
UBERON:0001286  Bowman's space  semapv:crossSpeciesExactMatch   ZFA:0005283     semapv:UnspecifiedMatching
UBERON:0001286  Bowman's space  semapv:crossSpeciesExactMatch   ZFA:0005312     semapv:UnspecifiedMatching
UBERON:0003054  roof plate      semapv:crossSpeciesExactMatch   ZFA:0001436     semapv:UnspecifiedMatching
UBERON:0003054  roof plate      semapv:crossSpeciesExactMatch   ZFA:0007058     semapv:UnspecifiedMatching
UBERON:2000089  actinotrichium  semapv:crossSpeciesExactMatch   ZFA:0000089     semapv:UnspecifiedMatching
UBERON:2000089  actinotrichium  semapv:crossSpeciesExactMatch   ZFA:0005435     semapv:UnspecifiedMatching

Five cases in ZFA.
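
For readers without sssom-cli at hand, the same kind of filter can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it reads SSSOM/TSV text directly, ignores the predicate, and simply reports subjects mapped to more than one object with a given prefix, so it does not reproduce sssom-cli's cardinality computation exactly (for instance, it does not tell 1:n apart from n:n). The function name `proxy_merges` is hypothetical.

```python
import csv
from collections import defaultdict

def proxy_merges(sssom_text, object_prefix):
    """Return {subject_id: [object_id, ...]} for subjects mapped to several objects."""
    # SSSOM/TSV files start with a '#'-prefixed metadata block; skip it.
    body = [line for line in sssom_text.splitlines()
            if line and not line.startswith("#")]
    rows = csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
    objects = defaultdict(set)
    for row in rows:
        if row["object_id"].startswith(object_prefix + ":"):
            objects[row["subject_id"]].add(row["object_id"])
    return {subj: sorted(objs) for subj, objs in objects.items() if len(objs) > 1}

# A few rows copied from the outputs above, joined with real tab characters.
sample = "\n".join([
    "#curie_map:",
    '#  UBERON: "http://purl.obolibrary.org/obo/UBERON_"',
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tmapping_justification",
    "UBERON:0000165\tmouth\tsemapv:crossSpeciesExactMatch\tZFA:0000547\tsemapv:UnspecifiedMatching",
    "UBERON:0000165\tmouth\tsemapv:crossSpeciesExactMatch\tZFA:0000590\tsemapv:UnspecifiedMatching",
    "UBERON:0001021\tnerve\tsemapv:crossSpeciesExactMatch\tXAO:0000204\tsemapv:UnspecifiedMatching",
])
print(proxy_merges(sample, "ZFA"))
# {'UBERON:0000165': ['ZFA:0000547', 'ZFA:0000590']}
```

In the sample, the single XAO row does not count as a proxy merge because only one XAO term is mapped to UBERON:0001021 there.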

matentzn commented 7 months ago

Very nice analysis @gouttegd!

Do you think we should (1) try to fix all proxy merges (I am assuming they are always an indication of "wrong"), and then (2) add proxy-merge checking explicitly to QC?

I think I could justify passing the proxy-merge review to a curator to fix. Maybe Ray or Arwa.

To streamline 1, it would be good if we could run runoak fill-table or some such to add object_label, but I guess they could also learn to do this themselves, now that sssom-java is in ODK.
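
As a possible shape for such a QC check, here is a small Python sketch. Everything in it is an assumption: the idea of a curated list of already-reviewed subjects, the function names, and the report format are all hypothetical and not part of the current Uberon pipeline.

```python
def unreviewed_proxy_merges(merges, reviewed):
    """Return the proxy merges that no curator has signed off on yet.

    `merges` maps a subject ID to the foreign terms mapped to it;
    `reviewed` is a (hypothetical) set of subjects already checked by a curator.
    """
    return {subj: objs for subj, objs in merges.items() if subj not in reviewed}

def qc_check(merges, reviewed):
    """Print one line per pending proxy merge; return True if none is pending."""
    pending = unreviewed_proxy_merges(merges, reviewed)
    for subj, objs in sorted(pending.items()):
        print(f"proxy merge needs review: {subj} <- {', '.join(objs)}")
    return not pending

merges = {
    "UBERON:0000165": ["ZFA:0000547", "ZFA:0000590"],
    "UBERON:0003054": ["ZFA:0001436", "ZFA:0007058"],
}
ok = qc_check(merges, reviewed={"UBERON:0000165"})
```

Here the check would flag UBERON:0003054 and return False, since only UBERON:0000165 is in the reviewed set.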

gouttegd commented 1 month ago

To streamline 1, it would be good if we could run runoak fill-table or some such to add object_label

With SSSOM-Java >= 0.7.7 (in ODK 1.5.2), you can do:

$ sssom-cli -p -i ../mappings/uberon.sssom.tsv \
            --exclude '!object==ZFA:*' \
            --include 'cardinality==1:n' \
            --update-from-ontology=imports/local-zfa.owl:object \
            --catalog none

This will (1) read the mappings, (2) exclude any mapping whose object is not in ZFA, (3) include only the mappings where several objects are mapped to the same subject, and (4) update the resulting mapping set by filling in the object labels from the ZFA ontology.

(The --catalog none option is to prevent sssom-cli from trying to read Uberon’s catalog-v001.xml. That file has a syntax error that SSSOM-Java, which uses a stricter parser than ROBOT or Protégé, refuses to silently ignore.)

Output:

#curie_map:
#  ORCID: https://orcid.org/
#  UBERON: http://purl.obolibrary.org/obo/UBERON_
#  ZFA: http://purl.obolibrary.org/obo/ZFA_
#  obo: http://purl.obolibrary.org/obo/
#mapping_set_id: http://purl.obolibrary.org/obo/uberon/core/mappings.sssom.tsv
#creator_id:
#  - ORCID:0000-0002-1373-1705
#  - ORCID:0000-0002-6095-8718
#license: http://creativecommons.org/licenses/by/3.0/
#subject_source: obo:uberon/core.owl
#object_source: obo:zfa.owl
subject_id      subject_label   predicate_id    object_id       object_label    mapping_justification
UBERON:0000165  mouth   semapv:crossSpeciesExactMatch   ZFA:0000547     mouth   semapv:UnspecifiedMatching
UBERON:0000165  mouth   semapv:crossSpeciesExactMatch   ZFA:0000590     oral region     semapv:UnspecifiedMatching
UBERON:0001230  glomerular capsule      semapv:crossSpeciesExactMatch   ZFA:0005254     renal glomerular capsule        semapv:UnspecifiedMatching
UBERON:0001230  glomerular capsule      semapv:crossSpeciesExactMatch   ZFA:0005310     pronephric glomerular capsule   semapv:UnspecifiedMatching
UBERON:0001286  Bowman's space  semapv:crossSpeciesExactMatch   ZFA:0005283     renal capsular space    semapv:UnspecifiedMatching
UBERON:0001286  Bowman's space  semapv:crossSpeciesExactMatch   ZFA:0005312     pronephric capsular space       semapv:UnspecifiedMatching
UBERON:0003054  roof plate      semapv:crossSpeciesExactMatch   ZFA:0001436     roof plate neural tube region   semapv:UnspecifiedMatching
UBERON:0003054  roof plate      semapv:crossSpeciesExactMatch   ZFA:0007058     roof plate      semapv:UnspecifiedMatching
UBERON:2000089  actinotrichium  semapv:crossSpeciesExactMatch   ZFA:0000089     fin fold actinotrichium semapv:UnspecifiedMatching
UBERON:2000089  actinotrichium  semapv:crossSpeciesExactMatch   ZFA:0005435     actinotrichium  semapv:UnspecifiedMatching
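
The label-filling step can be pictured with a short Python sketch. It does not read an OWL file: the `labels` dictionary is a hypothetical stand-in for whatever labels would be looked up in the ZFA ontology, and `fill_object_labels` is an illustrative name, not an SSSOM-Java function.

```python
def fill_object_labels(mappings, labels):
    """Return copies of the mapping rows with object_label filled from `labels`.

    Rows whose object is missing from `labels` are copied unchanged.
    """
    filled = []
    for row in mappings:
        updated = dict(row)
        label = labels.get(row["object_id"])
        if label is not None:
            updated["object_label"] = label
        filled.append(updated)
    return filled

# Two mappings and their labels, copied from the output above.
mappings = [
    {"subject_id": "UBERON:0000165", "object_id": "ZFA:0000547"},
    {"subject_id": "UBERON:0000165", "object_id": "ZFA:0000590"},
]
labels = {"ZFA:0000547": "mouth", "ZFA:0000590": "oral region"}
for row in fill_object_labels(mappings, labels):
    print(row["subject_id"], row["object_id"], row["object_label"], sep="\t")
```

With the labels in place, a curator can see at a glance that UBERON:0000165 is mapped to both "mouth" and "oral region" in ZFA.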