hubmapconsortium / ontology-api

The HuBMAP Ontology Service
MIT License

Ontology: Remove dependency on OWLNETS_relations.txt #161

Closed AlanSimmons closed 1 year ago

AlanSimmons commented 1 year ago

Issue

The OWLNETS-UMLS-GRAPH script assumes the presence of 3 files that are the output of the PheKnowLator-based OWL-OWLNETS converter:

We recently started the Data Distillery (DD) project and agreed to use a single code base for the ontology generation framework, so that ontology graphs for DD, HuBMAP, SenNet, etc. would be generated identically.

For DD, we specified only two files:

The assumption was that the information in OWLNETS_relations.txt was redundant, and could be derived from the predicate field of OWLNETS_edges.txt.
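To illustrate that assumption, here is a minimal sketch of collecting the distinct relations from the predicate field of an edges file. The tab delimiter and the column names `subject`, `predicate`, and `object` are assumptions for illustration, and `relations_from_edges` is a hypothetical helper, not a function from the OWLNETS-UMLS-GRAPH script.

```python
import csv
import io


def relations_from_edges(edges_file):
    """Return the distinct predicate values of an edges file, in first-seen order.

    Assumes a tab-delimited file with a header row that includes a
    'predicate' column (hypothetical layout).
    """
    reader = csv.DictReader(edges_file, delimiter="\t")
    distinct = {}  # dict keys preserve insertion order
    for row in reader:
        distinct.setdefault(row["predicate"], None)
    return list(distinct)


# In-memory stand-in for OWLNETS_edges.txt; the EX: node identifiers are made up.
sample_edges = io.StringIO(
    "subject\tpredicate\tobject\n"
    "EX:1\thttp://purl.obolibrary.org/obo/RO_0002292\tEX:2\n"
    "EX:2\thttp://purl.obolibrary.org/obo/RO_0002206\tEX:1\n"
    "EX:3\thttp://purl.obolibrary.org/obo/RO_0002292\tEX:4\n"
)
relations = relations_from_edges(sample_edges)
```

This yields only relation identifiers; it says nothing about labels or inverses, which is where the relations file and ro.json come in.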

Solution

The OWLNETS-UMLS-GRAPH script must be modified so that it does not depend on the relations file to obtain relationship information.

Unintended consequences that need to be addressed

  1. Data in the relations file is not redundant when a relationship is defined with an IRI from an ontology other than the Relation Ontology (RO). The script currently obtains all relationship information, including relationship names, from RO; a label for a relationship whose IRI is not in RO can therefore come only from the relations file. Because this is a common case for ontologies processed by PheKnowLator, the script must continue to use the relations file when it is available.
  2. If the relations file is not present, the script depends more heavily on the Relation Ontology source (ro.json), especially for inverse relationships.
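The two points above amount to treating the relations file as optional and preferring it over RO when it exists. A rough sketch of that lookup order follows. The column names `relation_id` and `relation_label`, the tab delimiter, and both helper functions are assumptions for illustration; the actual script may be organized differently.

```python
import csv
import os


def load_relation_labels(relations_path):
    """Read relation id -> label from the relations file.

    Returns an empty dict if the file is absent, so callers can fall
    back to RO. Column names here are assumed, not confirmed.
    """
    if not relations_path or not os.path.exists(relations_path):
        return {}
    with open(relations_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return {row["relation_id"]: row["relation_label"] for row in reader}


def resolve_label(relation_id, file_labels, ro_labels):
    """Prefer the relations file, then RO; return None if neither knows the id."""
    if relation_id in file_labels:
        return file_labels[relation_id]
    return ro_labels.get(relation_id)
```

With this ordering, a relationship whose IRI is not in RO still gets a label whenever the relations file supplies one, and the script keeps working from RO alone when the file is missing.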
AlanSimmons commented 1 year ago

I modified the script. This entailed a couple of minor related changes to the generation scripts for HUBMAP and UNIPROTKB. Unit testing.

AlanSimmons commented 1 year ago

Improved accuracy of relationship data

Increasing the dependency on the current source of information for RO (ro.json) required addressing two types of flaws in ro.json:

  1. Some relations do not have inverses. The script creates a "pseudo-inverse" in the form of a relationship with prefix "inverse_". (The earlier script also did this, but missed some relationships.)
  2. Some relations had incomplete information regarding their inverses. For example, RO_0002206 (expressed in) is listed as the inverse of RO_0002292 (expresses), but RO_0002292 is not listed as the corresponding inverse of RO_0002206. The script can now identify the appropriate inverse relationship instead of just creating a pseudo-inverse.
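The handling of both flaws can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: `inverse_labels` is a hypothetical helper, the input dicts stand in for data parsed from ro.json, and `EX_0000001` is a made-up relation with no inverse.

```python
def inverse_labels(declared_inverses, labels):
    """Map each relation id to the label of its inverse.

    declared_inverses: relation id -> inverse id, as declared (possibly
                       in one direction only).
    labels:            relation id -> relation label.
    """
    # Fill in the missing direction of one-sided declarations
    # without overwriting anything that was declared explicitly.
    inverse_of = dict(declared_inverses)
    for rel, inv in declared_inverses.items():
        inverse_of.setdefault(inv, rel)

    result = {}
    for rel, label in labels.items():
        inv = inverse_of.get(rel)
        if inv is not None and inv in labels:
            result[rel] = labels[inv]
        else:
            # No inverse known: create a pseudo-inverse label.
            result[rel] = "inverse_" + label
    return result


labels = {
    "RO_0002292": "expresses",
    "RO_0002206": "expressed in",
    "EX_0000001": "example_relation",  # hypothetical, has no inverse
}
# Only one direction is declared, as in the example above.
declared = {"RO_0002292": "RO_0002206"}
inverses = inverse_labels(declared, labels)
```

Here RO_0002206 resolves to "expresses" even though only the other direction was declared, and only the relation with no inverse at all receives an "inverse_" label.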

AlanSimmons commented 1 year ago

Results of regression testing here.

The new relations code improved the identification of relationships in the resulting knowledge graph.