sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Fix handling of xrefs from OBO #1378

Open bgyori opened 2 years ago

bgyori commented 2 years ago

This PR fixes an issue in integrating xrefs from OBO-derived resources and increases the number of cross-references in the bioontology.

bgyori commented 2 years ago

Integrating xrefs between OBO-sourced IDs actually turns out to be problematic for several reasons:

  1. Several xrefs point to obsolete or non-existent entries in other ontologies
  2. Several xrefs are technically valid but simply incorrect, e.g., 'GO:GO:0140446' (fumigermin biosynthetic process) -> 'CHEBI:CHEBI:147341' (fumigermin)
  3. Some OBOs, like MONDO, encode replaced-by relations as xrefs to entries in the ontology itself; these are currently picked up as if they were normal xrefs, without further qualification (examples: MONDO:0014857, MONDO:0044630).
  4. There are non-trivial relationships with mappings from other sources (e.g., Biomappings) that should be reconciled.
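
Issues 1 and 3 could be handled by a filtering pass over the parsed xrefs before they are added to the ontology graph. The sketch below is hypothetical and is not INDRA's actual implementation: it assumes each xref is a pair of `(namespace, identifier)` tuples and that a set of known, non-obsolete identifiers per namespace is available. It drops xrefs whose target is unknown or obsolete (issue 1) and xrefs that point back into the source's own namespace, which in cases like MONDO encode replaced-by relations (issue 3).

```python
from typing import Iterable, List, Mapping, Set, Tuple

# Hypothetical representation: an entry is (namespace, identifier)
Entry = Tuple[str, str]
Xref = Tuple[Entry, Entry]


def filter_xrefs(xrefs: Iterable[Xref],
                 valid_entries: Mapping[str, Set[str]]) -> List[Xref]:
    """Keep only xrefs that point to a known entry in another namespace.

    valid_entries maps a namespace to the set of its non-obsolete
    identifiers (an assumed input, e.g. built from the loaded ontologies).
    """
    kept = []
    for (src_ns, src_id), (tgt_ns, tgt_id) in xrefs:
        # Issue 3: xrefs into the source's own namespace (e.g. MONDO -> MONDO)
        # are treated as replaced-by style links, not cross-references.
        if tgt_ns == src_ns:
            continue
        # Issue 1: skip targets that are obsolete or absent from the target
        # namespace. Namespaces we have no data for are also skipped.
        if tgt_id not in valid_entries.get(tgt_ns, set()):
            continue
        kept.append(((src_ns, src_id), (tgt_ns, tgt_id)))
    return kept
```

Skipping targets in namespaces with no loaded data is a conservative choice; an alternative would be to keep those xrefs and only drop targets known to be obsolete.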

Points 1 and 3 are relatively easy to address. I'm more worried about 2; one potential solution is to restrict the namespace pairs between which we integrate mappings, e.g., to exclude GO-CHEBI.
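
The namespace restriction could be expressed as a set of excluded namespace pairs checked before a mapping is integrated. This is a hypothetical sketch: the `EXCLUDED_NS_PAIRS` and `is_allowed_pair` names and the unordered-pair representation are assumptions, and GO-CHEBI is the only pair taken from the discussion.

```python
from typing import FrozenSet, Set

# Hypothetical config: unordered namespace pairs whose xrefs are not
# integrated. GO-CHEBI is the example from the discussion; others would
# be added as they are identified.
EXCLUDED_NS_PAIRS: Set[FrozenSet[str]] = {frozenset({"GO", "CHEBI"})}


def is_allowed_pair(ns1: str, ns2: str) -> bool:
    """Return True if xrefs between ns1 and ns2 may be integrated.

    Pairs are unordered, so GO -> CHEBI and CHEBI -> GO are both excluded.
    """
    return frozenset({ns1, ns2}) not in EXCLUDED_NS_PAIRS
```

Using `frozenset` makes the check direction-independent. The trade-off is that excluding a whole namespace pair also discards the correct xrefs between those namespaces.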

cthoyt commented 1 year ago

@bgyori can we revisit this? I think it will solve the issue I showed last Friday on CoGEx, where the MONDO term for asthma wasn't connected to the rest of the asthma terms by an xref relation.

I agree that point 2 in your last comment might be difficult to overcome. Since most xrefs have no semantics ascribed to them beyond "database cross-reference", they cover many kinds of relationships, including references to shadow terms. In https://gist.github.com/cthoyt/e13b270060a602830b9eb02c45f6b716, I checked this and found that the issue is not widespread: there seem to be 5 such cases between EFO and ChEBI and 3 between GO and ChEBI. To address this, we could make PRs directly to these ontologies to fix them, encode some additional logic (e.g., a short blacklist), or find another solution.
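
Given how few incorrect EFO/ChEBI and GO/ChEBI xrefs the gist found, the "short blacklist" option could be a set of specific xref pairs checked in both directions. A minimal, hypothetical sketch: the `XREF_BLACKLIST` and `is_blacklisted` names and the `(namespace, identifier)` representation are assumptions, identifiers are written with their prefix as in the earlier comment, and only the GO/CHEBI fumigermin pair from this thread is filled in, since the remaining cases are listed in the gist rather than here.

```python
from typing import FrozenSet, Set, Tuple

Entry = Tuple[str, str]  # (namespace, identifier), an assumed representation

# Hypothetical blacklist of individual incorrect xrefs, stored as unordered
# pairs. Only the fumigermin example from this thread is filled in; the other
# cases identified in the gist would be added here.
XREF_BLACKLIST: Set[FrozenSet[Entry]] = {
    frozenset({("GO", "GO:0140446"), ("CHEBI", "CHEBI:147341")}),
}


def is_blacklisted(source: Entry, target: Entry) -> bool:
    """Return True if the xref between source and target is blacklisted.

    Pairs are unordered, so the check applies in both directions.
    """
    return frozenset({source, target}) in XREF_BLACKLIST
```

Compared with excluding a whole namespace pair, this keeps the correct GO/ChEBI xrefs and drops only those reviewed as wrong, at the cost of maintaining the list as the ontologies change.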