sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Add MONDO Client #1311

Closed cthoyt closed 3 years ago

cthoyt commented 3 years ago

The Monarch Disease Ontology is where most of the interesting effort is going at the moment - they're more open and receptive to curation and are working hard to align with OBO standards. It will be the most complete disease vocabulary and have the most mappings.

This PR adds a MONDO client built on the OBO client, and also fixes a bug in the parsing of alternative identifiers.

Related: https://github.com/indralab/gilda/issues/50

bgyori commented 3 years ago

Thanks @cthoyt! I think mondo.json is missing from version control. Is the convention for MONDO to have "namespace embedded" in IDs? In that case the right API for all the client functions would be to expect the prefix like in your test for get_id_from_alt_id('MONDO:0018220'), otherwise without the prefix would be better. We should also think about adding all the relevant xrefs to the ontology graph, are those available through this OBO?

cthoyt commented 3 years ago

> Thanks @cthoyt! I think mondo.json is missing from version control. Is the convention for MONDO to have "namespace embedded" in IDs? In that case the right API for all the client functions would be to expect the prefix like in your test for get_id_from_alt_id('MONDO:0018220'), otherwise without the prefix would be better.

Yes, it's another OBO Foundry ontology, so it has the same scheme as GO (for example).

> We should also think about adding all the relevant xrefs to the ontology graph, are those available through this OBO?

Yes, those are available. I just realized that we haven't explicitly done that for any other OBO or OWL ontology so far.

bgyori commented 3 years ago

I see, in that case, the JSON should have the MONDO prefixes embedded under id, relations, etc., see e.g., https://raw.githubusercontent.com/sorgerlab/indra/master/indra/resources/chebi.json. Then the IDs propagate into the rest of the code correctly without further preprocessing needed.
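
To illustrate what "prefixes embedded" could look like, here is a minimal sketch. The entry layout (the keys `id`, `alt_ids`, `relations`) and the helpers `ensure_prefix` and `embed_prefixes` are assumptions made for this example and are not copied from INDRA's resource format; the IDs are the ones mentioned in this thread, and the way they are paired in the entry is purely illustrative. It also assumes every relation target lives in the same namespace.

```python
import json

PREFIX = "MONDO"


def ensure_prefix(identifier, prefix=PREFIX):
    """Return the identifier with 'PREFIX:' embedded exactly once."""
    if identifier.startswith(prefix + ":"):
        return identifier
    return f"{prefix}:{identifier}"


def embed_prefixes(entry, prefix=PREFIX):
    """Embed the prefix in the id, alt_ids and relation targets of an entry.

    Hypothetical helper: assumes all relation targets share the namespace.
    """
    return {
        **entry,
        "id": ensure_prefix(entry["id"], prefix),
        "alt_ids": [ensure_prefix(a, prefix) for a in entry.get("alt_ids", [])],
        "relations": {
            rel: [ensure_prefix(t, prefix) for t in targets]
            for rel, targets in entry.get("relations", {}).items()
        },
    }


# Hypothetical entry with bare local IDs (illustrative pairing only)
raw_entry = {
    "id": "0024812",
    "name": "example disease",
    "alt_ids": ["0002399"],
    "relations": {"is_a": ["0018220"]},
}

entry = embed_prefixes(raw_entry)
print(json.dumps(entry, indent=2))
```

With the prefix stored in the data itself, downstream code can pass IDs along unchanged instead of adding or stripping `MONDO:` at each call site.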

bgyori commented 3 years ago

We got this test failure:

======================================================================
FAIL: Doctest: indra.databases.mondo_client.get_id_from_alt_id
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/doctest.py", line 2199, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for indra.databases.mondo_client.get_id_from_alt_id
  File "/home/runner/work/indra/indra/indra/databases/mondo_client.py", line 43, in get_id_from_alt_id

----------------------------------------------------------------------
File "/home/runner/work/indra/indra/indra/databases/mondo_client.py", line 57, in indra.databases.mondo_client.get_id_from_alt_id
Failed example:
    assert '0024812' == mondo_client.get_id_from_alt_id('0002399')
Exception raised:
    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest indra.databases.mondo_client.get_id_from_alt_id[1]>", line 1, in <module>
        assert '0024812' == mondo_client.get_id_from_alt_id('0002399')
    AssertionError

I suspect this could be due to the API assuming the MONDO: prefix in inputs and providing them in outputs?

cthoyt commented 3 years ago

> I suspect this could be due to the API assuming the MONDO: prefix in inputs and providing them in outputs?

I realized this was a bug in the OBO loader, where the prefix-removal logic was not applied to alt IDs. I think it's all gucci now.
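
For readers following along, the kind of bug described here can be sketched roughly as follows. This is not the actual INDRA OBO loader: `strip_prefix`, `build_alt_id_lookup`, and the entry dictionaries are hypothetical, and the ID pairing is taken from the failing doctest in this thread. The point is only that the prefix has to be stripped from alt IDs as well as primary IDs for a lookup by bare ID to succeed.

```python
def strip_prefix(identifier, prefix="MONDO"):
    """Drop a leading 'PREFIX:' if present; otherwise return the ID unchanged."""
    head = prefix + ":"
    return identifier[len(head):] if identifier.startswith(head) else identifier


def build_alt_id_lookup(entries, strip_alt_ids=True):
    """Map each alternative ID to its primary ID.

    With strip_alt_ids=False this mimics the assumed bug: primary IDs are
    stripped of their prefix but alt IDs are stored as-is.
    """
    lookup = {}
    for entry in entries:
        primary = strip_prefix(entry["id"])
        for alt_id in entry.get("alt_ids", []):
            key = strip_prefix(alt_id) if strip_alt_ids else alt_id
            lookup[key] = primary
    return lookup


# Hypothetical parsed entries, as they might come out of an OBO file
entries = [{"id": "MONDO:0024812", "alt_ids": ["MONDO:0002399"]}]

buggy = build_alt_id_lookup(entries, strip_alt_ids=False)
fixed = build_alt_id_lookup(entries)

print(buggy.get("0002399"))  # None: the key was stored as 'MONDO:0002399'
print(fixed.get("0002399"))  # 0024812
```

In the buggy variant the lookup by the bare ID finds nothing, which is consistent with the `AssertionError` in the doctest; once alt IDs go through the same prefix handling as primary IDs, the lookup resolves.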