monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

Using value set expansion with non-OBO annotator leads to 404 error #347

Open caufieldjh opened 5 months ago

caufieldjh commented 5 months ago

Originally seen in #346. Using a BioPortal ontology (like bioportal:SNOMEDCT) as an annotator works fine on its own. If the same ontology is used in a value set expansion, e.g., with a reachable_from, the expansion tries to download the corresponding semsql db. This fails with a 404 if the semsql db does not exist. It should throw a warning instead of trying to access a nonexistent resource.
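A minimal sketch of the "warn instead of fail" behavior, under stated assumptions: `safe_expand` is a hypothetical helper (not part of OntoGPT or oaklib), and `expand` stands in for whatever call performs the value set expansion. It only uses the standard library, so it shows the error-handling shape and not the actual fix.

```python
import logging
import urllib.error
from typing import Callable, Iterable, List, Optional

logger = logging.getLogger(__name__)


def safe_expand(
    expand: Callable[[], Iterable[str]], enum_name: str
) -> Optional[List[str]]:
    """Run a value set expansion (hypothetical wrapper).

    Returns the expanded identifiers, or None (after logging a warning)
    if the backing resource could not be downloaded.
    """
    try:
        # Materialize inside the try block, since expansion may be lazy
        # and only hit the network when iterated.
        return list(expand())
    except urllib.error.HTTPError as e:
        logger.warning(
            "Could not expand value set %s: HTTP %s (%s). "
            "Skipping validation against this value set.",
            enum_name,
            e.code,
            e.reason,
        )
        return None
```

A caller could then treat a `None` result as "cannot validate against this enum" rather than letting the 404 propagate. Whether an unvalidated identifier should be accepted or rejected in that case is a design decision this sketch does not make.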

This is the offending line: https://github.com/monarch-initiative/ontogpt/blob/bba969258dbe30436a5df880f5a016f4409d89ee/src/ontogpt/engines/knowledge_engine.py#L396

Example as per #346, from debugger:

INFO:ontogpt.engines.knowledge_engine:Grounding Elevated liver function tests. to SNOMEDCT:75540009; next step is to normalize
> /home/harry/ontogpt/src/ontogpt/engines/knowledge_engine.py(396)is_valid_identifier()
-> range_enum = sv.get_enum(e)
(Pdb) n
> /home/harry/ontogpt/src/ontogpt/engines/knowledge_engine.py(397)is_valid_identifier()
-> pvs = vse.expand_value_set(range_enum, sv.schema)
(Pdb) n
> /home/harry/ontogpt/src/ontogpt/engines/knowledge_engine.py(398)is_valid_identifier()
-> valid_ids = [pv.text for pv in pvs]
(Pdb) n
INFO:root:Locator: obo:SNOMEDCT
INFO:root:Ensuring gunzipped for https://s3.amazonaws.com/bbop-sqlite/SNOMEDCT.db.gz
INFO:pystow.utils:downloading with urllib from https://s3.amazonaws.com/bbop-sqlite/SNOMEDCT.db.gz to /home/harry/.data/oaklib/SNOMEDCT.db.gz
urllib.error.HTTPError: HTTP Error 404: Not Found

The issue here arises because oaklib's ValueSetExpander requires a specific config to work with BioPortal. Otherwise it assumes the ontology is in OBO space. See https://github.com/INCATools/ontology-access-kit/blob/a51929160549b160831474472f93ebc9a5a22c01/src/oaklib/utilities/subsets/value_set_expander.py#L263-L295

caufieldjh commented 5 months ago

Partially related, but not sure what's going on with this line: https://github.com/monarch-initiative/ontogpt/blob/bba969258dbe30436a5df880f5a016f4409d89ee/src/ontogpt/engines/knowledge_engine.py#L285

caufieldjh commented 5 months ago

See also: https://github.com/INCATools/ontology-access-kit/blob/a51929160549b160831474472f93ebc9a5a22c01/src/oaklib/utilities/subsets/value_set_expander.py#L28

This can be solved here by having OntoGPT create its own ValueSetExpander with its own config (not just the default oaklib one).
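A rough sketch of the lookup such a config would need to encode, assuming the annotators already declared in the template (e.g. bioportal:SNOMEDCT) are the source of truth. The function names here are hypothetical, and this is plain Python rather than oaklib's actual configuration format: it only illustrates preferring a declared non-OBO selector over the default OBO semsql one.

```python
from typing import Dict, Iterable


def selectors_by_source(annotators: Iterable[str]) -> Dict[str, str]:
    """Map a lowercased ontology name to the annotator selector declaring it."""
    mapping: Dict[str, str] = {}
    for selector in annotators:
        # The ontology name is the last colon-separated part,
        # e.g. "SNOMEDCT" in "bioportal:SNOMEDCT" or "hp" in "sqlite:obo:hp".
        name = selector.rsplit(":", 1)[-1].lower()
        # Keep the first selector seen for each ontology.
        mapping.setdefault(name, selector)
    return mapping


def selector_for(source_ontology: str, annotators: Iterable[str]) -> str:
    """Pick the declared selector for a source such as "obo:SNOMEDCT".

    Falls back to an OBO semsql selector only when no annotator
    declares that ontology.
    """
    name = source_ontology.rsplit(":", 1)[-1].lower()
    return selectors_by_source(annotators).get(name, f"sqlite:obo:{name}")
```

With annotators `["sqlite:obo:hp", "bioportal:SNOMEDCT"]`, a source of `obo:SNOMEDCT` would resolve to the BioPortal selector instead of the nonexistent SNOMEDCT semsql db, while `obo:mondo` would still fall back to `sqlite:obo:mondo`.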