monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

Provide equivalencies for phenotype terms #334

Open caufieldjh opened 4 months ago

caufieldjh commented 4 months ago

From @matentzn at today's Monarch Huddle:

Wishlist to OntoGPT team: we need a scalable solution to suggest EQs for phenotype ontologies.

matentzn commented 4 months ago

Pattern library: https://github.com/obophenotype/upheno/tree/master/src/patterns/dosdp-patterns

Some matches: https://github.com/obophenotype/upheno-dev/tree/master/src/curation/pattern-matches

which contains

https://github.com/obophenotype/upheno-dev/blob/master/src/curation/pattern-matches/hp/abnormalMorphologyOfAnatomicalEntity.tsv

The goal would be to:

  1. Use the Phenio KG, the patterns, and (potentially) the known matches as background knowledge
  2. Generate tables like this https://github.com/obophenotype/upheno-dev/blob/master/src/curation/pattern-matches/hp/abnormalMorphologyOfAnatomicalEntity.tsv for all phenotypes that don't yet have one

Something like that; I didn't think this through.
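As a rough illustration of the two steps above, here is a minimal Python sketch that fills a pattern with a matched filler and writes the matches out as a TSV. The pattern text, the column names (`defined_class`, `anatomical_entity`, and their `_label` variants), and the `EX:` identifiers are all assumptions made for illustration; they are loosely modeled on the linked pattern and match table but are not copied from them.

```python
import csv
import io

# Hypothetical, simplified stand-in for a DOSDP pattern (not the real file contents).
PATTERN = {
    "name": "abnormalMorphologyOfAnatomicalEntity",
    "vars": ["anatomical_entity"],
    "equivalent_to": (
        "'has_part' some ('morphology' and "
        "('inheres_in' some {anatomical_entity}) and "
        "('has_modifier' some 'abnormal'))"
    ),
}


def fill_pattern(pattern, row):
    """Substitute the row's filler IDs into the pattern's EQ template."""
    fillers = {var: row[var] for var in pattern["vars"]}
    return pattern["equivalent_to"].format(**fillers)


def matches_to_tsv(pattern, matches):
    """Serialize suggested matches as a tab-separated table, one row per term."""
    columns = ["defined_class", "defined_class_label"]
    for var in pattern["vars"]:
        columns += [var, var + "_label"]
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for match in matches:
        writer.writerow({col: match.get(col, "") for col in columns})
    return buf.getvalue()


# Placeholder identifiers; a real run would use existing phenotype and anatomy terms.
suggested = [
    {
        "defined_class": "EX:0000001",
        "defined_class_label": "abnormal heart morphology",
        "anatomical_entity": "EX:0000002",
        "anatomical_entity_label": "heart",
    }
]

print(fill_pattern(PATTERN, suggested[0]))
print(matches_to_tsv(PATTERN, suggested))
```

In this sketch the suggested rows are hard-coded; in the proposed workflow they would come from the LLM, with the pattern library and known matches supplied as background knowledge.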

caufieldjh commented 4 months ago

Does this idea also include leveraging the LLM's ability to simulate inference about anatomical relationships, like "zebrafish don't have a jugular vein"? That is also likely to be hallucination-prone, though.

matentzn commented 4 months ago

I would hope so! But I didn't think that far. Since the inputs are the existing phenotype terms, we don't need to worry about feeding in terms that do not exist.

cmungall commented 4 months ago

Did you see the DRAGON-AI results for this? RAG works well with ontologies, especially those amenable to patternization.
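To make the RAG idea concrete, here is a minimal Python sketch: retrieve the most similar terms that already have EQs, then use them as in-context examples for a new term. This is not the DRAGON-AI implementation. Token-overlap (Jaccard) similarity stands in for whatever embedding-based retrieval a real system would use, and the labels, EQ strings, and prompt wording are illustrative assumptions.

```python
def tokens(label):
    """Lowercased whitespace tokens of a term label."""
    return set(label.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two labels (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query_label, known, k=2):
    """Return the k known terms whose labels are most similar to the query."""
    ranked = sorted(
        known, key=lambda item: jaccard(query_label, item["label"]), reverse=True
    )
    return ranked[:k]


def build_prompt(query_label, examples):
    """Assemble a few-shot prompt from retrieved term/EQ pairs."""
    parts = ["Suggest a logical definition (EQ) for the final term."]
    for ex in examples:
        parts.append(f"Term: {ex['label']}\nEQ: {ex['eq']}")
    parts.append(f"Term: {query_label}\nEQ:")
    return "\n\n".join(parts)


# Illustrative background knowledge: terms that already have an EQ.
KNOWN = [
    {"label": "abnormal heart morphology",
     "eq": "has_part some (morphology and inheres_in some heart and has_modifier some abnormal)"},
    {"label": "abnormal liver morphology",
     "eq": "has_part some (morphology and inheres_in some liver and has_modifier some abnormal)"},
    {"label": "increased body weight",
     "eq": "has_part some (weight and inheres_in some body and has_modifier some increased)"},
]

query = "abnormal kidney morphology"
examples = retrieve(query, KNOWN)
print(build_prompt(query, examples))
```

The prompt would then be sent to an LLM. Because the retrieved examples share a pattern with the query, the model mostly has to swap in the right filler, which is one plausible reason patternized ontologies suit this approach.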


matentzn commented 4 months ago

No, I did not! I put it on our agenda.

cmungall commented 4 months ago

Table 4:

[image not preserved]

Unfortunately, the dumb ontology importer I wrote for curategpt doesn't support subq (have I mentioned how much I hate subq?), so MP/HP are excluded from this analysis, but OBA gives you an example at the extreme end of postcomp.

One major limitation of the analysis is that we restricted it to new terms to avoid test data leakage. But for some ontologies, like GO, only a handful of new terms had ldefs, so the sample size there is very small and likely biased towards "easy" cases. Hence the somewhat counterintuitive result of better performance on GO than on a patternized ontology.
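The "new terms only" restriction can be sketched as a comparison between two ontology releases: keep only terms that are absent from the older release and that carry a logical definition in the newer one. This is a minimal Python sketch under assumed data structures (plain dicts keyed by term ID, with an `ldef` field); the identifiers are placeholders, and this is not the code used in the analysis.

```python
def new_terms_with_ldefs(old_release, new_release):
    """Terms absent from the old release that have a logical definition in the new one."""
    return {
        term_id: term
        for term_id, term in new_release.items()
        if term_id not in old_release and term.get("ldef")
    }


# Placeholder releases; real ones would be parsed from ontology files.
old = {
    "EX:1": {"label": "term one", "ldef": "A and R some B"},
}
new = {
    "EX:1": {"label": "term one", "ldef": "A and R some B"},   # already existed: excluded
    "EX:2": {"label": "term two", "ldef": "C and R some D"},   # new, has ldef: kept
    "EX:3": {"label": "term three", "ldef": None},             # new, no ldef: excluded
}

test_set = new_terms_with_ldefs(old, new)
print(sorted(test_set))
```

When few new terms carry logical definitions, the resulting test set is small, which is the sample-size problem described above.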