monarch-initiative / curate-gpt

LLM-driven curation assist tool (pre-alpha)
https://monarch-initiative.github.io/curate-gpt/

Support triple extraction use case #33

Open caufieldjh opened 4 months ago

caufieldjh commented 4 months ago

In discussion with the RNA-KG group (Marco Mesiti, Elena Casiraghi, Emanuele Cavalleri) and @justaddcoffee - we would like to be able to extract triples (s, p, o) from a provided text, using graph embeddings to guide the process. The goal is to find additional content for RNA-KG. Using OntoGPT has worked well for this so far, but it does not take advantage of the relations already present in the KG.

This would involve:

Integrating some process for comparing the extracted triples would also be ideal (e.g., the relation between A and B appears in 20 documents, 15 of them from different sources, etc.) - something like the rough sketch below.
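A minimal sketch of that kind of aggregation, assuming extraction results come back as (triple, document_id, source) tuples (all names here are hypothetical, not part of curate-gpt):

```python
from collections import defaultdict

def summarize_triple_support(extractions):
    """Aggregate support for each extracted (s, p, o) triple across documents.

    `extractions` is an iterable of (triple, document_id, source) tuples,
    where `triple` is itself an (s, p, o) tuple of CURIEs.
    """
    docs_by_triple = defaultdict(set)
    sources_by_triple = defaultdict(set)
    for triple, doc_id, source in extractions:
        docs_by_triple[triple].add(doc_id)
        sources_by_triple[triple].add(source)
    return {
        triple: {
            "n_documents": len(docs),
            "n_distinct_sources": len(sources_by_triple[triple]),
        }
        for triple, docs in docs_by_triple.items()
    }
```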

The RNA-KG group has also suggested trying LlamaIndex (https://www.llamaindex.ai/) as an alternative vector store / retrieval layer, to see if it works better for RAG with KG data.
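For reference, a minimal LlamaIndex retrieval test might look like the sketch below; the import path is for llama-index >= 0.10 (older releases import directly from `llama_index`), the example documents are made up, and the default embedding/LLM backends (OpenAI, requiring an API key) are assumed unless configured otherwise:

```python
# Minimal LlamaIndex sketch: index KG-derived text snippets and query them.
from llama_index.core import Document, VectorStoreIndex

# Hypothetical KG-derived passages (e.g. verbalized RNA-KG triples or abstracts).
docs = [
    Document(text="miR-21 represses PTEN in hepatocellular carcinoma."),
    Document(text="HOTAIR interacts with PRC2 to silence HOXD genes."),
]

index = VectorStoreIndex.from_documents(docs)  # builds embeddings + in-memory vector store
query_engine = index.as_query_engine(similarity_top_k=2)
print(query_engine.query("Which genes does miR-21 regulate?"))
```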

cmungall commented 4 months ago

I'm not following the part about KG embeddings. I don't think we'd want a dependency on GRAPE here, but we do want to support people providing their own embeddings, e.g. via venomx. However, I don't see how GRAPE/node2vec-style embeddings would work with RAG.

Good suggestion to explore llamaindex. But I think this is orthogonal. See #34

justaddcoffee commented 4 months ago

Not sure exactly what Marco had in mind for using KG embeddings with RAG, but possibly something like: read in abstracts that may contain relations of interest, do NER/grounding to get IDs/CURIEs of interest from the text, then use the KG embeddings to pull those nodes and any related nodes, and send them along as context? Not sure.
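Roughly something like this sketch, assuming the grounding step and a CURIE-to-vector embedding dict are supplied by the caller (all helper names are hypothetical):

```python
import numpy as np

def rag_context_from_abstract(abstract, ground_entities, node_embeddings, node_labels, k=5):
    """Build RAG context for triple extraction from one abstract.

    `ground_entities(text)` -> list of grounded CURIEs (NER + grounding step).
    `node_embeddings`       -> dict mapping CURIE to a KG embedding vector (e.g. node2vec).
    `node_labels`           -> dict mapping CURIE to a human-readable label.
    """
    context_curies = set()
    for curie in ground_entities(abstract):
        if curie not in node_embeddings:
            continue
        context_curies.add(curie)
        # nearest neighbours in KG-embedding space = "related nodes"
        query = node_embeddings[curie]
        sims = {
            other: float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
            for other, vec in node_embeddings.items()
            if other != curie
        }
        context_curies.update(sorted(sims, key=sims.get, reverse=True)[:k])
    # verbalize the nodes so they can be prepended to the extraction prompt
    return [f"{curie} ({node_labels.get(curie, 'unknown')})" for curie in sorted(context_curies)]
```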

justaddcoffee commented 4 months ago

Also, I agree that a GRAPE dependency might not be what we want here. I've made a draft PR #36 to support pulling embeddings from Hugging Face or any other URL.
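For illustration, loading precomputed embeddings from the Hub or a plain URL could look roughly like this; the `.npz` layout with `ids` and `vectors` arrays is an assumption for the sketch, not what #36 actually implements:

```python
import io

import numpy as np
import requests
from huggingface_hub import hf_hub_download

def load_embeddings_from_hf(repo_id, filename):
    """Download a precomputed embedding file from the Hugging Face Hub."""
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    data = np.load(path)  # assumed .npz with 'ids' and 'vectors' arrays
    return dict(zip(data["ids"].tolist(), data["vectors"]))

def load_embeddings_from_url(url):
    """Download the same kind of file from an arbitrary URL."""
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    data = np.load(io.BytesIO(resp.content))
    return dict(zip(data["ids"].tolist(), data["vectors"]))
```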