ndexbio / llm-text-to-knowledge-graph

A package and command-line interface for extracting knowledge graphs from text, with a special focus on supporting agents working with scientific hypotheses and experiment design.
MIT License

Test extraction with the BEL representation scheme #8

Open dexterpratt opened 1 week ago

dexterpratt commented 1 week ago

Try a version of the prompt that defines the BEL vocabulary instead of the INDRA vocabulary.

BEL Documentation

The BEL cheatsheet might be a good definition to include in the prompt.

We want to use HGNC grounding for the BEL statements.

dexterpratt commented 1 day ago

To test BEL generation from text, use the BEL small corpus as a source of sentences and associated BEL statements.

https://github.com/cthoyt/selventa-knowledge/blob/master/selventa_knowledge/small_corpus.bel
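To get sentence/statement pairs out of the corpus, something like the following might work. This is a minimal sketch, assuming the file follows the usual BEL script layout in which a `SET Evidence = "..."` annotation (or the older `SET Support`) applies to the statements after it, and a trailing backslash continues a line. The function name `pairs_from_bel_script` is hypothetical, and the sketch ignores trailing `//` comments and the other annotations.

```python
from collections import defaultdict


def pairs_from_bel_script(text):
    """Group BEL statements by the evidence text that precedes them."""
    # Join physical lines that end in a backslash (assumed line continuation).
    logical, buffer = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buffer += line[:-1].rstrip() + " "
            continue
        logical.append(buffer + line)
        buffer = ""
    if buffer:
        logical.append(buffer.strip())

    pairs = defaultdict(list)
    evidence = None
    for line in logical:
        if not line or line.startswith("#"):
            continue  # blank line or comment
        keyword = line.split(None, 1)[0].upper()
        if keyword == "SET":
            name, _, value = line[3:].partition("=")
            if name.strip().lower() in ("evidence", "support"):
                evidence = value.strip().strip('"')
        elif keyword == "UNSET":
            if line[5:].strip().lower() in ("evidence", "support"):
                evidence = None
        elif keyword == "DEFINE":
            continue  # namespace and annotation definitions
        elif evidence is not None:
            pairs[evidence].append(line)  # a BEL statement
    return dict(pairs)
```

Statements that appear with no evidence set are dropped, since they cannot be used for this test.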

Use selections from the corpus as examples to provide in the LLM prompt.
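One way the examples could be placed in the prompt is sketched below. The function name `build_prompt`, the instruction wording, and the example sentence are all hypothetical (the sentence is made up for illustration, not taken from the corpus); the real prompt would also carry the BEL vocabulary definition.

```python
def build_prompt(examples, evidence):
    """Build a few-shot prompt from (evidence text, BEL statements) examples."""
    lines = [
        "Extract BEL statements from the evidence text.",
        "Ground genes and proteins to HGNC symbols.",
        "",
    ]
    for text, statements in examples:
        lines.append(f"Evidence: {text}")
        lines.append("BEL:")
        lines.extend(statements)
        lines.append("")
    # The evidence to process goes last, with the answer left open.
    lines.append(f"Evidence: {evidence}")
    lines.append("BEL:")
    return "\n".join(lines)


# Illustrative example only; real examples would be selected from the corpus.
examples = [
    ("AKT1 increases the activity of MTOR.",
     ["p(HGNC:AKT1) increases act(p(HGNC:MTOR))"]),
]
prompt = build_prompt(examples, "TP53 decreases the expression of MDM2.")
```

Examples used in the prompt should probably be excluded from the test set so they do not inflate the match rate.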

Test by processing the entire small corpus. For each evidence text, generate BEL statements in the same format as the curated statements. The BEL expressions can be compared directly: BEL defines the order of operators, so the expression for a given identity is unambiguous. In later steps, we can look at what would make debugging and scoring easier, such as identifying near-matches and diagnosing what differs, e.g. the subject and interaction match but the object does not.
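A rough sketch of the near-match diagnosis, assuming each statement has the simple `subject interaction object` shape. The helper names `split_statement` and `diagnose` are hypothetical. The split is naive: it cuts on whitespace outside parentheses and quotes, drops whitespace inside parentheses, and does not reconcile short and long function names or interaction synonyms.

```python
def split_statement(statement):
    """Split a BEL statement on whitespace outside parentheses and quotes."""
    parts, current, depth, in_quotes = [], "", 0, False
    for ch in statement.strip():
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch.isspace():
                # Top-level whitespace ends a part; whitespace inside
                # parentheses is dropped so "P, S" equals "P,S".
                if depth == 0 and current:
                    parts.append(current)
                    current = ""
                continue
        current += ch
    if current:
        parts.append(current)
    return parts


def diagnose(predicted, curated):
    """Report which parts of a predicted statement differ from the curated one."""
    p, c = split_statement(predicted), split_statement(curated)
    if p == c:
        return "exact match"
    if len(p) != 3 or len(c) != 3:
        return "not comparable"
    labels = ("subject", "interaction", "object")
    diffs = [label for label, a, b in zip(labels, p, c) if a != b]
    return "mismatch in " + ", ".join(diffs)
```

For instance, `diagnose("p(HGNC:AKT1) increases p(HGNC:MTOR)", "p(HGNC:AKT1) increases p(HGNC:TP53)")` reports a mismatch in the object only.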

For the purposes of this test, we do not need to put the BEL into the JSON form we have been using; it will be easier to compare the BEL expressions visually when they are stated in their standard form. When using this to perform knowledge graph extractions, the current JSON form will be appropriate.