(related to discussion with @enockniyonkuru re: MAXO extraction on Apr 1 2024)
SPIRES extraction does grounding recursively, but still doesn't always catch instances where the term to match is within a longer string, e.g.
lorem ipsum dolor TERM TO MATCH sit amet
Sometimes this is due to closely related texts not otherwise defined as synonyms, like vitamin supplementation (MAXO:0001129) vs vitamin therapy.
An additional refiner pass could assist with grounding by doing one or more of the following:
Performing an additional round of recursive searching, particularly in cases where the extracted string is longer than expected
Doing class-agnostic chunking with the LLM - essentially asking it to try again, but with a more or less specific extraction.
Ditto, but with more traditional NLP methods, even just further tokenization
For ungrounded terms, replace common prefixes/suffixes with those more common in the source annotator. This may be better as a RAG approach but could also work in-context (i.e., instructions passed directly to the LLM) or as a post-processing step capable of recognizing repeated patterns among phrases. Sounds a bit like a curategpt thing though.
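To make a couple of these options concrete, here is a rough sketch of a refiner pass done as plain post-processing: it first looks for the longest word n-gram inside an ungrounded string that matches a known label, then retries after swapping a trailing word for one the source ontology prefers. Everything named here (`LABEL_TO_ID`, `SUFFIX_SWAPS`, `ground_subspans`, `swap_suffix`, `refine`) is hypothetical and not part of OntoGPT or SPIRES; a real version would query the actual annotator rather than a dict, and would need proper tokenization since this one only splits on whitespace.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy label index; a real one would come from the annotator.
LABEL_TO_ID: Dict[str, str] = {
    "vitamin supplementation": "MAXO:0001129",
}

# Hypothetical rewrite rules: trailing words the source ontology rarely uses,
# mapped to the ones it prefers.
SUFFIX_SWAPS: Dict[str, str] = {
    "therapy": "supplementation",
}


def ground_subspans(text: str, index: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (matched span, ID) for the longest word n-gram found in the index."""
    tokens = text.lower().split()
    for size in range(len(tokens), 0, -1):  # try the longest spans first
        for start in range(len(tokens) - size + 1):
            span = " ".join(tokens[start:start + size])
            if span in index:
                return span, index[span]
    return None


def swap_suffix(text: str, swaps: Dict[str, str]) -> List[str]:
    """Return variants of text with the final word replaced per the swap rules."""
    tokens = text.lower().split()
    if tokens and tokens[-1] in swaps:
        return [" ".join(tokens[:-1] + [swaps[tokens[-1]]])]
    return []


def refine(
    text: str,
    index: Dict[str, str] = LABEL_TO_ID,
    swaps: Dict[str, str] = SUFFIX_SWAPS,
) -> Optional[Tuple[str, str]]:
    """Try sub-span matching first, then retry on suffix-swapped variants."""
    hit = ground_subspans(text, index)
    if hit:
        return hit
    for variant in swap_suffix(text, swaps):
        hit = ground_subspans(variant, index)
        if hit:
            return hit
    return None
```

With the toy index above, `refine("oral vitamin supplementation given daily")` finds the embedded label, and `refine("vitamin therapy")` only grounds after the suffix swap. The swap table is the part that would most plausibly be learned or retrieved rather than hand-written.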