monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

How to solve `Cannot find slot for` #277

Closed: serenalotreck closed this issue 11 months ago

serenalotreck commented 11 months ago

I've seen the same error message in the tracebacks of some other issues, but it wasn't clear to me exactly what causes the error or how to deal with it. I have a suspicion it's because of how I wrote my custom schema, but I'm not sure what "slot" refers to here.

I'm using a custom schema as follows:

id: http://w3id.org/ontogpt/desiccation
name: desiccation
title: desiccationTemplate
description: >-
  A template for extracting ChEBI, GO, NCBITAXON, PO, TO, PECO
license: https://creativecommons.org/publicdomain/zero/1.0/
prefixes:
  linkml: https://w3id.org/linkml/
  desiccation: http://w3id.org/ontogpt/desiccation

default_prefix: desiccation
default_range: string

imports:
  - linkml:types
  - core

classes:
  EntityContainingDocument:
    tree_root: true
    is_a: NamedEntity
    attributes:
      environmental conditions:
        range: EnvironmentalCondition
        multivalued: true
        description: >- 
          A semicolon-separated list of environmental terms.
      taxa:
        range: Taxon
        multivalued: true
        description: >- 
          A semicolon-separated list of taxonomic terms of living things.
      traits:
        range: Trait
        multivalued: true
        description: >- 
          A semicolon-separated list of plant traits.

  EnvironmentalCondition:
    is_a: NamedEntity
    id_prefixes:
      - PECO
    annotations:
      annotators: sqlite:obo:peco
      prompt: >- 
        the name of an environmental treatment.
         Examples are drought, salt stress, cold tolerance.

  Taxon:
    is_a: NamedEntity
    id_prefixes:
      - NCBITaxon
    annotations:
      annotators: sqlite:obo:ncbitaxon
      prompt: >- 
        the name of a taxonomic name or species.
         Examples are Bacillus subtilis, Bos taurus, blue whale.

  Trait:
    is_a: NamedEntity
    id_prefixes:
      - TO
    annotations:
      annotators: sqlite:obo:to
      prompt: >- 
        the description of a plant trait.
         Examples of trait categories are germination ratio, fruit hollowness, arid region exposure.

This schema is a WIP. I'd eventually like to add more classes and some relations as well, but I wanted to get a simple version debugged first, to make sure I understand the mechanics before spending time writing more classes.

Basically every document I've run with this schema returns something like:

ERROR:root:Cannot find slot for environmental_condition in environmental conditions: aqueous systems

It seems like only the environmental conditions slot is the problem, but I can't figure out what's different about what I wrote for EnvironmentalCondition compared with the other classes. Any pointers are appreciated!

serenalotreck commented 11 months ago

I think this was a simple fix: I got rid of the space in environmental conditions and changed it to environmental_conditions. I'm now getting an error that looks like it's related to GPT ("Sorry, but I'm unable to assist."), so I'll close this as resolved.
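Since the schema is a WIP with more classes planned, a quick check for slot names containing spaces could catch this before a run. The sketch below is hypothetical: find_slot_names_with_spaces is an invented helper, not part of OntoGPT or LinkML, and it assumes the schema YAML has already been loaded into a plain dict (for example with yaml.safe_load from PyYAML). It looks at inline class attributes and at a top-level slots section.

```python
from __future__ import annotations

import re
from typing import Any

# Any whitespace character in a slot name is treated as suspect.
_WHITESPACE = re.compile(r"\s")


def find_slot_names_with_spaces(schema: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (owner, slot_name) pairs for slot names that contain whitespace.

    Hypothetical helper. `schema` is a LinkML schema already loaded into a dict.
    Checks inline class `attributes` and the top-level `slots` section.
    """
    problems: list[tuple[str, str]] = []
    for class_name, class_def in (schema.get("classes") or {}).items():
        # A class with an empty body loads as None, so fall back to {}.
        for attr_name in (class_def or {}).get("attributes") or {}:
            if _WHITESPACE.search(attr_name):
                problems.append((class_name, attr_name))
    for slot_name in schema.get("slots") or {}:
        if _WHITESPACE.search(slot_name):
            problems.append(("<schema slots>", slot_name))
    return problems


# A trimmed-down version of the schema from this issue, written as a dict.
schema = {
    "classes": {
        "EntityContainingDocument": {
            "attributes": {
                "environmental conditions": {"range": "EnvironmentalCondition"},
                "taxa": {"range": "Taxon"},
                "traits": {"range": "Trait"},
            }
        },
        "Taxon": {"is_a": "NamedEntity"},
    }
}

for owner, name in find_slot_names_with_spaces(schema):
    suggestion = name.replace(" ", "_")
    print(f"{owner}: {name!r} -> consider renaming to {suggestion!r}")
```

Only whitespace is flagged here; other characters a parser might normalize are not checked.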

caufieldjh commented 11 months ago

Hi @serenalotreck - that error reveals a disconnect between what's in the raw LLM output and what's in the schema. This usually happens if the LLM gets unexpectedly creative with formatting, but in this case, it's because the schema defines the slot as environmental conditions but it gets parsed as environmental_conditions, with an underscore. Change the slot name to environmental_conditions in the schema, run make to rebuild, and the error should not appear on the next run.

(I see you just solved this as well)
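The mismatch can be reproduced with a toy matcher. This is a hypothetical sketch, not OntoGPT's actual parser: it assumes each raw output line is split on the first colon, spaces in the key are replaced with underscores, and the result must exactly equal a slot name declared in the schema.

```python
from __future__ import annotations

import logging


def match_slot(line: str, slot_names: set[str]) -> str | None:
    """Return the schema slot a 'key: value' line maps to, or None.

    Hypothetical illustration; OntoGPT's own key normalization may differ.
    """
    key, _, _value = line.partition(":")
    # Normalize the key: surrounding whitespace removed, spaces become underscores.
    field = key.strip().replace(" ", "_")
    if field not in slot_names:
        logging.error("Cannot find slot for %s in %s", field, line)
        return None
    return field


line = "environmental conditions: aqueous systems"

# Slot declared with a space, as in the original schema: no exact match.
print(match_slot(line, {"environmental conditions", "taxa", "traits"}))  # None

# Slot renamed to use an underscore: the normalized key now matches.
print(match_slot(line, {"environmental_conditions", "taxa", "traits"}))  # environmental_conditions
```

Whatever the exact normalization, the point is the same: once keys are normalized, a slot name that is not already in normalized form can never be matched.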

That being said, I tried this schema with a few different inputs and couldn't get aqueous systems to ground to a PECO term. This is kind of a challenging ontology anyway, since so many of its term labels end with "exposure". I'm looking into some potential workarounds for this and other ontologies whose labels follow a common pattern.
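To illustrate why a shared label suffix hurts exact-label grounding, here is a sketch of a lookup that also retries with a common suffix appended. Everything in it is a hypothetical placeholder: the label index, the EXAMPLE: identifiers, and the ground function are not real PECO content or OntoGPT's grounding logic.

```python
from __future__ import annotations

# Hypothetical label -> identifier index; placeholders, not real PECO terms.
LABEL_INDEX = {
    "salt exposure": "EXAMPLE:0000001",
    "drought environment exposure": "EXAMPLE:0000002",
}


def ground(
    text: str, index: dict[str, str], suffixes: tuple[str, ...] = ("exposure",)
) -> str | None:
    """Exact, case-insensitive label lookup, retried with each suffix appended."""
    candidate = text.strip().lower()
    if candidate in index:
        return index[candidate]
    for suffix in suffixes:
        with_suffix = f"{candidate} {suffix}"
        if with_suffix in index:
            return index[with_suffix]
    return None  # ungrounded


print(ground("salt", LABEL_INDEX))             # EXAMPLE:0000001 (after appending "exposure")
print(ground("Salt exposure", LABEL_INDEX))    # EXAMPLE:0000001 (exact match)
print(ground("aqueous systems", LABEL_INDEX))  # None
```

Appending a suffix can also create false matches, so an alternative design is to strip the suffix from the index labels once when building the index.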

serenalotreck commented 11 months ago

@caufieldjh jinx, thank you!

Now that I've fixed that, I have this "Sorry" error. I'm not sure if it's related to this issue or if it's something different. I checked my OpenAI account and I have plenty of usage left, so it's not that I've reached my limit:

ERROR:root:Line 'Sorry, but I'm unable to assist.' does not contain a colon; ignoring
ERROR:root:Cannot ground None annotation, cls=EntityContainingDocument
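The first log line suggests the raw response is read line by line as key: value pairs, with lines lacking a colon skipped. A minimal sketch under that assumption (hypothetical, not OntoGPT's code) shows why a one-line refusal leaves nothing to ground. This is consistent with the second message.

```python
from __future__ import annotations

import logging


def parse_response(text: str) -> dict[str, str]:
    """Collect 'key: value' lines from a raw LLM response, skipping lines without a colon."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" not in line:
            logging.error("Line '%s' does not contain a colon; ignoring", line)
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


refusal = "Sorry, but I'm unable to assist."
normal = "environmental_conditions: drought\ntaxa: Bos taurus"

print(parse_response(refusal))  # {}
print(parse_response(normal))   # {'environmental_conditions': 'drought', 'taxa': 'Bos taurus'}
```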

EDIT: I meant to say that I'm seeing this error and getting no output as a result, so I haven't seen what happens when aqueous systems can't be grounded. Will it just discard annotations that can't be grounded, or does it include them without an ontology identifier?

caufieldjh commented 11 months ago

Hi @serenalotreck - just catching up with this one. What you're seeing is a case in which the LLM has completely refused to perform the requested operation. The easiest workaround is to try it again, though you may have to delete (or move) the .openai_cache.db file in the root directory of OntoGPT; otherwise the query will just use cached data rather than running a fresh request. Under normal circumstances, OntoGPT will retain annotations that can't be grounded, but it will raise an error like this if it gets no usable response. If this kind of error keeps happening, feel free to open a new issue and we'll troubleshoot the schema you're using.
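If you'd rather script the cache removal before retrying, a small sketch follows. It assumes the cache file is named .openai_cache.db and lives in the directory you run OntoGPT from (the OntoGPT root directory, per the comment). clear_openai_cache is a hypothetical helper, not an OntoGPT command.

```python
from __future__ import annotations

from pathlib import Path


def clear_openai_cache(root: Path | str = ".") -> bool:
    """Delete .openai_cache.db under `root` if it exists; return True if removed."""
    cache = Path(root) / ".openai_cache.db"
    if cache.is_file():
        cache.unlink()
        return True
    return False


if clear_openai_cache():
    print("Removed cached responses; the next run will send a fresh request.")
else:
    print("No .openai_cache.db found in the current directory.")
```

Deleting the whole cache also discards cached responses for every other input, so moving the file aside may be preferable if you want to keep them.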