monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

Using OntoGPT outside BioInformatics #471

Open jamesfebin opened 1 week ago

jamesfebin commented 1 week ago

I am trying to use OntoGPT in a domain outside of bioinformatics. At the moment I'm trying something simple: extracting the names of people from a given text.

I have a dumb question.

The values are predefined in most of the templates I have seen (e.g., vbo_names). When I modify such a template for my own use, the result is a valid LinkML file, but OntoGPT doesn't add the extracted values to the OWL output: only the last value in a list of people's names is added. It also logs messages like:

INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Febin John James')]

Here is the custom template I made:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    attributes:
      full_name:
  Container:
    tree_root: true
    attributes:
      persons:
        multivalued: true
        inlined_as_list: true
        range: Person

Is there a more generic template I can use in this case?

caufieldjh commented 1 week ago

Hi @jamesfebin - OntoGPT can do this, and your template is a great start - it just needs some more details for the LLM to work with.

(The imports should also include core, as this defines the main OntoGPT types.)

So if the input text is something like this:

In a surprise move, the city council of Oakdale voted to approve a new development project led by prominent businesswoman, Emily-Jane Lee. The project, which will bring a new shopping center and several restaurants to the downtown area, has been met with both excitement and skepticism from local residents. Council members, including Chairperson Maria Rodriguez, Vice Chair John Michael Davis Jr., and Councilor Sofia Patel, cited the potential economic benefits and job creation as key factors in their decision. However, some residents, such as longtime Oakdale resident and activist, Ava Morales, have expressed concerns about the impact on traffic and local small businesses. Despite these concerns, project investor, Julian Styles, remains confident that the development will be a success and a boon to the community.

Then a template like this should work:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
  - core
default_range: string

classes:

  Container:
    tree_root: true
    attributes:
      persons:
        description: >-
          A semicolon-delimited list of people named in the text.
        multivalued: true
        inlined_as_list: true
        range: Person

  Person:
    description: >-
      A person.
    attributes:
      full_name:
        description: >-
          The full name of the person.
        range: string

Run something like ontogpt extract -t personinfo.yaml -i input.txt and you should get a result like:

---
input_text: In a surprise move, the city council of Oakdale voted to approve a new
  development project led by prominent businesswoman, Emily-Jane Lee. The project,
  which will bring a new shopping center and several restaurants to the downtown area,
  has been met with both excitement and skepticism from local residents. Council members,
  including Chairperson Maria Rodriguez, Vice Chair John Michael Davis Jr., and Councilor
  Sofia Patel, cited the potential economic benefits and job creation as key factors
  in their decision. However, some residents, such as longtime Oakdale resident and
  activist, Ava Morales, have expressed concerns about the impact on traffic and local
  small businesses. Despite these concerns, project investor, Julian Styles, remains
  confident that the development will be a success and a boon to the community.
raw_completion_output: 'persons: Emily-Jane Lee; Maria Rodriguez; John Michael Davis
  Jr.; Sofia Patel; Ava Morales; Julian Styles;'
prompt: |+
  Split the following piece of text into fields in the following format:

  full_name: <The full name of the person.>

  Text:
  Julian Styles

  ===

extracted_object:
  persons:
    - full_name: Emily-Jane Lee
    - full_name: Maria Rodriguez
    - full_name: John Michael Davis Jr.
    - full_name: Sofia Patel
    - full_name: Ava Morales
    - full_name: Julian Styles
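
As a rough illustration of how the semicolon-delimited `raw_completion_output` above corresponds to the `persons` list in `extracted_object`, here is a minimal Python sketch. It is not OntoGPT's own parsing code; the function name `parse_person_list` is hypothetical, and the sketch assumes the completion has the shape `persons: A; B; C;` as shown in the result.

```python
def parse_person_list(raw_completion: str) -> list[dict[str, str]]:
    """Turn 'persons: A; B; C;' into a list of {'full_name': ...} records.

    Hypothetical helper for illustration only; OntoGPT performs its own
    parsing internally.
    """
    # Split off the leading field label ('persons:') if one is present.
    label, sep, values = raw_completion.partition(":")
    if not sep:
        # No label found, so treat the whole string as the value list.
        values = label
    # Split on semicolons and drop empty entries (e.g. after a trailing ';').
    names = [part.strip() for part in values.split(";")]
    return [{"full_name": name} for name in names if name]


raw = (
    "persons: Emily-Jane Lee; Maria Rodriguez; John Michael Davis Jr.; "
    "Sofia Patel; Ava Morales; Julian Styles;"
)
people = parse_person_list(raw)
for person in people:
    print(person["full_name"])
```

Each name becomes its own `Person` record, which is why the template's `persons` description asks the LLM for a semicolon-delimited list.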

jamesfebin commented 1 week ago

Thank you, @caufieldjh. I am able to generate the YAML file.

However, I get the following log output when I request the OWL format, and it doesn't generate a valid .owl file:

INFO:root:Output format: owl
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: Container == Person owning: Container
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: Container == Person owning: Container
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithEntity == Publication owning: TextWithEntity
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithEntity == Publication owning: TextWithEntity
INFO:root:Subject=None
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Emily-Jane Lee')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Maria Rodriguez')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='John Michael Davis Jr.')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Sofia Patel')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Ava Morales')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Julian Styles')]
INFO:root:Cannot determine axiom type for persons, unprocessed=[]

caufieldjh commented 1 week ago

Generating OWL requires a few more format-specific details so the OWL interpreter knows how to define relationships the LinkML format doesn't identify. Try this:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
  personinfo: https://w3id.org/linkml/examples/personinfo/
imports:
  - linkml:types
  - core
default_range: string

default_prefix: personinfo

classes:

  Container:
    tree_root: true
    attributes:
      persons:
        description: >-
          A semicolon-delimited list of people named in the text.
        multivalued: true
        inlined_as_list: true
        annotations:
          owl: ObjectProperty, ObjectSomeValuesFrom
        range: Person

  Person:
    is_a: NamedEntity
    description: >-
      A person.
    attributes:
      full_name:
        description: >-
          The full name of the person.
        range: string
      id:
        description: >-
          A unique identifier for the person.
          This is their full name without spaces
          or special characters.
        identifier: true
        range: string

That should generate OWL like this:

Prefix( owl: = <http://www.w3.org/2002/07/owl#> )
Prefix( rdf: = <http://www.w3.org/1999/02/22-rdf-syntax-ns#> )
Prefix( rdfs: = <http://www.w3.org/2000/01/rdf-schema#> )
Prefix( xsd: = <http://www.w3.org/2001/XMLSchema#> )
Prefix( xml: = <http://www.w3.org/XML/1998/namespace> )
Prefix( linkml: = <https://w3id.org/linkml/> )
Prefix( personinfo: = <https://w3id.org/linkml/examples/personinfo/> )
Prefix( shex: = <http://www.w3.org/ns/shex#> )
Prefix( schema: = <http://schema.org/> )
Prefix( NCIT: = <http://purl.obolibrary.org/obo/NCIT_> )
Prefix( RO: = <http://purl.obolibrary.org/obo/RO_> )
Prefix( biolink: = <https://w3id.org/biolink/vocab/> )
Prefix( core: = <http://w3id.org/ontogpt/core/> )

Ontology( <https://w3id.org/linkml/examples/personinfo>
    AnnotationAssertion( rdfs:label personinfo:EmilyJaneLee "Emily-Jane Lee" )
    AnnotationAssertion( rdfs:label personinfo:MariaRodriguez "Maria Rodriguez" )
    AnnotationAssertion( rdfs:label personinfo:JohnMichaelDavisJr "John Michael Davis Jr" )
    AnnotationAssertion( rdfs:label personinfo:SofiaPatel "Sofia Patel" )
    AnnotationAssertion( rdfs:label personinfo:AvaMorales "Ava Morales" )
    AnnotationAssertion( rdfs:label personinfo:JulianStyles "Julian Styles" )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:EmilyJaneLee ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:MariaRodriguez ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:JohnMichaelDavisJr ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:SofiaPatel ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:AvaMorales ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:JulianStyles ) )
)
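
The identifiers in this output (for example `personinfo:EmilyJaneLee`) follow the rule given in the template's `id` description: the full name without spaces or special characters. In the actual run, the LLM produces these identifiers from that description. The Python sketch below only shows a deterministic equivalent of the rule; the function name `make_person_id` is hypothetical, and treating "special characters" as anything other than ASCII letters and digits is an assumption.

```python
import re


def make_person_id(full_name: str) -> str:
    """Strip everything except ASCII letters and digits from a name.

    Hypothetical helper; assumes 'special characters' means any
    character outside A-Z, a-z, and 0-9.
    """
    return re.sub(r"[^A-Za-z0-9]", "", full_name)


names = ["Emily-Jane Lee", "John Michael Davis Jr.", "Sofia Patel"]
ids = [make_person_id(name) for name in names]
for person_id in ids:
    # Prefix with the schema's default_prefix to mirror the OWL output.
    print(f"personinfo:{person_id}")
```

Note that this simple rule would also drop accented or non-Latin characters, so names outside ASCII would need a different normalization.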

jamesfebin commented 1 week ago

Thank you again, @caufieldjh.

However, when I import this into Protege or another OWL visualizer, I get an error. Can you point me to any documents or resources so I can study and solve these issues myself (that is, how to write a YAML file that generates OWL data models)?

[Screenshot 2024-11-08 at 9:51:21 PM; image not available]

caufieldjh commented 1 week ago

Hi @jamesfebin, OntoGPT uses LinkML tools for generating OWL (and other serializations) so you may find these docs helpful: https://linkml.io/linkml/generators/owl.html
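
One detail worth checking in the OWL posted earlier in this thread: the `SubClassOf( None ...` axioms, which line up with the `Subject=None` log messages. A bare `None` is not a valid class IRI in OWL functional syntax, so it is a plausible cause of the import error, though that has not been confirmed here. The following Python sketch simply scans functional-syntax text for axioms whose first argument is `None`; the function name `find_none_subjects` is hypothetical and the check is a plain regular expression, not an OWL parser.

```python
import re

# Matches lines such as 'SubClassOf( None ...' where the first argument
# of an axiom is the literal token None.
NONE_SUBJECT = re.compile(r"^\s*(\w+)\(\s*None\b")


def find_none_subjects(ofn_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for axioms with a None subject."""
    hits = []
    for lineno, line in enumerate(ofn_text.splitlines(), start=1):
        if NONE_SUBJECT.match(line):
            hits.append((lineno, line.strip()))
    return hits


sample = """Ontology( <https://w3id.org/linkml/examples/personinfo>
    AnnotationAssertion( rdfs:label personinfo:SofiaPatel "Sofia Patel" )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:SofiaPatel ) )
)"""

hits = find_none_subjects(sample)
for lineno, line in hits:
    print(f"line {lineno}: {line}")
```

If lines are flagged, the entity that should be the subject of those axioms most likely has no identifier in the extracted data.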

cmungall commented 6 days ago

Remember that the LinkML generators are for schema conversion. OntoGPT uses linkml-owl for data conversion.
