monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

Conversion of legal ontologies to LinkML schemas #243

Open caufieldjh opened 8 months ago

caufieldjh commented 8 months ago

Hi everyone, thanks for all the effort you're putting into this project! I'm working on the extraction of events and arguments in a different domain: the legal and juridical one. Resources in this domain are really scarce; for example, no predefined event schema exists. I only found a few ontologies that could be useful, like this one: https://github.com/PLN-FaMAF/legal-ontology-population or this one: https://github.com/essepuntato/allot Do you think it's also possible to automatically convert this kind of ontology to LinkML schemas? It would be really nice to widen the range of possibilities that this project can handle. Thank you very much, Mathias.

Originally posted by @TribeDH in https://github.com/monarch-initiative/ontogpt/issues/237#issuecomment-1772937030

caufieldjh commented 8 months ago

The ALLOT ontology appears to convert well with robot and schema-automator:

$ wget https://raw.githubusercontent.com/essepuntato/allot/master/ontology/current/allot.owl
$ robot convert -i allot.owl -o allot.ofn
$ schemauto import-owl allot.ofn

It's not terribly large, so I'll just post the result here:

name: allot
description: allot
id: https://w3id.org/akn/ontology/allot
imports:
- linkml:types
prefixes:
  linkml: https://w3id.org/linkml/
  allot: https://w3id.org/None/
default_prefix: allot
slots:
  hasEmbodiment:
    comments:
    - This property allows one to specify the manifestation in which an expression
      is embodied.@en
    is_a: hasRelatedReference
    slot_uri: allot:hasEmbodiment
    multivalued: true
    range: FRBRManifestation
  hasExemplar:
    comments:
    - This property allows one to specify an exemplification of a manifestation.@en
    is_a: hasRelatedReference
    slot_uri: allot:hasExemplar
    multivalued: true
    range: FRBRItem
  hasRealization:
    comments:
    - This property allows one to specify the entities that realizes a work.@en
    is_a: hasRelatedReference
    slot_uri: allot:hasRealization
    multivalued: true
    range: FRBRExpression
  hasRelatedReference:
    comments:
    - This property allows one to link two instances of the class Reference that are
      related for some reason.@en
    slot_uri: allot:hasRelatedReference
    multivalued: true
    range: Reference
  mentions:
    comments:
    - This property allows to specify the references that are mentioned by a document,
      independently from the particular FRBR level one is considering.@en
    is_a: hasRelatedReference
    slot_uri: allot:mentions
    multivalued: true
  creator:
    slot_uri: dc:creator
    multivalued: true
  date:
    slot_uri: dc:date
    multivalued: true
  description:
    slot_uri: dc:description
    multivalued: true
  license:
    slot_uri: dc:license
    multivalued: true
  title:
    slot_uri: dc:title
    multivalued: true
  mbox:
    slot_uri: foaf:mbox
    multivalued: true
classes:
  Concept:
    comments:
    - "Any non-tangible notion or idea that can be referred to but does not fit the\
      \ other top level classes. \n\nE.g.: the approval of an act, peace, child, Mickey\
      \ Mouse, John Constantine, a class of an ontology.@en"
    is_a: Reference
    mixins:
    - SocialObject
    - BFO_0000002
    class_uri: allot:Concept
  Event:
    comments:
    - "Something that happened, will happen, may happen or have lasted. \n\nE.g.:\
      \ the World War II, the coming into force of act 27, Sunday 26th of August 2012.@en"
    is_a: Reference
    mixins:
    - BFO_0000003
    - Event
    class_uri: allot:Event
  FRBRExpression:
    comments:
    - "Any version of a FRBR Work whose content is specified and different from others\
      \ for any reason: language, versions, etc. \n\nE.g.: act 3 of 2005 as in the\
      \ version following the amendments entered into force on July 3rd, 2006.@en"
    is_a: Concept
    mixins:
    - InformationObject
    slots:
    - hasEmbodiment
    - mentions
    class_uri: allot:FRBRExpression
  FRBRItem:
    comments:
    - "The physical copy of any manifestation, such as a sheet of paper on a desk\
      \ or file stored somewhere in some computer on the net or disconnected. \n\n\
      E.g.: the file called act32005.pdf on my computer containing a PDF representation\
      \ of act 3, 2005.@en"
    is_a: InformationRealization
    mixins:
    - Object
    slots:
    - mentions
    class_uri: allot:FRBRItem
  FRBRManifestation:
    comments:
    - "Any electronic or physical format of a FRBR Expression: OOXML, ODT, XML, TIFF,\
      \ PDF, print, etc. \n\nE.g.: PDF representation of act 3 of 2005 as in the version\
      \ following the amendments entered into force on July 3rd, 2006.@en"
    is_a: BFO_0000019
    mixins:
    - Quality
    - Reference
    slots:
    - hasExemplar
    - mentions
    class_uri: allot:FRBRManifestation
  FRBRWork:
    comments:
    - "The abstract concept of a legal resource. \n\nE.g.: act 3 of 2005.@en"
    is_a: Concept
    slots:
    - hasRealization
    - mentions
    class_uri: allot:FRBRWork
  Location:
    comments:
    - "A location that can be referred to also using geographical coordinates. \n\n\
      E.g.: the Rio river, Marrakesh, the entrance to the Black Forrest.@en"
    is_a: Place
    mixins:
    - Concept
    - BFO_0000141
    class_uri: allot:Location
  Object:
    comments:
    - 'Anything concrete (i.e. made of atoms) that can be referred to but that does
      not fit the other top level classes.

      E.g.: a pen, a pet, a building.@en'
    is_a: Reference
    mixins:
    - PhysicalObject
    - BFO_0000040
    class_uri: allot:Object
  Organization:
    comments:
    - "An institution or recognizable group of individuals. Organizations can be formal\
      \ or informal, have a strong degree of internal organization or be completely\
      \ anarchic, have their own name or be anonymous, have their own legal status\
      \ or be impromptu groups \n\nE.g.: the workers\u2019 union, France, the Socialist\
      \ party, the proponents of bill 103/32, the President of the Italian Republic.@en"
    is_a: Concept
    mixins:
    - Organization
    - BFO_0000027
    class_uri: allot:Organization
  Person:
    comments:
    - 'A real human being, regardless whether he/she is alive or deceased, named or
      unnamed. For fictional person, see the class concept.

      E.g.: John Doe, the person with ID RSSMRA72H12L116B.@en'
    is_a: Object
    mixins:
    - NaturalPerson
    class_uri: allot:Person
  Process:
    comments:
    - "A series of actions or steps directed to some end. \n\nE.g.: the approval of\
      \ act 317, the election of the 11th president of the senate.@en"
    is_a: Process
    mixins:
    - BFO_0000015
    - Event
    class_uri: allot:Process
  Reference:
    comments:
    - A reference for any kind of entity.@en
    is_a: BFO_0000001
    mixins:
    - Entity
    - Thing
    slots:
    - hasRelatedReference
    class_uri: allot:Reference
  Role:
    comments:
    - "A part played by a person or an organization, in a certain situation. \n\n\
      E.g.: member of parliament, speaker, head of office, bill proposer.@en"
    is_a: BFO_0000023
    mixins:
    - Concept
    - Role
    class_uri: allot:Role
  Term:
    comments:
    - "A word or group of words whose meaning is defined in a formal and precise manner\
      \ by means of a specific concept. \n\nE.g.: opening sentence, rebuttal, impeachment.@en"
    is_a: Concept
    mixins:
    - InformationObject
    class_uri: allot:Term

That isn't quite ready to be an OntoGPT schema yet (for one, classes like FRBRItem may need renaming to something easier for the LLM to understand, or whatever they're already named in the Functional Requirements for Bibliographic Records) but it's a good start!
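As a rough illustration of the renaming step, here is a minimal Python sketch that works on a schema already parsed into a dict (for example with a YAML loader). The `rename_classes` helper and the `RENAMES` mapping are hypothetical, not part of OntoGPT or schema-automator; the target names simply follow the four FRBR levels (Work, Expression, Manifestation, Item).

```python
from copy import deepcopy

# Hypothetical mapping, chosen for illustration only.
RENAMES = {
    "FRBRWork": "Work",
    "FRBRExpression": "Expression",
    "FRBRManifestation": "Manifestation",
    "FRBRItem": "Item",
}


def rename_classes(schema: dict, renames: dict) -> dict:
    """Return a copy of a parsed LinkML schema with classes renamed.

    Class keys are renamed, along with is_a, mixins, and slot ranges
    that point at a renamed class. class_uri values are left untouched
    so the link back to the source ontology is preserved.
    """
    out = deepcopy(schema)

    def swap(name):
        return renames.get(name, name)

    out["classes"] = {swap(k): v for k, v in (out.get("classes") or {}).items()}
    for cls in out["classes"].values():
        if not cls:
            continue  # a class with an empty body has nothing to update
        if "is_a" in cls:
            cls["is_a"] = swap(cls["is_a"])
        if "mixins" in cls:
            cls["mixins"] = [swap(m) for m in cls["mixins"]]
    for slot in (out.get("slots") or {}).values():
        if slot and "range" in slot:
            slot["range"] = swap(slot["range"])
    return out


# A small excerpt of the converted schema, written as a dict.
excerpt = {
    "slots": {"hasExemplar": {"range": "FRBRItem"}},
    "classes": {
        "FRBRItem": {"is_a": "InformationRealization", "class_uri": "allot:FRBRItem"},
        "FRBRManifestation": {"mixins": ["Quality", "Reference"]},
    },
}
renamed = rename_classes(excerpt, RENAMES)
print(sorted(renamed["classes"]))
```

Keeping `class_uri` unchanged means the friendlier names only affect what the LLM sees, not which ontology term each class maps to.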

TribeDH commented 8 months ago

Thanks again for your commitment to helping us. After adding some annotations and prompts for every class, I tried to use the raw schema you provided to see how I could improve it, but I got an error that I couldn't find anywhere in the "Issues" section. The error looks something like this:

ValueError: could not find suitable element in [ClassDefinition(name='Concept', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None, implements=[], instantiates=[], extensions={}, annotations={}, description=None, alt_descriptions={}, title=None, deprecated=None, todos=[], notes=[], comments=['Any non-tangible notion or idea that can be referred to but does not fit the other top level classes. \n\nE.g.: the approval of an act, peace, child, Mickey Mouse, John Constantine, a class of an ontology.@en'], examples=[], in_subset=[], from_schema='https://w3id.org/akn/ontology/allot', imported_from=None, source=None, in_language=None, see_also=[], deprecated_element_has_exact_replacement=None, deprecated_element_has_possible_replacement=None, aliases=[], structured_aliases={}, mappings=[], exact_mappings=[], close_mappings=[], related_mappings=[], narrow_mappings=[], broad_mappings=[], created_by=None, contributors=[], created_on=None, last_updated_on=None, modified_by=None, status=None, rank=None, categories=[], keywords=[], is_a='Reference', abstract=None, mixin=None, mixins=['SocialObject', 'BFO_0000002'], apply_to=[], values_from=[], string_serialization=None, slots=[], slot_usage={}, attributes={}, class_uri='allot:Concept', subclass_of=None, union_of=[], defining_slots=[], tree_root=None, unique_keys={}, rules=[], classification_rules=[], slot_names_unique=None, represents_relationship=None, disjoint_with=[], children_are_mutually_disjoint=None, any_of=[], exactly_one_of=[], none_of=[], all_of=[], slot_conditions={}), ...

and then the error repeats the same text for every class in the schema. It's my first time working with schemas (my research has focused on other text mining domains), so this is probably a narrow issue, but I really couldn't solve it by myself. Do you have any hints for me?

cmungall commented 8 months ago

I believe schema-automator doesn't follow imports, so the BFO classes are left dangling (we should make the error reporting less opaque, however).

As a workaround, you can use robot to merge all imports first.

However, I think you will have more luck simply editing a schema manually or using the above as a seed to copy and paste from. It only makes sense to convert from OWL if the OWL was used to encode a schema in the first place, and BFO isn't a schema. Allot itself seems a bit of an odd hybrid: a schema-style ontology with BFO layered on top. With my ontology hat on I'd be happy to give lots of feedback on allot, but I think for our purposes here it's best to start by thinking about what kind of structure you want to extract with ontogpt and work backwards from there.
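To see which names are left dangling in a converted schema, something like the following Python sketch may help. It assumes the schema has already been parsed into a dict; `find_dangling` is a hypothetical helper, not a schema-automator or LinkML function, and it only checks `is_a` and `mixins` against the classes the schema itself defines.

```python
def find_dangling(schema: dict) -> dict:
    """Map each class to the parent/mixin names it uses that the schema never defines."""
    classes = schema.get("classes") or {}
    defined = set(classes)
    dangling = {}
    for name, body in classes.items():
        body = body or {}
        refs = []
        if body.get("is_a"):
            refs.append(body["is_a"])
        refs.extend(body.get("mixins") or [])
        missing = sorted({ref for ref in refs if ref not in defined})
        if missing:
            dangling[name] = missing
    return dangling


# Two classes copied from the converted ALLOT schema posted earlier in this thread.
allot_excerpt = {
    "classes": {
        "Reference": {"is_a": "BFO_0000001", "mixins": ["Entity", "Thing"]},
        "Concept": {"is_a": "Reference", "mixins": ["SocialObject", "BFO_0000002"]},
    }
}
report = find_dangling(allot_excerpt)
for class_name, missing in report.items():
    print(class_name, "->", ", ".join(missing))
```

Any name reported here would need to be either defined in the schema, brought in by merging imports, or removed from the class that references it.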

TribeDH commented 7 months ago

Hi everyone, I apologize for being silent for such a long time, but I haven't been able to work for a while, and won't be able to until mid-December.

I just want to share that I was finally able to create a backbone legal schema that works with decent results. Of course it will need a lot of improvement, but it's a good start. This is what it looks like:

name: allot
description: allot
id: https://w3id.org/akn/ontology/allot
imports:
  - linkml:types
  - core

prefixes:
  linkml: https://w3id.org/linkml/
  allot: https://w3id.org/akn/ontology/allot
  BFO: http://purl.obolibrary.org/obo/BFO_

default_prefix: allot
default_range: string
classes:
  LegalEntities:
    tree_root: true
    attributes:
      concepts:
        description: non-tangible notion or idea that can be referred to but does not fit the other top level classes
        annotations:
          prompt: semicolon-separated list of any non-tangible notion or idea that can be referred to but does not fit the other top level classes.
        range: Concept
        multivalued: true

      events:
        description: Something that happened, will happen, may happen or have lasted.
        annotations:
          prompt: semicolon-separated list of something that happened, will happen, may happen or have lasted.
        range: Event
        multivalued: true

      people:
        description: A real human being, regardless whether he/she is alive or deceased, named or unnamed
        annotations:
          prompt: semicolon-separated list of real human beings names, regardless alive or deceased
        range: Person
        multivalued: true

      references:
        description: A reference for any kind of entity.
        annotations:
          prompt: semicolon-separated list of references for any kind of entity.
        range: Reference
        multivalued: true

      legalroles:
        description: A part played by a person or an organization, in a legal context.
        annotations:
          prompt: semicolon-separated list of legal roles played by a person or an organization in a legal context situation
        range: LegalRole
        multivalued: true

  Concept:
    is_a: NamedEntity
    id_prefixes:
      - BFO
    annotations:
      annotators: sqlite:obo:bfo
    class_uri: allot:Concept

  Event:
    is_a: NamedEntity
    id_prefixes:
      - BFO
    annotations:
      annotators: sqlite:obo:bfo
    class_uri: allot:Event

  Person:
    is_a: NamedEntity
    id_prefixes:
      - BFO
    annotations:
      annotators: sqlite:obo:bfo
    class_uri: allot:Person

  Reference:
    is_a: NamedEntity
    id_prefixes:
      - BFO
    annotations:
      annotators: sqlite:obo:bfo
    comments:
    - A reference for any kind of entity

  LegalRole:
    is_a: Concept
    id_prefixes:
      - BFO
    annotations:
      annotators: sqlite:obo:bfo
    class_uri: allot:Role

As soon as I'm able to get back to my work I'll share further enhancements, and I'm also open to suggestions and advice!

caufieldjh commented 7 months ago

Hi @TribeDH, that looks like fantastic progress! For some classes I suspect you could just omit the id_prefixes since the extracted entities are unlikely to ground to a BFO term, and there may be a way to get more fine-grained with event types using enums. Please feel free to add your template to the project with a PR if you'd like!
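A minimal Python sketch of those two suggestions, applied to a schema parsed into a dict. Both helpers are hypothetical, and the `EventType` enum name and its values are invented for illustration; how the enum is wired to an attribute is left open here.

```python
from copy import deepcopy


def drop_grounding(schema: dict, class_names: list) -> dict:
    """Return a copy with id_prefixes and the annotators annotation removed from the named classes."""
    out = deepcopy(schema)
    for name in class_names:
        cls = out["classes"][name]
        cls.pop("id_prefixes", None)
        annotations = cls.get("annotations", {})
        annotations.pop("annotators", None)
        if not annotations:
            cls.pop("annotations", None)  # avoid leaving an empty annotations block
    return out


def add_event_type_enum(schema: dict, values: dict) -> dict:
    """Return a copy with an EventType enum whose permissible values carry descriptions."""
    out = deepcopy(schema)
    out.setdefault("enums", {})["EventType"] = {
        "permissible_values": {
            value: {"description": text} for value, text in values.items()
        }
    }
    return out


legal_schema = {
    "classes": {
        "Event": {
            "is_a": "NamedEntity",
            "id_prefixes": ["BFO"],
            "annotations": {"annotators": "sqlite:obo:bfo"},
            "class_uri": "allot:Event",
        }
    }
}

# Hypothetical event types; a real list would come from the target legal texts.
event_types = {
    "ENACTMENT": "An act is passed or approved",
    "AMENDMENT": "An existing act is modified",
    "ENTRY_INTO_FORCE": "An act starts to apply",
}

updated = add_event_type_enum(drop_grounding(legal_schema, ["Event"]), event_types)
print(updated["classes"]["Event"])
print(list(updated["enums"]["EventType"]["permissible_values"]))
```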