information-artifact-ontology / ontology-metadata

OBO Metadata Ontology

Variants, Serialisations, PURLs: Solving the madness once and for all #171

Open matentzn opened 2 months ago

matentzn commented 2 months ago

We have driven ourselves into a complicated situation when it comes to ontology releases. Please read this reference before contributing to this issue.

These are the basic variations:

  1. Serialisation formats (JSON, OWL, OFN, OBO Format, etc) - the same ontology can be serialised in different formats
  2. Variants (simple, basic, full, base) - the same ontology can be distributed in different shapes, e.g. with or without imports, with or without relationships to other ontologies (or a subset of these), with or without owl:imports statements, and possibly more.
  3. Subsets - the same ontology can result in any number of subsets used for specific use cases, including project specific subsets, taxon-specific subsets and branch/use case-specific subsets

The combinations (serialisation x subsets, serialisation x variant) are many.

PURLs have many different uses:

  1. Identify a specific TERM in an ontology (e.g. http://purl.obolibrary.org/obo/UBERON_1234567)
  2. Identify a specific serialised main release of an ontology (e.g. http://purl.obolibrary.org/obo/uberon.owl, the OWL serialisation of the main release file)
  3. Identify a specific serialised distribution of a variant of an ontology (e.g. http://purl.obolibrary.org/obo/uberon/uberon-base.json, JSON serialisation of the "base" variant).
  4. Identify a specific serialised subset of an ontology (e.g. http://purl.obolibrary.org/obo/uberon/subsets/uberon-human.json, JSON serialisation of the "human" subset of uberon). This use, like (2) and (3) above, describes a series of versions, where the PURL usually resolves to the latest version.
  5. Identify a specific version of a serialised distribution of an ontology (e.g. http://purl.obolibrary.org/obo/mondo/releases/2022-06-11/mondo.owl, the OWL serialisation of the main release published on 2022-06-11). This works the same way for subsets, variants and main releases.
  6. Identify a specific ontology as a whole (more later)
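
To make the six uses above concrete, here is a minimal sketch that classifies an OBO PURL by its shape. The function name `classify_purl`, the category labels, and the regular expressions are illustrative assumptions inferred from the example URLs in this issue; they are not an official OBO specification and will not cover every PURL in the wild.

```python
import re

OBO_BASE = "http://purl.obolibrary.org/obo/"

# Hypothetical patterns inferred from the example PURLs in this issue.
# They are checked in order; the first match wins.
_PATTERNS = [
    ("term", re.compile(r"^[A-Za-z][A-Za-z0-9]*_\d+$")),
    ("versioned distribution",
     re.compile(r"^[a-z0-9]+/releases/\d{4}-\d{2}-\d{2}/.+\.[a-z]+$")),
    ("subset distribution", re.compile(r"^[a-z0-9]+/subsets/[^/]+\.[a-z]+$")),
    ("variant distribution", re.compile(r"^([a-z0-9]+)/\1-[a-z]+\.[a-z]+$")),
    ("main release distribution", re.compile(r"^[a-z0-9]+\.[a-z]+$")),
    ("ontology", re.compile(r"^[a-z0-9]+$")),
]


def classify_purl(purl: str) -> str:
    """Return a label for the kind of thing an OBO PURL identifies."""
    if not purl.startswith(OBO_BASE):
        return "unknown"
    path = purl[len(OBO_BASE):]
    for kind, pattern in _PATTERNS:
        if pattern.match(path):
            return kind
    return "unknown"


if __name__ == "__main__":
    for example in (
        "http://purl.obolibrary.org/obo/UBERON_1234567",
        "http://purl.obolibrary.org/obo/uberon.owl",
        "http://purl.obolibrary.org/obo/uberon/uberon-base.json",
        "http://purl.obolibrary.org/obo/uberon/subsets/uberon-human.json",
        "http://purl.obolibrary.org/obo/mondo/releases/2022-06-11/mondo.owl",
        "http://purl.obolibrary.org/obo/uberon",
    ):
        print(classify_purl(example), example)
```

Note that only the last category carries no serialisation, variant, or subset information in the URL.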

Here is some more context for the avid reader: https://github.com/INCATools/ontology-development-kit/issues/1037.

This is a difficult mess, and if @cmungall had the last say, he would scrap it all in favour of just base PURLs, but IMO, the system is mostly clearly defined, and we have built infrastructure that depends on it.

However, we are missing at least one very important thing: a standard way to refer to an ontology as a whole.

We hereby suggest a new property in OMO:

has ontology id: we expect its value to be the official "ontology PURL" (note: not a "distribution" PURL, as all the examples above are). For OBO at least, the value would look like http://purl.obolibrary.org/obo/uberon, i.e. it contains no variant, subset, or serialisation information.

In conjunction with the already existing owl:versionInfo, we would not only introduce this property here in OMO, but also, as a second step, lobby for making it required for all OBO ontologies (as license, title and description already are).

Furthermore, we expect the value of rdfs:isDefinedBy to always correspond to that ontology purl.
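
The rdfs:isDefinedBy expectation could be checked mechanically. The sketch below assumes that the ontology PURL of a term is the OBO base followed by the lower-cased ID prefix of the term IRI (UBERON_1234567 gives uberon); that mapping is an assumption made for illustration, and the helper names `ontology_purl_for_term` and `is_defined_by_matches` are hypothetical.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def ontology_purl_for_term(term_iri: str) -> str:
    """Derive the expected ontology PURL from an OBO term IRI.

    Assumption: the ontology id is the lower-cased ID prefix of the term,
    i.e. everything before the last underscore in the local name.
    """
    if not term_iri.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {term_iri}")
    local_name = term_iri[len(OBO_BASE):]
    prefix, separator, _ = local_name.rpartition("_")
    if not separator or not prefix:
        raise ValueError(f"not an OBO term IRI: {term_iri}")
    return OBO_BASE + prefix.lower()


def is_defined_by_matches(term_iri: str, is_defined_by: str) -> bool:
    """Check that an rdfs:isDefinedBy value equals the expected ontology PURL."""
    return is_defined_by == ontology_purl_for_term(term_iri)


if __name__ == "__main__":
    term = "http://purl.obolibrary.org/obo/UBERON_1234567"
    print(ontology_purl_for_term(term))
    # A distribution PURL would fail the check under this proposal.
    print(is_defined_by_matches(term, "http://purl.obolibrary.org/obo/uberon.owl"))
```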

Lastly, we could introduce a simple versioned variant:

http://purl.obolibrary.org/obo/uberon/2024-04-08 to refer to the ontology version as a whole, but we can discuss this separately.
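
As a rough illustration of the versioned form, the following sketch appends a release date to an ontology PURL. The function name `versioned_ontology_purl` and the strict YYYY-MM-DD check are assumptions; the proposal itself does not fix a version format beyond the example above.

```python
import re
from datetime import date


def versioned_ontology_purl(ontology_purl: str, version: str) -> str:
    """Build a versioned ontology PURL such as .../obo/uberon/2024-04-08.

    Assumption: versions are calendar dates written as YYYY-MM-DD.
    """
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        raise ValueError(f"expected YYYY-MM-DD, got: {version}")
    date.fromisoformat(version)  # raises ValueError for impossible dates
    return f"{ontology_purl.rstrip('/')}/{version}"


if __name__ == "__main__":
    print(versioned_ontology_purl("http://purl.obolibrary.org/obo/uberon", "2024-04-08"))
```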

matentzn commented 2 months ago

For those of you wondering why the title of this issue is not "New term request: has ontology id":

I expect that, with the introduction of this property, the roles of the ontologyIRI and versionIRI PURLs described in this issue are implicitly approved as well.