information-artifact-ontology / ontology-metadata

OBO Metadata Ontology
Creative Commons Zero v1.0 Universal

NTR: "dbxref prefix map" #93

Open matentzn opened 2 years ago

matentzn commented 2 years ago

I am trying to make a dent in the non-interpretability of the prefixes used in hasDbXref expressions. As discussed in great detail elsewhere, hasDbXref will continue to be a CURIE string (not an entity) that lives outside of the normal RDF world. To standardise the use of prefixes, I want to propose at least an annotation property (AP) that links an ontology to a context for disambiguating the prefixes used in its hasDbXref values. So I envision an annotation like:

obo:my.owl rdf:type owl:Ontology .
obo:my.owl OMO:0001111 <https://raw.githubusercontent.com/geneontology/go-site/master/metadata/db-xrefs.yaml> .

This could then tell the user that

my:001 oio:hasDbXref "HP:001" .

refers to an HPO term.

Or, better yet, the property could link to a formal context.
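To make the intended lookup concrete, here is a minimal Python sketch of how a consumer might interpret a hasDbXref value once a prefix context is available. The `PREFIX_MAP` dictionary and the `expand_xref` helper are hypothetical names used only for illustration; in practice the map would be loaded from whatever context the proposed property points to.

```python
# Hypothetical prefix map standing in for the linked context.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def expand_xref(curie, prefix_map):
    """Return the IRI for a CURIE string, or None if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local_id


print(expand_xref("HP:001", PREFIX_MAP))
# → http://purl.obolibrary.org/obo/HP_001
```

An xref whose prefix is missing from the context returns `None`, which is exactly the case the annotation is meant to surface.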

Anyone opposed to such a property?

matentzn commented 2 years ago

cc @dosumis @cmungall

dosumis commented 2 years ago

Agreed.

  1. We should tag anything with a valid OBO prefix as an ontology mapping and not subject this to QC.
  2. For all the others, we should look up the prefix in the GO_xrefs yaml and pull the appropriate stanza from there if possible. Any that remain should be curated or deleted.
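The two steps above could be sketched roughly as follows. This is only an illustration: the function name, the category labels, and the two prefix sets are assumptions, not the actual contents of the OBO registry or of the GO db-xrefs file.

```python
# Placeholder prefix sets; real ones would be loaded from the OBO registry
# and from the GO db-xrefs YAML respectively.
OBO_PREFIXES = {"HP", "GO", "UBERON"}
GO_XREF_PREFIXES = {"PMID", "Wikipedia"}


def triage_xref(curie, obo_prefixes, go_xref_prefixes):
    """Classify a hasDbXref value by where its prefix can be resolved."""
    prefix, sep, _ = curie.partition(":")
    if not sep:
        # Not a CURIE at all, so it cannot be resolved automatically.
        return "needs_curation"
    if prefix in obo_prefixes:
        # Step 1: valid OBO prefix, treat as an ontology mapping (no QC).
        return "ontology_mapping"
    if prefix in go_xref_prefixes:
        # Step 2: prefix is described by a stanza in the GO xrefs file.
        return "go_xref"
    # Everything else is left for a curator to fix or delete.
    return "needs_curation"


for xref in ["HP:001", "PMID:12345", "foo:bar"]:
    print(xref, triage_xref(xref, OBO_PREFIXES, GO_XREF_PREFIXES))
```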

Also agreed that the master mapping file can live in the OMO repo.

balhoff commented 2 years ago

An alternative approach might be to include the prefix declarations in machine-readable form (but "reified") within the ontology rather than linking out, using the SHACL vocabulary. We're using this here: https://github.com/OBOFoundry/OBOFoundry.github.io/blob/master/registry/obo_prefixes.ttl
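For readers unfamiliar with the SHACL prefix vocabulary, the sketch below renders a prefix map as `sh:declare` statements in Turtle, attached to an ontology IRI. The `shacl_declarations` helper and the example ontology IRI are hypothetical; the linked `obo_prefixes.ttl` file remains the authoritative example of the layout actually in use.

```python
def shacl_declarations(ontology_iri, prefix_map):
    """Render a prefix map as SHACL sh:declare statements in Turtle."""
    if not prefix_map:
        raise ValueError("prefix_map must contain at least one prefix")
    header = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{ontology_iri}>",
    ]
    # One blank node per prefix, each carrying sh:prefix and sh:namespace.
    declarations = [
        f'    sh:declare [ sh:prefix "{prefix}" ; '
        f'sh:namespace "{namespace}"^^xsd:anyURI ]'
        for prefix, namespace in sorted(prefix_map.items())
    ]
    return "\n".join(header) + "\n" + " ;\n".join(declarations) + " ."


print(shacl_declarations(
    "http://purl.obolibrary.org/obo/my.owl",  # illustrative ontology IRI
    {"HP": "http://purl.obolibrary.org/obo/HP_"},
))
```

Keeping the declarations inside the ontology, or in a file it imports, means a consumer does not have to fetch and parse an external YAML file to interpret the xrefs.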

matentzn commented 2 years ago

@balhoff totally, good idea. Would you say, just importing this with owl:imports?

balhoff commented 2 years ago

> @balhoff totally, good idea. Would you say, just importing this with owl:imports?

Yeah that sounds great.

matentzn commented 2 years ago

Here is the current Bioregistry SHACL prefix map: https://github.com/biopragmatics/bioregistry/blob/main/exports/contexts/obo_synonyms.context.ttl

This gets us only halfway. What remains is how to connect this information with other validation information, such as ID regex patterns.

cthoyt commented 2 years ago

Does SHACL have a way of representing the regex? Or can we introduce a new property to go along with SHACL so that we can add regexes to this Bioregistry export?

Update: indeed it does! See https://github.com/biopragmatics/bioregistry/blob/main/exports/contexts/bioregistry.context.ttl for an example of the Bioregistry output in SHACL that includes sh:pattern for many resources.
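As a rough illustration of how a pattern could be combined with the prefix map during validation, here is a small Python sketch. The `REGISTRY` structure, the `is_valid_xref` helper, and the seven-digit pattern for HP are assumptions made for the example, as is the choice to match the pattern against the local identifier only; the real values would come from the `sh:pattern` entries in the export.

```python
import re

# Hypothetical records; real ones would be parsed from the SHACL export.
REGISTRY = {
    "HP": {
        "namespace": "http://purl.obolibrary.org/obo/HP_",
        "pattern": r"^\d{7}$",  # assumed pattern, for illustration only
    },
}


def is_valid_xref(curie, registry):
    """Check that the prefix is known and the local ID matches its pattern."""
    prefix, sep, local_id = curie.partition(":")
    record = registry.get(prefix) if sep else None
    if record is None:
        return False
    pattern = record.get("pattern")
    # Prefixes without a pattern are accepted on the prefix check alone.
    return pattern is None or re.fullmatch(pattern, local_id) is not None


print(is_valid_xref("HP:0000118", REGISTRY))  # → True
print(is_valid_xref("HP:001", REGISTRY))      # → False
```

Under the assumed pattern, the `HP:001` value from the opening example would be flagged, which is the kind of QC check the prefix map alone cannot provide.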

cthoyt commented 1 year ago

Update: SHACL does have a way to represent patterns. See https://github.com/biopragmatics/bioregistry/blob/main/exports/contexts/bioregistry.context.ttl for an example.