information-artifact-ontology / ontology-metadata

OBO Metadata Ontology
Creative Commons Zero v1.0 Universal

Add a standard way of indicating the broad category a class belongs to #51

Open cmungall opened 4 years ago

cmungall commented 4 years ago

It can be extremely useful to know in advance which category a class belongs to. For example, GO has 3 branches: biological_process, molecular_function, and cellular_component. In GO this was implemented using oio:hasOBONamespace.

Of course, it is possible to derive this dynamically by reasoning, or by following subClassOf* chains up to an appropriate categorizing class, but this is often impractical.
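To make the dynamic lookup concrete, here is a minimal sketch in Python of following subClassOf edges upward until a categorizing class is reached. It assumes a toy in-memory hierarchy: the `EX:` identifiers, the edges between them, and the function name `find_category` are hypothetical; only the three GO root IDs are real.

```python
from collections import deque

# Hypothetical toy hierarchy: class -> list of direct superclasses.
# The EX: identifiers and edges are invented for illustration only.
SUBCLASS_OF = {
    "EX:apoptotic_process": ["EX:cell_death"],
    "EX:cell_death": ["GO:0008150"],
    "EX:kinase_activity": ["EX:catalytic_activity"],
    "EX:catalytic_activity": ["GO:0003674"],
}

# The three GO branch roots, mapped to their namespace labels.
CATEGORY_ROOTS = {
    "GO:0008150": "biological_process",
    "GO:0003674": "molecular_function",
    "GO:0005575": "cellular_component",
}


def find_category(term, subclass_of, roots):
    """Walk subClassOf edges upward, breadth-first.

    Returns the label of the first categorizing class reached,
    or None if no categorizing class is an ancestor of `term`.
    """
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in roots:
            return roots[current]
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


category = find_category("EX:apoptotic_process", SUBCLASS_OF, CATEGORY_ROOTS)
```

Every consumer has to load the full hierarchy and repeat this traversal for each class, which is the cost a precomputed annotation would avoid.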

I propose that we have a standard annotation property and a standard value set to indicate the category a class belongs to, and recommend that (a) ontologies MAY use this; (b) if an ontology uses this, it MUST implement it in a uniform fashion, i.e. every non-deprecated class in base is tagged; and (c) cardinality is zero or one.
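As a rough illustration of how rules (a) to (c) could be checked at release time, here is a minimal Python sketch. The data layout (plain dicts), the function name `check_category_tags`, and the `EX:` identifiers are assumptions made for this example, not part of any existing tool.

```python
def check_category_tags(classes, categories, allowed_values):
    """Check the proposed tagging rules and return a list of problems.

    classes:        dict of class id -> {"deprecated": bool}
    categories:     dict of class id -> list of category values
    allowed_values: the agreed value set (hypothetical here)
    """
    problems = []
    # Rule (a): use is optional, so an ontology with no tags passes.
    if not any(categories.get(cid) for cid in classes):
        return problems
    for cid, info in sorted(classes.items()):
        values = categories.get(cid, [])
        # Rule (c): at most one category per class.
        if len(values) > 1:
            problems.append(f"{cid}: more than one category")
        # Rule (b): once used, every non-deprecated class is tagged.
        if not info.get("deprecated", False) and not values:
            problems.append(f"{cid}: missing category")
        # Values must come from the standard value set.
        for value in values:
            if value not in allowed_values:
                problems.append(f"{cid}: unknown category {value}")
    return problems


ALLOWED = {"biological_process", "molecular_function", "cellular_component"}
CLASSES = {
    "EX:1": {"deprecated": False},
    "EX:2": {"deprecated": False},
    "EX:3": {"deprecated": True},
}
TAGS = {
    "EX:1": ["biological_process"],
    "EX:2": ["biological_process", "molecular_function"],
}
report = check_category_tags(CLASSES, TAGS, ALLOWED)
```

In this toy input, `EX:2` carries two values and is the only class reported; the deprecated `EX:3` is allowed to stay untagged.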

I don't think we should mint a new property. Candidates are:

For the value set, an obvious choice is COB. Another option is Biolink (formally, the OBO class instances might be instances of bl classes, but this can be indicated with bl:category).

goodb commented 4 years ago

@cmungall do you think this is a general problem beyond the OBO world? I think this really comes down to the size of the ontologies and the degree to which reasoning tools can handle them. If we have access to instantaneous subClassOf* then there would be no reason for adding the annotation. I have a feeling OBO is unique with respect to the large size of the ontologies and common use cases that involve treating the OWL ontologies as simple graphs used without accessing a reasoner. My intuition is that this is mainly an OBO problem, hence the biolink:category property and value set would be the best fit.

Question: would you allow multiple categories? For example, for CC in GO it is now useful to discriminate between the cellular anatomical entity and protein complex branches.

cmungall commented 4 years ago

@bgood having a reliable, always-on, pre-reasoned SPARQL endpoint may help a lot here, but there are many use cases involving working with local files, or cases where services aren't pragmatic (e.g. iterating over 10,000s of instances, or practical use of ShEx/SHACL with OBO).

I am not sure it is size per se, rather the fact that OBO contrasts with the schema-style ontologies common in the broader semantic web. But this is a broad topic deserving of a more detailed response.

I think there is a use case for having 2 properties, one for direct, the other for direct+ancestors.

matentzn commented 4 years ago

I think having that is a good idea for all the knowledge graph work that is coming towards us. It would be fairly straightforward to implement an auto-tagging system for releases using SPARQL. It will look a bit weird, though, if we have both A subClassOf COB:atom and A biolink:category COB:atom, but I think there is a sense to it. Definitely NOT rdf:type. I like biolink:category for this! dc:type is a bit too general, and it may have been used for other purposes.