Explicitly assert subClassOf in gist where applicable

marksem commented 2 years ago

Request: Explicitly assert subClassOf in gist where applicable.

Example: gist:Commitment is defined as

  (Requirement or Restriction)
  and (hasGiver some (Organization or Person))
  and (isCategorizedBy some DegreeOfCommitment)

but it is not explicitly declared as subClassOf gist:Intention, even though it must be. Another example is gist:CoherentUnit, which is not explicitly declared as subClassOf gist:UnitOfMeasure.

Rationale: We want gist to be as usable as possible. Far more triple stores and visualization tools support asserted class subsumption than support OWL to the level where they can "figure out" a subsumption that is only implied.

uscholdm commented 2 years ago

There are two levels of inference here. Getting all of this sort of thing may require going all the way up the subclass hierarchy for all classes. I hope not, but how are you going to get only what you want?

justin2004 commented 2 years ago

An option is to run an OWL 2 DL reasoner on gistCore.ttl when we package up a release and make a .ttl file available in the release with all the triples in gistCore.ttl + all of the derivations.

If there is interest in that, I can take a crack at it (so it can run in a hands-off manner).

uscholdm commented 2 years ago

A good idea. We might want to have a separate file of just the materialized triples, so they can be swapped into a KG more easily.

rjyounes commented 2 years ago

Other contexts where this would be helpful: SHACL with RDFS reasoning turned on (turning on OWL reasoning to interpret the intersections results in an untenably large performance hit), RDFox.

IMO the best way to handle this is to add a function to onto_tool that generates the triples and writes them to a separate file. Rather than just generating it in a release package, it would be helpful to have it in the repo for development purposes. The easiest way to manage that is by adding it to the pre-commit hook.

dylan-sa commented 1 year ago

Using onto_tool to generate the subClassOf assertions may still be the best option, but I stumbled upon this feature of ROBOT and thought I'd link it here just in case it's useful.

rjyounes commented 1 year ago

That's very interesting, thanks for pointing it out. It's important to consolidate into onto_tool for the bundling, but we may be able to call this ROBOT function from onto_tool by making appropriate additions to bundle.yaml - see the definition of the serializer tool and the transform action.

Another approach would be to do everything in bundle.yaml itself using a sparql action - see the sparql action for generating rdfs:labels into a new file. This could be done without running a reasoner by querying for an intersection and selecting only non-blank nodes. This would not require a change to onto_tool as my previous suggestion did.
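To make the "query for an intersection and select only non-blank nodes" idea concrete, here is a minimal sketch in plain Python rather than the actual SPARQL action. It assumes a simplified triple model (tuples of prefixed-name strings, blank nodes written with a `_:` prefix); the `ex:` classes and the function names are hypothetical, not taken from gist or onto_tool.

```python
# Hypothetical, simplified triple model: (subject, predicate, object) strings.
# Blank nodes are written with a "_:" prefix, as in N-Triples.
OWL_EQUIV = "owl:equivalentClass"
OWL_INTERSECTION = "owl:intersectionOf"
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def list_members(triples, head):
    """Walk an RDF collection starting at `head` and yield its members."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    node = head
    while node != RDF_NIL and node in first:
        yield first[node]
        node = rest.get(node, RDF_NIL)


def named_superclasses(triples):
    """Return (class, superclass) pairs for named members of a top-level intersection."""
    intersection = {s: o for s, p, o in triples if p == OWL_INTERSECTION}
    pairs = set()
    for s, p, o in triples:
        if p == OWL_EQUIV and o in intersection:
            for member in list_members(triples, intersection[o]):
                if not member.startswith("_:"):  # skip restrictions and unions
                    pairs.add((s, member))
    return pairs


# Hypothetical definition: ex:Widget == ex:Artifact and (some anonymous restriction)
EXAMPLE = [
    ("ex:Widget", OWL_EQUIV, "_:def"),
    ("_:def", OWL_INTERSECTION, "_:l1"),
    ("_:l1", RDF_FIRST, "ex:Artifact"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:restriction"),
    ("_:l2", RDF_REST, RDF_NIL),
]

print(named_superclasses(EXAMPLE))  # {('ex:Widget', 'ex:Artifact')}
```

One caveat with this shortcut: every conjunct in the gist:Commitment definition quoted at the top of this issue is an anonymous class expression (a union and two restrictions), so a non-blank-node filter like this one would return nothing for it. Recovering gist:Commitment rdfs:subClassOf gist:Intention would presumably still need a reasoner.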

The pre-commit hook could call ROBOT directly or go through onto_tool.

@justin2004 You may want to consider these alternatives to modifying onto_tool itself.

rjyounes commented 1 year ago

@dylan-sa and I are working on this for a client, so we can take this one on too. We'll start with a stand-alone script, then look into adding it to onto_tool.

rjyounes commented 1 year ago

@dylan-sa For release 12.0.0, let's just create the file to include in the release package. A second step is to run it on every commit, which could be handled in a couple of different ways:

- Include in the pre-commit hook.
- Include in bundle.yaml so it runs during CI. In this case we could also create the function in onto_tool and submit a PR, but I'm not sure this is necessary.

The first might be preferable so that it runs before pushing to the remote repo.

This isn't required for the upcoming release, however. I've created a new issue #819 for this step.

dylan-sa commented 1 year ago

I have created a draft subClassOf supplement for us to include in the gist 12 release package. The final version may differ depending on the changes that make it into the release, but I am hoping to get feedback on the general structure.

The general idea (as @justin2004 suggested) was to run an OWL 2 DL reasoner over gist to get subclass assertions like those cited by @marksem. (Some additional manipulation was required as detailed below.)

The supplement ends up taking the following shape:

- In addition to the subClassOf statements that are explicitly asserted in gist, the supplement includes statements that are implied but not asserted in gist.
  - This includes relationships that are tougher to glean, like gist:Commitment rdfs:subClassOf gist:Intention and gist:CoherentUnit rdfs:subClassOf gist:UnitOfMeasure.
  - It also includes assertions that result from the transitivity of rdfs:subClassOf. So, for example, since gist:GeoRoute rdfs:subClassOf gist:OrderedCollection and gist:OrderedCollection rdfs:subClassOf gist:Collection, gist:GeoRoute rdfs:subClassOf gist:Collection is included as well.
- The supplement excludes statements that would be true of any class: ?x rdfs:subClassOf owl:Thing, ?x rdfs:subClassOf ?x, and owl:Nothing rdfs:subClassOf ?x. All of these kinds of statements are going to be true no matter which class you plug in for ?x; they'd just bloat the file without adding much value.
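The shape described above can be illustrated with a small Python sketch. This is not the script used to build the supplement (that came from a reasoner run); it only shows the two rules on (subclass, superclass) pairs, using the GeoRoute example from this comment. The function names are hypothetical.

```python
def transitive_closure(pairs):
    """Return all (sub, super) pairs implied by transitivity of the input pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def drop_trivial(pairs):
    """Remove statements that hold for every class and so add no information."""
    return {
        (sub, sup)
        for sub, sup in pairs
        if sup != "owl:Thing" and sub != "owl:Nothing" and sub != sup
    }


reasoner_style_output = {
    ("gist:GeoRoute", "gist:OrderedCollection"),
    ("gist:OrderedCollection", "gist:Collection"),
    ("gist:GeoRoute", "owl:Thing"),      # true of any class
    ("gist:GeoRoute", "gist:GeoRoute"),  # reflexive
    ("owl:Nothing", "gist:GeoRoute"),    # true of any class
}

supplement = drop_trivial(transitive_closure(reasoner_style_output))
for sub, sup in sorted(supplement):
    print(sub, "rdfs:subClassOf", sup)
```

On this input the result is the two asserted statements plus the entailed gist:GeoRoute rdfs:subClassOf gist:Collection; the three trivial statements are filtered out.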

Any thoughts on what is valuable to include are much appreciated.

rjyounes commented 1 year ago

I'm not convinced that we should assert the cascade of subclass assertions. The point was to extract subclass assertions buried in an equivalentClass assertion for RL and RDFS reasoners that cannot do that. These reasoners can provide the upward chaining of subclassing themselves. That means, for example, that we don't need this:

gist:Actuator
    rdfs:subClassOf
        gist:Artifact ,
        gist:Equipment ,
        gist:PhysicalIdentifiableItem
        ;
    .

or this:

gist:Account
    rdfs:subClassOf
        gist:Commitment ,
        gist:Intention
        ;
    .

I realize this is trickier to implement. One way is to run the DL reasoner and subtract from the result any triples that are asserted in the full ontology.

Jamie-SA commented 1 year ago

I'm not convinced that we should assert the cascade of subclass assertions.

@rjyounes But what are you gaining by putting in the effort to remove them? I think it is fine to leave them in.

I am curious, how many are we talking about? A few? Or lots?

rjyounes commented 1 year ago

In response to the above comments, we have decided:

- Remove the cascade of subclasses and provide only direct subclass assertions. Dylan says this is easy to do with a particular configuration of HermiT in Protégé.
- Keep the direct subclass assertions generated by the reasoner that are also expressed in the ontology, such as:

gist:Actuator rdfs:subClassOf gist:Equipment ;

This is tricky enough to eliminate that it's not worth the effort, but could potentially be included in the eventual script that automates this (see below).
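For anyone scripting this later, here is a minimal Python sketch of the "direct subclass assertions only" rule. It is not how HermiT or Protégé does it. It assumes the input is already transitively closed (as reasoner output would be) and that no two distinct classes are equivalent to each other. The hierarchy below is pieced together from examples in this thread and the function name is hypothetical.

```python
def direct_only(pairs):
    """Keep (sub, sup) only if no intermediate class sits between them.

    Assumes `pairs` is transitively closed and contains no equivalence cycles.
    """
    pairs = {(a, b) for a, b in pairs if a != b}
    supers = {}
    for sub, sup in pairs:
        supers.setdefault(sub, set()).add(sup)
    direct = set()
    for sub, sup in pairs:
        # sup is indirect if another superclass of sub is itself below sup
        via_middle = any(
            sup in supers.get(mid, set())
            for mid in supers[sub]
            if mid != sup
        )
        if not via_middle:
            direct.add((sub, sup))
    return direct


closed = {
    ("gist:Account", "gist:Commitment"),
    ("gist:Commitment", "gist:Intention"),
    ("gist:Account", "gist:Intention"),  # cascade, reachable via gist:Commitment
}

for sub, sup in sorted(direct_only(closed)):
    print(sub, "rdfs:subClassOf", sup)
```

Note that this keeps direct assertions whether or not they are also asserted in the ontology, which matches the second decision above.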

Additional notes:

- The change requires modifications to bundle.yaml: where individual Turtle files are listed in the target, the new file should be added. However, first it should be determined whether this is necessary, by using the wildcard *.ttl as in other targets and testing the result. If it's not necessary, change all lists to wildcard expressions.
- Later we will want a script to automate this, possibly called from the pre-commit hook so that we can use it in development versions as well as release versions. There may be a separate issue for this already; if not it should be added. It could possibly be incorporated as an onto_tool function.
- The current change should be accompanied by changes to the documentation on creating the release package: https://semarts.atlassian.net/wiki/spaces/OF/pages/1126760539/gist+Release+Management+Ontologists, since it will need to be run manually as part of the release. @dylan-sa, can you please include that? I'm not sure if it requires changes to any of our documentation in the repository /docs folder, but you could check for that as well.

uscholdm commented 1 year ago

The point was to extract subclass assertions buried in an equivalentClass assertion for RL and RDFS reasoners that cannot do that. These reasoners can provide the upward chaining of subclassing themselves.

That's very interesting; I had not thought of that. All along, I assumed this was about inferring all subclasses. We have done that a lot over the years, before reasoners worked in triple stores. It's much easier to do it that way also. I don't know what is best.

uscholdm commented 1 year ago

This is tricky enough to eliminate that it's not worth the effort, but could potentially be included in the eventual script that automates this (see below).

What exactly is tricky enough to eliminate? I'm confused about what you plan to do.

dylan-sa commented 1 year ago

Updated version of the supplement here.

rjyounes commented 1 year ago

@dylan-sa Are you going to submit a PR for this?

dylan-sa commented 1 year ago

Yep, I've created a PR for this one here: https://github.com/semanticarts/gist/pull/837

semanticarts / gist

Explicitly assert subClassOf in gist where applicable #714