ucoProject / UCO

This repository is for development of the Unified Cyber Ontology.
Apache License 2.0

Remove requirement of repeating all property constraints from parent to child classes #417

Closed ajnelson-nist closed 2 years ago

ajnelson-nist commented 2 years ago

Background

While SHACL was first being implemented in UCO 0.7.0, a tool was incorporated to review sets of constraints between classes and their subclasses. This Issue will call that tool "The SHIR codebase" ("SHACL Inheritance Review"). In summary, the tool looked at the set of data X that would be accepted by all constraints on a class C, and compared it with the set of data X' that would be accepted by all constraints on any subclass C' of C. If X' was a larger set than X (that is, the constraints of C' would seem to permit more), the tool would report an error in the ontology design between class and subclass.

That tool was initially drafted with an understanding of SHACL that has since grown. In particular, one implementation pattern has been frequently desired as SHACL has been trialed for the last year: Often, a sh:PropertyShape for a subclass C' may be defined that only means to tighten one constraint from the corresponding sh:PropertyShape on the parent class C, and not declare anything about other constraints.

Take, e.g.:

ex:ClassC
    a
        owl:Class ,
        sh:NodeShape
        ;
    sh:property [
        a sh:PropertyShape ;
        sh:class ex:ClassA ;
        sh:maxCount 2 ;
        sh:path ex:propertyP ;
    ] ;
    sh:targetClass ex:ClassC ;
    .

ex:ClassD
    a
        owl:Class ,
        sh:NodeShape
        ;
    rdfs:subClassOf ex:ClassC ;
    sh:property [
        a sh:PropertyShape ;
        rdfs:comment "This property shape is OK."@en ;
        sh:maxCount 1 ;
        sh:path ex:propertyP ;
    ] ;
    sh:targetClass ex:ClassD ;
    .

ex:ClassE
    a
        owl:Class ,
        sh:NodeShape
        ;
    rdfs:subClassOf ex:ClassC ;
    sh:property [
        a sh:PropertyShape ;
        rdfs:comment "This property shape is NOT OK."@en ;
        sh:maxCount 3 ;
        sh:path ex:propertyP ;
    ] ;
    sh:targetClass ex:ClassE ;
    .

The property shape on ex:ClassD is consistent with the demands of its parent ex:ClassC. Requiring an ex:ClassD to have at most one value for ex:propertyP is consistent with an ex:ClassC having at most two.

The property shape on ex:ClassE is inconsistent with the demands of its parent ex:ClassC. Permitting an ex:ClassE to have up to three values for ex:propertyP is inconsistent with an ex:ClassC having at most two. The SHIR codebase would fail the consistency review based on the sh:maxCount in ex:ClassE's property shape.

The property shapes on ex:ClassD and ex:ClassE both make another choice, which is what the SHIR codebase will support in its impending release: the class constraint, that objects referenced via ex:propertyP must be of class ex:ClassA, was omitted. This is a pattern of delegation to the parent class: Some class in the superclass hierarchy makes a constraint about requiring ex:ClassAs, and the subclasses are fine to not repeat that.

In summary, the next release of the SHIR tool is set to continue flagging explicit constraint expansions as the subclass hierarchy is traversed. But, omission of constraints will no longer be treated as expansions, and further will not be treated as errors.
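The distinction between an explicit expansion and an omission can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not SHIR's actual implementation: each property shape is modeled as a plain dictionary, only sh:minCount and sh:maxCount are compared, and the function name find_expansions is hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical, simplified model: one property shape per class, stored as a
# dict of constraint name -> value. This is not SHIR's actual data model.
Constraints = Dict[str, Any]


def find_expansions(parent: Constraints, child: Constraints) -> List[str]:
    """Return constraint names where the child explicitly permits more than the parent."""
    expansions = []
    for name, child_value in child.items():
        if name not in parent:
            # The child adds a constraint the parent lacks: a tightening.
            continue
        parent_value = parent[name]
        if name == "maxCount" and child_value > parent_value:
            expansions.append(name)
        elif name == "minCount" and child_value < parent_value:
            expansions.append(name)
    # Constraints present on the parent but absent from the child are treated
    # as delegated to the parent, so they are never reported here.
    return expansions


class_c = {"class": "ex:ClassA", "maxCount": 2}
class_d = {"maxCount": 1}  # tightens maxCount, omits sh:class
class_e = {"maxCount": 3}  # loosens maxCount, omits sh:class

print(find_expansions(class_c, class_d))  # []
print(find_expansions(class_c, class_e))  # ['maxCount']
```

In this sketch the omitted sh:class constraint on ex:ClassD and ex:ClassE never appears in the result, while the sh:maxCount of 3 on ex:ClassE is still flagged.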

Requirements

Requirement 1

UCO must not require repetition of property constraints from parent class to child class when nothing is changed about the constraint.

Requirement 2

UCO must adopt a version of SHIR that permits omitting repeated property constraints.

Risk / Benefit analysis

Benefits

Some exercises are being performed to test interoperability with UCO and other ontologies. For instance, CASE-Corpora uses a small, "Zippering" ontology that defines subclasses between the independent class hierarchies of DCAT v2 (and its dependent ontologies and data models, including Dublin Core Terms), and CASE & UCO. That "Zippering" ontology has had to make repetitions of some of the property constraints from parent classes due to demands from review by SHIR. Unfortunately, the effect of an instance of invalid data becomes multiplied with the class hierarchy's height. Every violated constraint is reported in the SHACL violations report, even if the constraints are completely redundant with one or more parent classes. A demonstration of this effect is included in SHIR PR 12.

Hence, one benefit to having SHIR permit constraint delegation is reduced error output, and the errors that are reported are localized to the shape of the topmost class in the hierarchy imposing the constraint.
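The multiplication effect can be illustrated with a small Python simulation. It is a sketch under stated assumptions rather than a reproduction of the SHIR PR 12 demonstration: the three-class chain, the class names ex:ClassC1 through ex:ClassC3, and the function names are hypothetical, and the only SHACL behavior modeled is that a shape targeting a class also applies to instances of its subclasses.

```python
from typing import Dict, List, Optional

# Hypothetical chain: ex:ClassC3 -> ex:ClassC2 -> ex:ClassC1 (topmost).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "ex:ClassC1": None,
    "ex:ClassC2": "ex:ClassC1",
    "ex:ClassC3": "ex:ClassC2",
}


def ancestors_and_self(cls: str) -> List[str]:
    """Return cls followed by its superclasses, nearest first."""
    chain = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = SUBCLASS_OF.get(current)
    return chain


def violation_reports(
    instance_class: str, value_count: int, shapes: Dict[str, Dict[str, int]]
) -> List[str]:
    """Return one entry per class whose shape reports a maxCount violation."""
    reports = []
    # An instance of a subclass is also validated against every superclass shape.
    for cls in ancestors_and_self(instance_class):
        max_count = shapes.get(cls, {}).get("maxCount")
        if max_count is not None and value_count > max_count:
            reports.append(cls)
    return reports


repeated = {cls: {"maxCount": 2} for cls in SUBCLASS_OF}  # constraint copied down
delegated = {"ex:ClassC1": {"maxCount": 2}}  # constraint stated once

print(violation_reports("ex:ClassC3", 3, repeated))
# ['ex:ClassC3', 'ex:ClassC2', 'ex:ClassC1']
print(violation_reports("ex:ClassC3", 3, delegated))
# ['ex:ClassC1']
```

With the constraint repeated at every level, one bad instance yields one report per level; with delegation, the single report points at the topmost class imposing the constraint.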

There is also a side-benefit of adoption of the next release of SHIR, though this only benefits ontology developers (submitter included) who use UCO's make-based workflow. Some automated Python code review has been put into place to improve testing, and a build artifact created from UCO's local pip install is now properly ignored by Git. (Currently, running make installs SHIR into a virtual environment as part of CI testing, but a build artifact is generated and flags the submodule as "Dirty." That is fixed in SHIR's develop branch.)

Risks

Within the scope of UCO, there has been a desire to be explicit on sh:PropertyShapes. That is, repeating each constraint for every sh:PropertyShape down the subclass hierarchy has been somewhat seen as a desired feature.

The submitter believes the benefit of repeated constraints does not outweigh the multiplied messages a user encounters for a violation of a repeated constraint. Such noise can also mask other data errors, especially if some "Info"- or "Warning"-level violations are set to be accepted for validation but turn out to be operationally significant.

Competencies demonstrated

The competencies are demonstrated in SHIR pull requests.

Competency 1

Suppose instance data triggers a property constraint violation in some instance of a subclass C' of a class C.

Competency Question 1.1

What is the class that originated the violated constraint?

Result 1.1

With adoption of the next version of SHIR, a constraint does not need to be repeated. See SHIR PR 12, particularly the redundant shapes in ex-constraint-repetition.ttl leading to redundant sh:ValidationResults in kb-test-8.ttl.

Before adoption of the next version of SHIR, the sh:PropertyShapes of C' and C are both reported in the SHACL validation output.

Solution suggestion

Coordination

ajnelson-nist commented 2 years ago

PR 418 implements the solution for this proposal.

sbarnum commented 2 years ago

I should make a note here that this change will break and require changes to existing code in the form of the UCO Data Platform (UDP), built by MITRE and planned for future open source release to the community. The UDP autogenerates a GraphQL schema and associated API from UCO and UCO-based ontologies. This proposed change will require reworking how the code handles subclass level property shape overrides.

I personally would value the explicit nature of the current approach over limiting edge case repetition of class height validation errors in a specific custom zippering ontology related to but outside UCO.

ajnelson-nist commented 2 years ago

Capturing in public comment, re:

I should make a note here that this change will break and require changes to existing code ...

This Issue and its associated PR do not include removal of redundant property constraints. They only make those redundant constraints optional.

Also, the motivation for that comment was a reliance on UCO's overly-explicit property constraint specification within one tool. I noted in the OCs meeting on Tuesday that UCO's Data Platform is not the only tool that made use of property constraints. IIRC, the documentation generator UCO uses may also have made use of all constraints being in a single sh:PropertyShape.

Any tool trying to determine all applicable property constraints needs to be aware of the permission in general SHACL to specify those constraints at multiple levels of an RDFS or OWL subclass hierarchy. So, tools must be prepared to start at a class and look "Up"ward in the hierarchy to determine all applicable constraints.
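A minimal Python sketch of that upward lookup follows. It is an illustration only, not code from UCO, the UDP, or the documentation generator: the dictionaries SUBCLASS_OF and SHAPES and the function applicable_constraints are hypothetical stand-ins for a parsed ontology graph, and single inheritance is assumed (multiple parents would need a graph traversal instead of a simple loop).

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-ins for a parsed ontology, using the classes from the
# Background example. Single inheritance is assumed for brevity.
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "ex:ClassC": None,
    "ex:ClassD": "ex:ClassC",
}
SHAPES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "ex:ClassC": {"ex:propertyP": {"class": "ex:ClassA", "maxCount": 2}},
    "ex:ClassD": {"ex:propertyP": {"maxCount": 1}},
}


def applicable_constraints(cls: str, path: str) -> Dict[str, List[Tuple[str, Any]]]:
    """Collect every constraint on `path` from `cls` and its superclasses.

    Each constraint name maps to (originating class, value) pairs, nearest
    class first. All collected constraints apply together.
    """
    found: Dict[str, List[Tuple[str, Any]]] = {}
    current: Optional[str] = cls
    while current is not None:
        for name, value in SHAPES.get(current, {}).get(path, {}).items():
            found.setdefault(name, []).append((current, value))
        current = SUBCLASS_OF.get(current)
    return found


result = applicable_constraints("ex:ClassD", "ex:propertyP")
print(result["maxCount"])  # [('ex:ClassD', 1), ('ex:ClassC', 2)]
print(result["class"])     # [('ex:ClassC', 'ex:ClassA')]
```

Keeping the originating class alongside each value also lets a tool answer which class imposed a given constraint, even when a subclass omits it.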

ajnelson-nist commented 2 years ago

The solution for this Issue was provided at the time of the initial Requirements Review opening. No practical change has taken place in the code base, and its 0.3.0 release has been stamped. This Issue is now in Solutions Discussion. A Solutions Approval vote will be held in the Aug 25 meeting.