w3c / shacl

SHACL Community Group (Post-REC activities)

Enable "Shapes as Data" Paradigm #39

Open mgberg opened 4 months ago

mgberg commented 4 months ago

I've had an idea for a possible extension to SHACL for a while and I'm wondering what others think about it.

Over the past couple of years, I have run into several situations where constraints are part of the domain of interest and should apply to other data in that domain. In those cases, it would be helpful to define shapes as part of the data instead of at the schema level, and for the SHACL engine to know which shapes each piece of data should be validated against by following some existing path expressed in domain terminology.

Doing this would save users from having to extend the ontology/schema to add new constraints. It could also avoid the use of metamodeling to accomplish a similar goal, which can get messy and confusing for users.

To help convey the idea, here are three generic examples where this feature could be helpful:

Example 1

Consider the Function Ontology (https://fno.io/spec/#ontology-abstract). If you look at the documentation for fno:Parameter and fno:Output, they look very similar to sh:PropertyShape in spirit, and the class fno:Function is therefore like sh:NodeShape. It might be useful to use SHACL to validate that function arguments and outputs match what is expected based on the function definition.

However, if instances of fno:Function were Node Shapes, then there would be no convenient way to configure each fno:Function instance to target the right nodes with current SHACL. You'd have to either make each one a class and have the corresponding instances of fno:Execution be instances of each (which might tempt the introduction of metamodeling similar to SPIN Functions), write a clunky custom target type using SHACL-AF that wouldn't be supported by all SHACL engines, or use sh:targetNode to connect each fno:Function instance to the corresponding fno:Execution instances instead of the domain property fno:executes (or in addition to it, which would be redundant).
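To make the last of those workarounds concrete, here is a rough sketch of what the sh:targetNode approach might look like. The ex: namespace and the names ex:sum, ex:execution1, and ex:execution2 are made up for illustration, and treating an fno:Function as a sh:NodeShape is the hypothetical modeling discussed above, not something the Function Ontology defines.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# A function that doubles as a node shape (hypothetical modeling).
ex:sum a fno:Function, sh:NodeShape ;
  # Workaround: every execution has to be listed explicitly as a target.
  sh:targetNode ex:execution1, ex:execution2 .

# The domain data already states the same relationship via fno:executes,
# so the sh:targetNode triples above are redundant.
ex:execution1 a fno:Execution ; fno:executes ex:sum .
ex:execution2 a fno:Execution ; fno:executes ex:sum .
```

Each new execution would require another sh:targetNode triple on the function, which is the redundancy described above.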

In this case, it would be convenient if each fno:Execution could be validated against whatever node it was connected to via fno:executes.

Example 2

Consider some future state of the W3C Data Cube ontology. Data Structure Definitions (https://www.w3.org/TR/vocab-data-cube/#dsd-dsd) and Component Specifications are data in this domain. However, they could be modified to be represented as Node Shapes and Property Shapes respectively such that SHACL could be used to validate that the Observations that are part of DataSets that have that Data Structure Definition actually conform to that structure.

The same challenges exist for trying to validate a qb:DataStructureDefinition as a Node Shape as for fno:Function; there is no convenient way to configure each qb:DataStructureDefinition instance to target the right nodes with current SHACL.

In this case, it would be convenient if each qb:Observation could be validated against whatever node it was connected to via the path qb:dataSet/qb:structure.
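As an illustration only, the data in such a future state might look roughly like the following. The ex: names are invented, and typing a qb:DataStructureDefinition as a sh:NodeShape (and a qb:ComponentSpecification as a sh:PropertyShape) is an assumption about a possible modification, not part of the Data Cube vocabulary today.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Hypothetical: the DSD is also a node shape, and its component
# specification is also a property shape.
ex:dsd a qb:DataStructureDefinition, sh:NodeShape ;
  qb:component ex:lifeExpectancySpec ;
  sh:property ex:lifeExpectancySpec .

ex:lifeExpectancySpec a qb:ComponentSpecification, sh:PropertyShape ;
  qb:measure ex:lifeExpectancy ;
  sh:path ex:lifeExpectancy ;
  sh:datatype xsd:decimal ;
  sh:minCount 1 ;
  sh:maxCount 1 .

# Ordinary Data Cube data: the observation reaches the DSD
# through the path qb:dataSet/qb:structure.
ex:dataset a qb:DataSet ; qb:structure ex:dsd .
ex:obs1 a qb:Observation ;
  qb:dataSet ex:dataset ;
  ex:lifeExpectancy 76.7 .
```

Nothing in this graph tells a current SHACL engine that ex:obs1 should be validated against ex:dsd; that link exists only in domain terms.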

Furthermore, this would make fancier data cube behavior easier to achieve, like how shapes are used for datatypes of QB components here: https://docs.allotrope.org/ADF%20Data%20Cube%20Ontology.html (see examples 5 and 11)

Example 3

Consider the EP-PLAN ontology (https://trustlens.github.io/EP-PLAN/, documentation: https://trustlens.github.io/EP-PLAN/widoco_output/index-en.html), an extension to W3C PROV for capturing in detail the plans that go along with the Activities in PROV. It may be desirable to use SHACL to determine whether an activity went according to plan or whether some deviation occurred. Note that ep-plan:Step and ep-plan:Variable could both be similar in spirit to sh:NodeShape.

The same challenges exist for trying to validate instances of these classes as Node Shapes as for fno:Function; there is no convenient way to configure each ep-plan:Step and ep-plan:Variable instance to target the right nodes with current SHACL.

In this case, it would be convenient if each ep-plan:Activity could be validated against whatever node it was connected to via ep-plan:correspondsToStep and if each ep-plan:Entity could be validated against whatever node it was connected to via ep-plan:correspondsToVariable.

Possible Implementation

I've thought of a few different ways to implement this behavior, but the simplest and most efficient one I've come up with so far is to create a new Constraint Component.

This new Constraint Component would function somewhat like the one for sh:node. However, instead of specifying the URI of a Node Shape that value nodes must also conform to, it would specify a SHACL path using a parameter called, e.g., sh:nodesPath. For each value node of a shape that has a value for sh:nodesPath, that value node is also validated against any Node Shape(s) found at the specified path from the value node (if any resources at that path exist and are Node Shapes).

This would enable the following addition for the Function Ontology in order to validate that all instances of fno:Execution conform to any corresponding instance of fno:Function:

fno:Execution
  sh:nodesPath fno:executes ;
.

And this addition for the Data Cube Ontology in order to validate that all instances of qb:Observation conform to any corresponding instance of qb:DataStructureDefinition:

qb:Observation
  sh:nodesPath (
    qb:dataSet
    qb:structure
  ) ;
.

And these additions for the EP-PLAN Ontology in order to validate that all instances of ep-plan:Activity conform to any corresponding instance(s) of ep-plan:Step and that all instances of ep-plan:Entity conform to any corresponding instance(s) of ep-plan:Variable:

ep-plan:Activity
  sh:nodesPath ep-plan:correspondsToStep ;
.
ep-plan:Entity
  sh:nodesPath ep-plan:correspondsToVariable ;
.
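To show how the pieces might fit together, here is a small self-contained sketch for the Function Ontology case. Note that sh:nodesPath is only the parameter proposed in this issue and is not part of SHACL, so a standard engine would ignore it. The ex: names, the parameter predicates ex:a and ex:b, and the typing of ex:sum as a sh:NodeShape are all assumptions made for illustration.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix fno:  <https://w3id.org/function/ontology#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Schema level: one generic statement using the PROPOSED parameter.
# Declaring the class as a node shape gives it an implicit class target.
fno:Execution a rdfs:Class, sh:NodeShape ;
  sh:nodesPath fno:executes .

# Data level: the function itself carries the constraints.
ex:sum a fno:Function, sh:NodeShape ;
  sh:property [ sh:path ex:a ; sh:datatype xsd:integer ; sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:path ex:b ; sh:datatype xsd:integer ; sh:minCount 1 ; sh:maxCount 1 ] .

# Under the proposal, this execution would be expected to conform to ex:sum.
ex:execution1 a fno:Execution ;
  fno:executes ex:sum ;
  ex:a 1 ;
  ex:b 2 .

# This one would be expected to fail, because ex:a is a string rather than an integer.
ex:execution2 a fno:Execution ;
  fno:executes ex:sum ;
  ex:a "one" ;
  ex:b 2 .
```

Adding another function with different constraints would then only require new data (another fno:Function that is also a Node Shape), with no change to the shape on fno:Execution.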

My main reservation with this approach is that when sh:node fails validation, many SHACL engines don't include the nested results via sh:detail in their reports, and this constraint would probably behave the same way. I hope that more validators will take advantage of sh:detail in the future.
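For reference, a report that does make use of sh:detail for a failed sh:node constraint might look roughly like this. It is a hand-written sketch, not output from any particular engine; ex:ExecutionShape, ex:execution2, and ex:a are hypothetical names.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .   # hypothetical namespace

[] a sh:ValidationReport ;
  sh:conforms false ;
  sh:result [
    a sh:ValidationResult ;
    sh:resultSeverity sh:Violation ;
    sh:focusNode ex:execution2 ;
    sh:value ex:execution2 ;
    sh:sourceShape ex:ExecutionShape ;
    sh:sourceConstraintComponent sh:NodeConstraintComponent ;
    # The nested result says WHY the node failed the referenced shape.
    # Many engines omit this part.
    sh:detail [
      a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:focusNode ex:execution2 ;
      sh:resultPath ex:a ;
      sh:value "one" ;
      sh:sourceConstraintComponent sh:DatatypeConstraintComponent
    ]
  ] .
```

Without the sh:detail block, the report only says that ex:execution2 failed the nested shape, not which of its constraints was violated.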

I have added a prototype implementation of this to this branch in this fork of pyshacl (just because I happen to be the most familiar with the internals of that SHACL engine) and have been playing around with it. Included in this folder in the repo is a file with example data and shapes that demonstrates how it works, as well as the output from the modified version of pyshacl (cleaned up a bit for readability).

I'm curious to know what the community thinks of this, both as a concept and as a particular method of implementation.