w3c / data-shapes

RDF Data Shapes WG repo

Lift syntactic restrictions on property path notation #137

Open wouterbeek opened 2 years ago

wouterbeek commented 2 years ago

Request for change

Let's lift the requirement that some nodes that are part of SHACL property path specifications must be blank nodes.

Example

This is an example phrase where the SHACL standard currently requires the use of blank nodes (emphasis mine):

A sequence path is a blank node that is a SHACL list with at least two members and each member is a well-formed SHACL property path.

Motivation

  1. The syntactic restriction to blank nodes is unnecessary. It is possible to write down RDF lists that conform to the RDF Schema 1.1 standard with blank nodes, with IRIs, and with a combination of blank nodes and IRIs. The syntactic requirement to only use blank nodes in the case of SHACL property paths does not serve a specific purpose.

  2. The syntactic restriction to blank nodes is limiting. Some data publishers choose to publish their RDF lists with (generated) IRIs. This allows such publishers to implement dereferencing for RDF list nodes and/or make the output of automated processes such as ETLs more predictable (e.g., allowing a meaningful diff/delta to be made between the outputs of two runs of the same automated process over time without requiring somewhat complex blank node normalization steps to be performed).

  3. The syntactic restriction to blank nodes conflicts with the following statement in Section 3.5 in the RDF 1.1 standard:

    In situations where stronger identification is needed, systems may systematically replace some or all of the blank nodes in an RDF graph with IRIs.

    According to the RDF 1.1 standard, data publishers are allowed to apply Skolemization to their data. Skolemization systematically replaces blank nodes with well-known IRIs. The syntactic requirement on the use of blank nodes in SHACL property paths prevents data publishers from applying Skolemization in combination with the use of SHACL property paths.

    (Notice that a data publisher may have good grounds to apply Skolemization: it simplifies certain graph comparison and graph merge operations that otherwise require standardizing the blank nodes apart.)
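
To make points 1 and 3 concrete, here is a minimal sketch in plain Python, with no RDF library. It assumes triples are modelled as tuples of strings and that blank nodes are the strings starting with `_:`. The names `skolemize` and `list_members`, the `http://example.org/` namespace, and the example shape are hypothetical; only the `/.well-known/genid/` path segment comes from the RDF 1.1 recommendation for Skolem IRIs. The sketch Skolemizes a two-member sequence path and checks that the list still has the same members, so the only thing lost is the "is a blank node" property that SHACL currently requires.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SH = "http://www.w3.org/ns/shacl#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def is_blank(term):
    # Simplification: blank nodes are the strings that start with "_:".
    return term.startswith("_:")


def skolemize(triples, base="https://example.org/.well-known/genid/"):
    """Replace each blank node with an IRI, consistently across all triples."""
    mapping = {}

    def swap(term):
        if not is_blank(term):
            return term
        if term not in mapping:
            mapping[term] = base + term[2:]
        return mapping[term]

    return [(swap(s), p, swap(o)) for s, p, o in triples]


def list_members(triples, head):
    """Follow rdf:first / rdf:rest from head and collect the list members."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    members = []
    node = head
    while node != RDF + "nil":
        members.append(first[node])
        node = rest[node]
    return members


# A sequence path ( ex:parent ex:name ) written with blank list nodes.
shape = [
    (EX + "Shape", SH + "path", "_:b0"),
    ("_:b0", RDF + "first", EX + "parent"),
    ("_:b0", RDF + "rest", "_:b1"),
    ("_:b1", RDF + "first", EX + "name"),
    ("_:b1", RDF + "rest", RDF + "nil"),
]

skolemized = skolemize(shape)
path_head = next(o for s, p, o in skolemized if p == SH + "path")
```

After Skolemization, `path_head` is an IRI and `list_members(skolemized, path_head)` returns the same two members as the original list, yet under the current wording the path is no longer a well-formed sequence path.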

white-gecko commented 2 years ago

I already raised a comment in #25 about making SHACL more Skolem-friendly but had no capacity to follow up. I think the points raised by @wouterbeek are relevant. In my eyes it would be good to make SHACL more Skolem-friendly and less reliant on attaching meaning to whether a node is blank.

TallTed commented 2 years ago

Makes sense to me.

@HolgerKnublauch -- Can this and similarly well-articulated Issues and/or PRs get tagged with "SHACL 1.1" or similar, so we start to have more easily measured community push for revision?

That will help us get a SHACL 1.1 (or similar) WG chartered, and part of its work can be adjusting its output Rec to ease the way to future corrections or other revisions, conforming to the newer W3C Process.

HolgerKnublauch commented 2 years ago

As background, this was ticket https://www.w3.org/2014/data-shapes/track/issues/41 and I remember some people also voiced concerns about the use of bnodes here.

A SHACL 1.1 WG may theoretically elect to change this, but I wouldn't be too sure that this is realistic given that such a change would break existing implementations. The main concern is that IRIs are used to distinguish named properties from paths. The redesign would need to be like "if a path matches none of the other syntaxes (such as having a sh:inversePath) then it will be treated as an IRI property". So a processor would first need to exhaust all other tests before concluding that the node represents a named property, which is quite slow. Further, if we allow IRIs for paths, then people could reference them, e.g. to attach sh:inversePath triples to them from other graphs, completely changing the meaning in a rather opaque way. The situation is already a bit asymmetric because named properties would still require an IRI while all other path types could be either-or.

Last but not least, if we go down this route then sh:nodeKind becomes questionable and would likely need to be removed/changed beyond recognition.
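
The lookup-order and cross-graph concerns above can be sketched in plain Python. This is a hypothetical illustration of the relaxed rule described in this comment, not the path algorithm from the SHACL Recommendation: `classify_path`, the tuple-of-strings triple model, and the `http://example.org/` names are all assumptions. The function checks every complex path syntax first and falls back to "predicate" only when none matches, so merging in a single triple from another graph is enough to change how the same IRI is read.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SH = "http://www.w3.org/ns/shacl#"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Predicates whose presence on a node marks it as a complex path.
PATH_PREDICATES = {
    SH + "inversePath": "inverse",
    SH + "alternativePath": "alternative",
    SH + "zeroOrMorePath": "zero-or-more",
    SH + "oneOrMorePath": "one-or-more",
    SH + "zeroOrOnePath": "zero-or-one",
    RDF + "first": "sequence",
}


def classify_path(triples, node):
    """Relaxed rule: try every complex syntax, else treat node as a property."""
    for s, p, _ in triples:
        if s == node and p in PATH_PREDICATES:
            return PATH_PREDICATES[p]
    # Only reached after all other tests have failed.
    return "predicate"


# The shapes graph uses ex:knows as an ordinary named property.
shapes = [(EX + "PersonShape", SH + "path", EX + "knows")]

# Another graph attaches an sh:inversePath triple to the same IRI.
other = [(EX + "knows", SH + "inversePath", EX + "knownBy")]

before = classify_path(shapes, EX + "knows")
after = classify_path(shapes + other, EX + "knows")
```

Here `before` is `"predicate"` and `after` is `"inverse"`, although the shapes graph itself did not change. With the current blank-node requirement, a processor can decide from the node kind alone that an IRI is a named property and skip the scan.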

@TallTed I have added a SHACL 1.1 tag and used it here. Most of the other currently open tickets here are rather trivial editorial items that will almost certainly be included in a future revision. I found one other ticket that is now also marked SHACL 1.1.