shapetrees / specification

Specification for Shape Trees
https://shapetrees.org

[Question] Is the st:validatedBy always a link to an external resource? #24

Closed pietercolpaert closed 3 years ago

pietercolpaert commented 3 years ago

When you discover a shape tree, it may have a st:validatedBy property pointing to a ShEx or SHACL resource. As it is defined now, this data can be embedded in the page itself, or you may want to dereference the URL.

It is unclear to me when to choose one or the other: which ShEx/SHACL triples are needed in order for you to decide that you don’t need to dereference the URL? Or is it intended to always be an external resource that you need to fetch?

If I understand it correctly, then there’s an important limitation to this approach when reusing existing shapes. I might want to, for a certain shape tree, collect existing shapes in 1 resource, but still keep using their original URI. However, the example in the primer uses hash identifiers (e.g. CommonNote#citation) in order to identify the right shape from that common document. Could it be interesting to instead also have a hypermedia link such as st:shapesResource to the resource where the shapes are expressed in full? Then the example from the primer could look like this:

<#citation> st:expectsType st:ShapeTreeResource ;
            # ... ;
            st:validatedBy <CommonNote#citation> ;
            st:shapesResource <CommonNote> .

Of course st:shapesResource is just a suggestion.

joshdcollins commented 3 years ago

Thanks for reaching out Pieter.

A few thoughts on your question:

As it is defined now, this data can be embedded in the page itself, or you may want to dereference the URL.

This is correct: schema details could be located within the shape tree document if using SHACL or ShExR (the RDF serialization of ShEx).

It is unclear to me when to choose one or the other: which ShEx/SHACL triples are needed in order for you to decide that you don’t need to dereference the URL? Or is it intended to always be an external resource that you need to fetch?

We follow Linked Data principles and expect implementations to "follow your nose" to the targeted IRI -- whether local within the same resource or remote.
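
A minimal sketch of that "follow your nose" step, using only the Python standard library. The function name `document_to_fetch` and the example.org URLs are assumptions for illustration, not part of the Shape Trees specification. The idea is to resolve the st:validatedBy IRI against the shape tree document, strip the fragment, and fetch only when the resulting document is not the one already loaded.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin


def document_to_fetch(shape_tree_doc: str, validated_by: str) -> Optional[str]:
    """Return the document URL to dereference for a shape IRI.

    Returns None when the shape is identified within the shape tree
    document itself, so no extra request is needed.
    """
    # Resolve relative references such as "CommonNote#citation" or "#citation".
    absolute = urljoin(shape_tree_doc, validated_by)
    # The fragment identifies the shape; the rest identifies the document.
    target_doc, _fragment = urldefrag(absolute)
    current_doc, _ = urldefrag(shape_tree_doc)
    return None if target_doc == current_doc else target_doc
```

With a hypothetical shape tree document at `https://example.org/trees/notes`, the reference `#citation` resolves locally, while `CommonNote#citation` points to `https://example.org/trees/CommonNote`.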

If I understand it correctly, then there’s an important limitation to this approach when reusing existing shapes. I might want to, for a certain shape tree, collect existing shapes in 1 resource, but still keep using their original URI.

In this case, we would recommend your implementation provide some cache that would abstract the indirection discussed -- effectively mapping an IRI used for identification purposes to the physical location you're hosting it on.
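
One way such a cache could look, as a rough sketch rather than a prescribed design: a lookup table that maps the document behind an identifying IRI to the place where a copy is actually hosted. The class name `ShapeLocator` and all URLs are hypothetical. The shape IRI itself stays unchanged in the shape tree.

```python
from typing import Dict
from urllib.parse import urldefrag


class ShapeLocator:
    """Maps the document behind a shape IRI to a locally hosted copy."""

    def __init__(self, mirrors: Dict[str, str]) -> None:
        # Keys are original document URLs, values are the hosted locations.
        self._mirrors = dict(mirrors)

    def locate(self, shape_iri: str) -> str:
        """Return where to read the shape from, without renaming the shape."""
        document, _fragment = urldefrag(shape_iri)
        # Fall back to the original document when no mirror is configured.
        return self._mirrors.get(document, document)
```

A client would still look up the shape by its original IRI inside whatever document `locate` returns.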

Could it be interesting to instead also have a hypermedia link such as st:shapesResource to the resource where the shapes are expressed in full?

A goal we have is for these shape tree resources to be reusable -- much like shapes can be today. Adding a statement to locate the resource (st:shapesResource) identified by the public identifier (st:validatedBy) would inhibit the reusability of the shape tree definition.

pietercolpaert commented 3 years ago

Hi @joshdcollins and @justinwb, thank you very much for the elaborate explanation!

We follow Linked Data principles and expect implementations to "follow your nose" to the targeted IRI -- whether local within the same resource or remote.

Right now, however, it is unclear when you should dereference the resource and when you shouldn’t. For example, one server implementation could already include an rdfs:label and an rdf:type pointing to, for example, sh:NodeShape. Another one could also already include the sh:closed boolean, which could be useful for some applications to understand that this will be a stricter shape than another. However, how do you know that you have all the triples necessary to pass it on to a SHACL validator, or to a source selection algorithm¬? After all, an empty NodeShape is also a valid shape.

¬ This is my use case: I want to use Shape Tree definitions, and their st:validatedBy properties, for selecting the right container for answering a certain query. See:

A goal we have is for these shape tree resources to be reusable -- much like shapes can be today. Adding a statement to locate the resource (st:shapesResource) identified by the public identifier (st:validatedBy) would inhibit the reusability of the shape tree definition.

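I don’t think reusability would be inhibited, to the contrary:

  • With my proposal, the current use case is still possible: clients can still dereference the IRI of the shape itself. Even when the st:shapesResource property is set, a client can choose to ignore it and dereference the IRI itself.

  • The argument of consistency: imagine the maintainer of the shape resource changes their shape, but you didn’t yet sync the data in the new shape. Then you might want to point to a historic shape to which your server still complies. A solution would be to import the shapes into your own resource at the same time as you would sync these new properties.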

ericprud commented 3 years ago

Hi @joshdcollins and @justinwb, thank you very much for the elaborate explanation!

We follow Linked Data principles and expect implementations to "follow your nose" to the targeted IRI -- whether local within the same resource or remote.

Right now, however, it is unclear when you should dereference the resource and when you shouldn’t. For example, one server implementation could already include an rdfs:label and an rdf:type pointing to, for example, sh:NodeShape. Another one could also already include the sh:closed boolean, which could be useful for some applications to understand that this will be a stricter shape than another. However, how do you know that you have all the triples necessary to pass it on to a SHACL validator, or to a source selection algorithm¬? After all, an empty NodeShape is also a valid shape.

I think that argument applies equally to any Linked Data resource, or really even any Web resource which has some way of combining documents. For instance, any document that references a WebId might include some triples from that WebId, and it's up to the consumer to ascertain whether they have all the triples they need or whether they should fetch the WebId. (The answer is typically to fetch the resource and count on HTTP's caching semantics to make sure you don't have to do it often.)
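
A small sketch of the "fetch and rely on HTTP caching" pattern, assuming ETag-based revalidation. The `RevalidatingCache` class and the transport signature are hypothetical. A real transport would send the known ETag in an `If-None-Match` header and report a 304 when the resource is unchanged; here it is injected as a plain function so the logic can be exercised without a network.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: (url, known_etag) -> (status, body, etag).
Transport = Callable[[str, Optional[str]], Tuple[int, str, Optional[str]]]


class RevalidatingCache:
    """Always asks the origin, but reuses the stored body on 304."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        # url -> (body, etag) from the last full response.
        self._entries: Dict[str, Tuple[str, Optional[str]]] = {}

    def get(self, url: str) -> str:
        cached = self._entries.get(url)
        known_etag = cached[1] if cached else None
        status, body, etag = self._transport(url, known_etag)
        if status == 304 and cached:
            # Not modified: the copy we already hold is still current.
            return cached[0]
        self._entries[url] = (body, etag)
        return body
```

The consumer never has to decide whether embedded triples are "complete enough": it always asks the origin, and the revalidation keeps repeat requests cheap.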

¬ This is my use case: I want to use Shape Tree definitions, and their st:validatedBy properties, for selecting the right container for answering a certain query. See:

That makes sense. It's certainly consistent with the Solid ecosystem we've been developing in the Solid Data Interoperability panel (which I invite you to join; you're obviously already engaged in this so you have nothing to lose).

A goal we have is for these shape tree resources to be reusable -- much like shapes can be today. Adding a statement to locate the resource (st:shapesResource) identified by the public identifier (st:validatedBy) would inhibit the reusability of the shape tree definition.

I don’t think reusability would be inhibited, to the contrary:

  • With my proposal, the current use case is still possible: clients can still dereference the IRI of the shape itself. Even when the st:shapesResource property is set, a client can choose to ignore it and dereference the IRI itself.

How would the client know when it could safely ignore st:shapesResource?

My point is that follow-your-nose semantics are pretty straightforward; the thing you're identifying is the response to a GET of that URL. If you say "you might want to look here instead", you don't solve the problem unless you tell people when to look there. If you say "you MUST look here instead", then here may as well be the locator. If you say "I want to look here, but everyone else should use the original URL", you've made a private extension to a data structure, which you could do without telling anyone about it. But, if you say "for these specific circumstances, look here instead", you've created a public extension to the data structure, which you are free to do. And finally, if your public extension is useful to most of the users of the data structure, that's an argument for adding your extension back into the authoritative definition for the data structure.

  • The argument of consistency: imagine the maintainer of the shape resource changes their shape, but you didn’t yet sync the data in the new shape. Then you might want to point to a historic shape to which your server still complies. A solution would be to import the shapes into your own resource at the same time as you would sync these new properties.

This comes back to the point about having very specific rules for when to use one locator instead of the other. I don't think it's possible here. For instance, if I publish data which uses an obsolete shape, I'm not doing you any favors by claiming that it uses a shape which it does not use. I could instead publish my version of the schema which would serve us both better because we'd know exactly how to validate the data we were exchanging.

Versioning is a serious issue for shared resources. For instance, most machine-readable infrastructure published by the W3C (schemas, DTDs...) is published with a versioned identifier and a promise not to do anything more than obvious bugfixes to it. I fully support metadata and publication policies to address versions and optimize for reuse, but I think think that inventing another identifier solves the problem on its own.

pietercolpaert commented 3 years ago

Thanks for the reply @ericprud! I agree with your explanation.

And finally, if your public extension is useful to most of the users of the data structure, that's an argument for adding your extension back into the authoritative definition for the data structure.

My question thus becomes: do you think it’s useful to have an authoritative definition for this?

Also a foaf:isPrimaryTopicOf could be reused, but then I think the spec should mention that it prefers this if an external document contains the shape.

ericprud commented 3 years ago

Thanks for the reply @ericprud! I agree with your explanation.

Possibly because I typo'd the last line (oops, sorry!). I wrote:

but I think think that inventing another identifier solves the problem on its own.

but meant that it does not solve the problem.

My question thus becomes: do you think it’s useful to have an authoritative definition for this?

Also a foaf:isPrimaryTopicOf could be reused, but then I think the spec should mention that it prefers this if an external document contains the shape.

Here I come back to not being persuaded that the extra property, whether it's foaf:isPrimaryTopicOf or st:shapesResource, solves the problem. I believe that if it's unambiguous which of st:validatedBy or st:shapesResource you need to dereference, you're better off writing that in the st:validatedBy slot. This avoids:

  1. ambiguity - should someone reading the ShapeTree use st:validatedBy or st:shapesResource?
  2. skew - multiple ShapeTrees refer to the same shape in st:validatedBy but that no longer asserts that they are compatible.

You could say that st:shapesResource preempts st:validatedBy, but then you have the same problem again: what if I want to use a different st:shapesResource than the one published in the ShapeTree? The same arguments for adding a st:shapesResource would equally apply to adding a st:metaShapesResource and a st:metaMetaShapesResource. They never solve the versioning problem, they just push it along ahead of you.

elf-pavlik commented 3 years ago

Hi @pietercolpaert 👋 We discussed this issue briefly during today's Solid Data Interop Panel call.

I might want to, for a certain shape tree, collect existing shapes in 1 resource, but still keep using their original URI.

I think it may be helpful here to clarify the expectations of the publisher of that shape tree as well as its consumers, especially the importance of using the original URI while providing its description through other means than simply resolving it. I can think of two cases, and very likely more exist: 1) the shape tree publisher doesn't want to rely on the availability of the original description (http://vocab.deri.ie comes to my mind 😉 ) 2) the shape tree publisher wants to use a different description than the original one

I would consider them two distinct problems, each of which may require a dedicated solution. The second one especially seems more nuanced to me. When different publishers and consumers choose to rely on a shared IRI reference, they possibly want to rely on shared knowledge, including a description of what that IRI denotes. Having different descriptions, while allowed by "Anyone can say anything about anything", may impede that reliance on shared knowledge. In some authorization-related use cases and requirements, we consider relying on a reference to a common shape (and/or shape tree) to define the equivalent of an OAuth scope. In those cases, not having an authoritative description of that shape would have a significant impact on security.

The argument of consistency: imagine the maintainer of the shape resource changes their shape, but you didn’t yet sync the data in the new shape. Then you might want to point to a historic shape to which your server still complies. A solution would be to import the shapes into your own resource at the same time as you would sync these new properties.

What advantage would it have over creating a distinct IRI denoting that historic version (it could be in a namespace under your control) and using that reference while you sync the data to validate against the new shape?

pietercolpaert commented 3 years ago

Hi @elf-pavlik! Small world yet again ;-) I agree with you that a distinct version IRI would be a better idea for that purpose.

@ericprud I needed some time to digest your answer and after writing out some examples on paper and following @elf-pavlik’s reasoning, I agree that all use cases can always be covered by “following your nose”, as there is no reason one would want to republish the shapes of someone else with the same IRI on another resource. If you’d want that, e.g. for archiving purposes, that would be a feature that goes beyond the spec of Shape Trees.

I’ll close this issue and also adapt the TREE specification to only use tree:shape instead of allowing a tree:importShape.

ericprud commented 3 years ago

@pietercolpaert, I'm happy to geek through this more with you; feel free to ping me on Gitter.