shapetrees / specification

Specification for Shape Trees
https://shapetrees.org

advertise Managed Container in HTTP (Link: header) vs. Container triples #38

Open kjetilk opened 3 years ago

kjetilk commented 3 years ago

The discovery mechanism got me a bit baffled, and I just thought I'd suggest a couple of options that seem simpler:

Shape Tree metadata resource

A Container is a Managed Container iff it has a Link header as described that links to a resource that is a st:ShapeTreeDocument .

For a little background, it seems that we're too tied up in certain patterns here, namely the pattern of using hash URIs for everything. The reason why we're using hash URIs is to describe things that aren't information resources, like a person. You can't get a person over the network, so we say

<#me> a foaf:Person .

to have a separate URI for the meatspace person. But, you can perfectly well describe the document itself:

<> a foaf:PersonalProfileDocument .

is how that's done for "FOAF files".

Similarly, I think it makes an awful lot of sense to describe the Shape Trees metadata resource itself as just that, because a Shape Trees metadata resource will always be an information resource, and you can describe it as such. Once you have that, a triple in there should be sufficient reason to say that the container is managed. If the triple is there, it is managed, if not (indeed, if it doesn't even resolve), then it is not. No need to define 404 and stuff like that in the spec.

You could go a step further too, and have the Shape Tree metadata resource link back to the containers it manages, e.g.

<> a st:ShapeTreeDocument ;
   st:manages </foo/bar/>, </foo/baz/> .
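
A minimal sketch of this criterion in Python, assuming the metadata resource has already been fetched and parsed into a set of (subject, predicate, object) IRI tuples. The expansion of the st: prefix and the function name `is_managed` are assumptions made for illustration; st:ShapeTreeDocument and st:manages are the terms proposed above, not terms confirmed by the spec.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ST = "http://www.w3.org/ns/shapetrees#"  # assumed expansion of the st: prefix


def is_managed(container, meta_url, meta_triples, require_backlink=False):
    """Decide whether `container` is managed by the resource at `meta_url`.

    `meta_triples` is a set of (subject, predicate, object) IRI strings parsed
    from the metadata resource, or None if the resource did not resolve.
    """
    if not meta_triples:
        # Unresolvable or empty metadata resource: not managed.
        return False
    if (meta_url, RDF_TYPE, ST + "ShapeTreeDocument") not in meta_triples:
        return False
    if require_backlink:
        # Optional stricter check: the document must link back to the container.
        return (meta_url, ST + "manages", container) in meta_triples
    return True
```

With this shape, a metadata resource that does not resolve and one that lacks the type triple are handled the same way, so no status code needs special treatment.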

Triples in the container

Alternatively, the discovery could be done entirely in RDF without a Link header:

<> a ldp:BasicContainer ;
   st:manager </data/CommonNotes/meta/UUID> .

This would mean that you'd need a GET rather than a HEAD, but that way, you'd know immediately after dereferencing the container whether it is managed. You could still have the above metadata in the Shape Tree metadata resource, always nice with bidirectional links, and they should probably be the final criterion anyway.
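
As a rough illustration of the body-based variant, the sketch below looks for the st:manager triple in an already-parsed container body. The triples are plain (subject, predicate, object) IRI tuples; the st: namespace expansion and the name `find_managers` are assumptions, and st:manager is the term proposed above rather than one taken from the spec.

```python
ST = "http://www.w3.org/ns/shapetrees#"  # assumed expansion of the st: prefix


def find_managers(container, container_triples):
    """Return every object of a (container, st:manager, ?o) triple, sorted."""
    return sorted(
        o for (s, p, o) in container_triples
        if s == container and p == ST + "manager"
    )
```

An empty result means the container body does not claim to be managed; a non-empty one gives the metadata resource(s) to dereference for the final check.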

ericprud commented 3 years ago

Disclaimer: Justin and I are taking over editing from Josh so I at least may misunderstand the spec.

hash vs. slash

I don't think that the hash/slash conflict (war? crusade? feud?) affects the mechanics here. I believe the operations don't involve any magic URL string manipulation. As a test, I think all vocabulary terms could be replaced with URNs and the Shape Tree (Link: </data/CommonNotes/meta/UUID>) could use any URL scheme to point at some node in the tree.

That links between nodes of the tree are referenced by hash URLs is just a convenience; it could use slashes. The only inconvenience with that is that you'd want them all addressable so you'd have to then populate that slash hierarchy. They could, and probably will, occasionally be split over multiple sites when folks want to re-use components of other published Shape Trees.

(weak) PROPOSAL: replace </data/CommonNotes/meta/UUID> with </data/CommonNotes/meta/myTreeNode> (suggestions welcome)

PROPOSAL: change the existing example to have a child (st:contains) on another site to clarify that it's not bound to one document or one site.

Container data vs server-managed metadata

From an information flow perspective, I think you could do it either way. Without looking at the context of the server management of instances, it's probably easier to edit LDP library code than server code, which would mean that adding it to the Container body would make it easier to deploy on existing infrastructure.

But (and it's a big but), I'm pretty confident we'd not want to do the same with the contained data. I consider that user data and want nothing to do with it. For example, I work with FHIR/RDF, which is specified by HL7 and has no place to write down Shape Tree associations. Given that, it seems to be a requirement that the server preserve the Shape Tree associations for these entities. Given that the server already has to provide some sort of Link header for the members of a Container, it's simpler to make that uniform across user data and Containers.

kjetilk commented 3 years ago

> Disclaimer: Justin and I are taking over editing from Josh so I at least may misunderstand the spec.
>
> hash vs. slash
>
> I don't think that the hash/slash conflict (war? crusade? feud?) affects the mechanics here.

I agree! It wasn't what I was getting at at all. I think the problem with the current proposal is that it complicates things unnecessarily: you first need to do a HEAD, then it specifies a certain HTTP status code, which is something that should be done at the protocol level, not here, and then you'd have to look for a description of a resource which is not a document, and then there's more HTTP stuff. It doesn't roll off the tongue ;-)

So, instead, it should be a criterion for a single triple. Iff that triple is there, then it is a Managed Container. That should suffice, and I intended with my post to propose some ways to do that.

> (weak) PROPOSAL: replace </data/CommonNotes/meta/UUID> with </data/CommonNotes/meta/myTreeNode> (suggestions welcome)
>
> PROPOSAL: change the existing example to have a child (st:contains) on another site to clarify that it's not bound to one document or one site.

Why should this specification name the URI at all? Surely, it can be discovered from container metadata, whether that is the header or the body?

> Container data vs server-managed metadata
>
> From an information flow perspective, I think you could do it either way. Without looking at the context of the server management of instances, it's probably easier to edit LDP library code than server code, which would mean that adding it to the Container body would make it easier to deploy on existing infrastructure.

Yup.

> But (and it's a big but), I'm pretty confident we'd not want to do the same with the contained data. I consider that user data and want nothing to do with it.

It depends on what you mean by "contained data". At the very least, the containment triples are already server-managed metadata, the user isn't allowed to touch that, so we already have server-managed metadata. From that perspective, another triple couldn't hurt :-)

> For example, I work with FHIR/RDF, which is specified by HL7 and has no place to write down Shape Tree associations.

Hmmm, interesting. I have been to a few talks about FHIR, but I haven't worked with it. It would be interesting to hear if there are any requirements on container data that can come into conflict with the assumption of container data being pretty sparse and mostly server-managed.

> Given that, it seems to be a requirement that the server preserve the Shape Tree associations for these entities. Given that the server already has to provide some sort of Link header for the members of a Container, it's simpler to make that uniform across user data and Containers.

Yes, it could go both ways. My main point was really that the sole criterion of whether a container is managed should be that the resource the container links to describes itself as a manager. No HTTP-specific things need to be specified in that process, and they really shouldn't be.

ericprud commented 3 years ago

> > Disclaimer: Justin and I are taking over editing from Josh so I at least may misunderstand the spec.
> >
> > hash vs. slash
> >
> > I don't think that the hash/slash conflict (war? crusade? feud?) affects the mechanics here.
>
> I agree! It wasn't what I was getting at at all. I think the problem with the current proposal is that it complicates things unnecessarily: you first need to do a HEAD, then it specifies a certain HTTP status code, which is something that should be done at the protocol level, not here, and then you'd have to look for a description of a resource which is not a document, and then there's more HTTP stuff. It doesn't roll off the tongue ;-)

Oh yeah? Well it sounds like the chorus to a pop song to me. <smiley elided, but you know i meant it/>

I think there are two issues:

  1. diff between parsing the ShapeTree ptr from a Link: header vs. from the body. I'll try to justify more clearly how we ended up on the Link: header.
  2. explaining that it's a Link header and being slightly illustrative without being overly prescriptive (i.e. mentioning HEAD and thereby implying that a GET won't get you the same info).
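
To make the first point concrete, here is a minimal Python sketch of pulling a ShapeTree pointer out of a Link: header value. The same header value would come back from either a HEAD or a GET, so the function only takes the header string. The relation IRI `SHAPETREE_REL` is a hypothetical placeholder (this thread does not name the rel), and the parser is deliberately simplified: it assumes parameter values contain no ';' or ',' characters.

```python
import re

# Hypothetical relation type; the actual rel used by the spec is not given here.
SHAPETREE_REL = "https://vocab.example/rel#shapetree"

# Simplified: assumes parameter values contain no ';' or ',' characters.
LINK_RE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")


def parse_link_header(value):
    """Parse a Link header value into a list of (target, [rel, ...]) pairs."""
    links = []
    for target, params in LINK_RE.findall(value):
        rels = []
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # A rel parameter may hold several space-separated relation types.
                rels.extend(val.strip().strip('"').split())
        links.append((target, rels))
    return links


def targets_for(value, rel):
    """Return the link targets that carry the given relation type."""
    return [target for target, rels in parse_link_header(value) if rel in rels]
```

Calling `targets_for(header_value, SHAPETREE_REL)` on a Container response would then yield something like </data/CommonNotes/meta/UUID>, still relative and needing resolution against the request URL.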

> So, instead, it should be a criterion for a single triple. Iff that triple is there, then it is a Managed Container. That should suffice, and I intended with my post to propose some ways to do that.

> > (weak) PROPOSAL: replace </data/CommonNotes/meta/UUID> with </data/CommonNotes/meta/myTreeNode> (suggestions welcome)
> >
> > PROPOSAL: change the existing example to have a child (st:contains) on another site to clarify that it's not bound to one document or one site.
>
> Why should this specification name the URI at all? Surely, it can be discovered from container metadata, whether that is the header or the body?

The above two proposals were editorial; they're only meant to find a way to communicate our Plan A. Though I have the feeling I'm replacing uncertainty with confusion.

> > Container data vs server-managed metadata
> >
> > From an information flow perspective, I think you could do it either way. Without looking at the context of the server management of instances, it's probably easier to edit LDP library code than server code, which would mean that adding it to the Container body would make it easier to deploy on existing infrastructure.
>
> Yup.
>
> > But (and it's a big but), I'm pretty confident we'd not want to do the same with the contained data. I consider that user data and want nothing to do with it.
>
> It depends on what you mean by "contained data". At the very least, the containment triples are already server-managed metadata, the user isn't allowed to touch that, so we already have server-managed metadata. From that perspective, another triple couldn't hurt :-)

Yeah, I think I meant the other thing by "contained data". I meant some Turtle file that the user stuck in a Container.

In my original js implementation, I took a minimalist approach with no reliance on metadata. I designed something that stuck all of the info in Container bodies, along the lines of your suggestion. For homogeneous Containers, this worked reasonably well, but for a Container that could contain Widgets and Gizmos, it relied on URI templates (e.g. "Widget-{ID}") in the Container ShapeTree to prescribe names that would specify that it followed the rules for a contained Widget (and not for a contained Gizmo). Also, you'd have to do an exhaustive search through the Widget's graph to hopefully find exactly one node which matched the shape specified in the Widget ShapeTree. Workable, but kinda crappy, with poor performance and error specificity for large payloads. The only alternative I could see was to write into the Widget instance, which would lead to complex rules around whether shape validation should account for system metadata, whether that stuff should be stripped out on transfer by GET, whether PUT should/could parrot those triples back...
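
For readers unfamiliar with the URI-template approach, a rough Python sketch of how a resource name could be matched against templates like "Widget-{ID}" follows. It is not the original js code: the function names, the shape tree IRIs, and the simplified template syntax (only `{name}` variables, no other URI Template operators) are all assumptions for illustration.

```python
import re


def template_to_regex(template):
    """Turn a template such as "Widget-{ID}" into a compiled regex."""
    # re.split with a capture group alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def select_shape_tree(name, templates):
    """Return the shape tree whose template matches `name`, or None."""
    matches = [
        tree for tpl, tree in templates.items()
        if template_to_regex(tpl).fullmatch(name)
    ]
    # Exactly one template must match; ambiguity is treated as no match.
    return matches[0] if len(matches) == 1 else None
```

The weakness described above is visible here: the resource name is the only thing tying a contained resource to its shape tree, so a client that picks a different name silently falls outside the rules.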

Later, folks at Inrupt started implementing server-side validation and (I believe) started storing ShapeMaps (node/shape pairs) required for re-validation on PUT or PATCH. Associating the node with that shape fixed the performance and error specificity problems, but did so by recording "http://pods.example/kjetil/MyGizmos/1234#it@http://gizmos.io/shapes#Widget" in a metadata resource. Josh re-implemented shapetrees in Java, using the same mechanism to map the Resource back to a Shape Tree and made it available via a Link: header. Given that the navigation from contained Resource to ShapeTree was via a Link: header, it was easier on the user to apply the same approach for the Container. This also ducked questions like "what if a Container contains a Container?"
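
As an illustration of the node/shape pairing mentioned above, the sketch below splits a string of the form quoted in this comment into its two halves. It is only a simplification for this example: the full ShapeMap syntax is richer than a bare "node@shape" string, and the function name `parse_association` is made up here.

```python
def parse_association(entry):
    """Split a "node@shape" string into a (node, shape) pair."""
    # Split on the first "@"; assumes the node IRI itself contains no "@".
    node, sep, shape = entry.partition("@")
    if not sep or not node or not shape:
        raise ValueError(f"not a node@shape pair: {entry!r}")
    return node, shape


entry = "http://pods.example/kjetil/MyGizmos/1234#it@http://gizmos.io/shapes#Widget"
node, shape = parse_association(entry)
```

With the pair recorded, re-validation on PUT or PATCH can start from `node` and check it against `shape` directly instead of searching the whole graph.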

> > For example, I work with FHIR/RDF, which is specified by HL7 and has no place to write down Shape Tree associations.
>
> Hmmm, interesting. I have been to a few talks about FHIR, but I haven't worked with it. It would be interesting to hear if there are any requirements on container data that can come into conflict with the assumption of container data being pretty sparse and mostly server-managed.

FHIR specifies the payload of /Observations/1234.ttl but not /Observations/ so it was pretty easy to superimpose on LDP. I think the only place we'll see probs is for some protocol (e.g. FHIR 2099) which has an opinion about what comes back when you GET a directory.

> > Given that, it seems to be a requirement that the server preserve the Shape Tree associations for these entities. Given that the server already has to provide some sort of Link header for the members of a Container, it's simpler to make that uniform across user data and Containers.
>
> Yes, it could go both ways. My main point was really that the sole criterion of whether a container is managed should be that the resource the container links to describes itself as a manager. No HTTP-specific things need to be specified in that process, and they really shouldn't be.

I think the HTTP specifics are just the attempts at illustration, but for the above reasons of not screwing with user data, I believe that we do want to use HTTP to associate Resources to ShapeTrees (and Shapes and whatever).

kjetilk commented 3 years ago

OK, I see I opened too many threads with this, which was unintentional. There are a bunch of quite orthogonal issues here. For instance, whether we use a Link header or the message body to refer to a ShapeTree metadata resource is completely orthogonal to what I really wanted to discuss, which was the criterion to use to say that something is managed.

Anyway, I think that this is better done by me writing a PR for it somewhere down the road when we get to that.

ericprud commented 3 years ago

Fair enough, though I'd like to make sure I communicated the vision sufficiently that we can separate editorial and substantive PRs (and you can make editorial PRs to make it easier to argue technical points). Ping me on gitter if you want voice.

kjetilk commented 3 years ago

OK, the Link vs container wasn't what I intended to discuss in this issue, but since that's where it is headed, OK. :-) I can make a PR for what I intended for this issue, but I will probably not be able to engage on that very soon, I just wanted you to have an early view of my thoughts from my first reading.

justinwb commented 3 years ago

I think @kjetilk has a valid point that this specification doesn't have to be so opinionated on whether that managed relationship comes from a Link header or from a triple in the body of the response. In LDP there is precedent (to @kjetilk's point) for containers including server-managed / injected triples in the response body. At a minimum, I believe that we can look at adjusting the text to expand on where this can be sourced from.

ericprud commented 3 years ago

I don't think this particular flexibility will help anyone. It always seems nice to be flexible, but it means everyone has to implement both code paths for every interaction.

kjetilk commented 3 years ago

Sure, but since an important principle of Solid is to have as much intelligence as possible on the client side, it could be argued that if you need to specify certain server behavior, as you have to with a Link header, then that's an addition of intelligence on the server side.

In this specific case, it doesn't matter that much since validation is important to have both on the client side (for UX) and on the server side (for security) and so there is a requirement to specify server behavior anyway, but that's the guiding principle I argue from.