Clarify the notion and mechanisms for server-managed information

csarven commented 4 years ago

What are the conceptual and implementation differences between server-managed containment triples and the triples in server-managed resources (auxiliary)?

It appears to be that we have two different mechanisms in the spec that are conceptually similar, if not the same. How can we reconcile this?

Containers are self-descriptive in that the containment information is part of the resource description and managed by the server. Contained resources are tied to the lifecycle of the container. Containers may include other information provided by clients. Containment triples can't be updated by clients.

Server-managed resources (auxiliary) are independent resources discovered through primary resources and are tied to their lifecycle. Server-managed resources are not writeable by clients.

We can see that, although cumbersome, containment information can be found in a server-managed resource (auxiliary). We can also see that any information that can be part of a server-managed resource (auxiliary) can instead be in a self-descriptive resource.

justinwb commented 4 years ago

We can see that although cumbersome, containment information can be found in a server-managed resource (auxiliary).

It's true that you could store containment triples in a server managed auxiliary resource associated with a container, but I'm not sure that we have to.

We can also see that any information that can be part of a server-managed resource (auxiliary) can instead be in a self-descriptive resource.

It is true that you could store the same information in a regular Solid resource as a server managed auxiliary resource, but the provenance would not be the same. This is the key value in the server managed auxiliary resource type. No agent would have the ability to directly write or modify the data in a server managed resource. A consumer of the data in a server managed resource can have confidence that the data inside it (i.e. timestamps, creator, etc) was written by the server and no-one else.

csarven commented 4 years ago

The point of this issue is to note and justify why we have one mechanism where a server controls certain kinds of information and yet another mechanism for other information. It seems to be a bit of an arbitrary split.

but the provenance would not be the same.

Explain.

No agent would have the ability to directly write or modify the data in a server managed resource.

Holds true for containment triples in the container resource that is managed by the server:

Containment triples can't be updated by clients.

Clients affect resources directly and indirectly eg:

Client requests to append/remove a resource to/from a container. Only the server manages the containment relationships.

Client requests to create a resource. Only the server manages the creator relationship.

It is not apparent why a container (or any other resource for that matter) can't be self-describing, especially when it comes to information like "timestamps, creator etc." It is already doing that for containments as well as other information. Aside: Some of the Solid servers exposed posix information in the container, and clients have nothing to do with it.
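To illustrate the point about a self-describing container, here is a minimal sketch, not taken from any Solid server. The `Container` class, its method names, and the example URIs are hypothetical; triples are modelled as plain tuples. Clients may supply other description, but the containment triples are derived only from state the server maintains.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


class Container:
    """Hypothetical in-memory container; triples are (subject, predicate, object)."""

    def __init__(self, uri):
        self.uri = uri
        self.client_triples = set()  # description supplied by clients
        self.children = set()        # maintained by the server only

    def client_update(self, triples):
        """Replace the client-supplied description, refusing containment triples."""
        triples = set(triples)
        if any(predicate == LDP_CONTAINS for (_, predicate, _) in triples):
            raise PermissionError("containment triples are server-managed")
        self.client_triples = triples

    def create_child(self, slug):
        """A client asks to create a resource; the server records containment."""
        child = self.uri + slug
        self.children.add(child)
        return child

    def delete_child(self, child):
        """A client asks to delete a resource; the server drops the containment."""
        self.children.discard(child)

    def representation(self):
        """Client triples plus the server-derived containment triples."""
        contains = {(self.uri, LDP_CONTAINS, child) for child in self.children}
        return self.client_triples | contains
```

In this sketch a client affects containment only indirectly, by creating or deleting a child, which matches the "directly and indirectly" distinction above.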

If the server-managed auxiliary resource had its own lifecycle ie. potentially outliving the resource it is about, that'd be a clear enough reason to keep the server-managed resource decoupled from the primary resource. It would allow the server-managed resource to be more useful and preserved independently. But that's not what's intended or within the scope of this auxiliary resource. If there is no value to preserving the server-managed information beyond the primary resource's lifecycle, why isn't the information in the server-managed resource part of the primary resource - further supporting self-describing documents? I don't think the access modes are significant because the underlying information is deemed to be server protected regardless of where it resides.

It is more intuitive to find something about the resource at the resource than at another resource. Not a whole lot different than finding the Last-Modified HTTP header (which is server-controlled) where one would expect.

Just a thought: server-managed auxiliary resource seems a notch too broad / catch all. Would it help to bring it down a layer and frame it around provenance level information? We don't lose out on timestamps, creators etc. or it being server-managed (ie. read-only for clients). It could be expressed along the lines of activities, entities, agents (eg. PROV-O).

If the information that's intended for server-managed resource is dependent on the primary resource's lifecycle, it'd be relatively simpler for a server to manage that through a single resource instead of two. It also has minimal Web-footprint. Decoupling or perhaps loosely coupling the two resources may be preferable if different lifecycles have value.

jaxoncreed commented 4 years ago

It is more intuitive to find something about the resource at the resource than at another resource.

I disagree with this. It makes more sense to be able to filter out metadata. A "resource" is a thing and only that thing. It shouldn't need to include extra information. I'm in favour of using link headers to discover metadata.

elf-pavlik commented 4 years ago

During the call we also discussed Non-RDF Sources as well as resources with digital signatures. In both cases the server can't really add triples to those resources.

csarven commented 4 years ago

@jaxoncreed On the contrary, we are not in disagreement. I didn't claim that resources should include metadata. Containment information and server managed auxiliary information are part of resource description. I consider them as data. There can be separate resources describing provenance information, history/audit, access logs etc.

We could reframe what's intended for "server managed" to specific kinds of resources. For example, the proposal for clients to request the creation of a Memento URI-R and have the server create the URI-M and create/update the URI-T: https://github.com/solid/specification/issues/61#issuecomment-623386266 . Essentially, include the header:

PUT https://csarven.ca/linked-research-decentralised-web
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"

201 Created
Location: https://csarven.ca/linked-research-decentralised-web

A client can for example discover a resource's URI-M and URI-T from the link relations:

GET https://csarven.ca/linked-research-decentralised-web
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"
Link: <https://csarven.ca/linked-research-decentralised-web.timemap>; rel="timemap"
Link: <https://csarven.ca/archives/linked-research-decentralised-web/ce36de40-64a7-4d57-a189-f47c364daa74>; rel="memento"

Or discover provenance information through eg. rel prov:has_provenance. Use specific properties to discover audit, access logs. They can all be managed by the server.
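As a rough sketch of the client side of this, the following groups `Link` header targets by relation so that a client can follow only the relations it understands. The function name `parse_link_headers` is hypothetical, and the regular expressions cover only the simple header forms shown above, not every case a full Link header parser has to handle.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>\s*((?:;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+)\s*)*)'
)
# One parameter, with either a quoted or a bare value.
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_headers(values):
    """Map each rel value to the list of target URIs in the given Link headers."""
    by_rel = {}
    for value in values:
        for target, params in LINK_RE.findall(value):
            for name, quoted, bare in PARAM_RE.findall(params):
                if name.lower() == "rel":
                    # A rel parameter may hold several space-separated relations.
                    for rel in (quoted or bare).split():
                        by_rel.setdefault(rel, []).append(target)
    return by_rel


headers = [
    '<http://mementoweb.org/ns#OriginalResource>; rel="type"',
    '<https://csarven.ca/linked-research-decentralised-web.timemap>; rel="timemap"',
    '<https://csarven.ca/archives/linked-research-decentralised-web/'
    'ce36de40-64a7-4d57-a189-f47c364daa74>; rel="memento"',
]
links = parse_link_headers(headers)
timemap = links.get("timemap", [])
mementos = links.get("memento", [])
```

A provenance, audit or access log relation would be looked up in the same way under its own relation name, so the client knows what kind of information to expect before fetching the target.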

@elf-pavlik Makes sense. The context was RDF sources anyway. [My point was about the server's capability, not whether they can, should, or not for any arbitrary resource.]

justinwb commented 4 years ago

@csarven I'm not sure I have a firm grasp of what you're looking to see changed for server managed auxiliary resources in the current proposal at #156.

From what I can tell, you support the use of link headers to discover server managed data in auxiliary resources (which is described in #156), but want those relations to be more specific, rather than one single catch-all for "server managed" data?

csarven commented 4 years ago

@justinwb That's right. Key points:

Server managed auxiliary resource is too broad (vague) for clients. In theory it can contain any information that the server deems useful, but it needs a specific shared data model for it to be actually useful for clients.
With a single relation for "server managed" information, clients will not know whether the description in the auxiliary resource is actually useful to them until they fetch and inspect.
General rule of thumb: resources should describe themselves. If the most recurring subject of the statements in the server managed auxiliary resource uses the same identity as in the primary resource, it is better to describe that identity in the primary resource eg. "R creator x" or "R date y" should be in the primary resource. This is the same pattern as the one I've discussed about where to put resource labels - they are not "metadata"!

One way to move forward:

Use specific link relations to indicate the kind of information that's expected at the target resource eg. provenance, audit, access logs - however, the kind of information that the server wants to expose needs to be sliced. For instance, the notion of provenance information is sufficiently precise and used in the wild. There is a specific relation available that can be loosely coupled with the Provenance Ontology or others. Same goes for Memento, where the client can knowingly follow a relation to obtain Mementos, TimeMap and so forth.
The specific relations can still be managed by a server ie. only the server can write/append, as originally intended. But this ought not be the focus. It is just that there is no (strong) use case for an arbitrary client to modify the activities that a server observes - hence, read-only for authorized agents.

If the entities described in the auxiliary resource are significant and should have their own dereferenceable identity, that's a clear enough indication to have them in independent resources instead of lumping them under the representation of the primary resource.

justinwb commented 4 years ago

@csarven Thanks for clearing that up, I think we're in agreement, and this also lines up with feedback from members of the data-interoperability panel.

kjetilk commented 3 years ago

My feeling here is that the key to resolve this issue is to forward a more general understanding of the role of auxiliary resources than is now in the spec. Do people agree to that?

To do that, I think we need to define dimensions of auxiliary resources, I opened #306 for that, and then define hypermedia-based protocol extension points, as in #270. Does that sound like a plan, or do people think we should take a different angle to resolving this?

kjetilk commented 3 years ago

Since there is silence, in the interest of fast progress towards a Protocol 1.0 release, perhaps we shouldn't take on the full extension mechanism above, but just define the term "server managed" and then add the auxiliary types that are needed right now?

csarven commented 3 years ago

At this time, I'm not sure if there is much point in talking about server managed resources if we can't refer to specific types that are part of the Protocol. Above I mentioned things like Memento resources being great candidates for this, which are not even currently required. Perhaps ont:FixedResource. https://github.com/solid/specification/issues/191#issue-674342906 lists some types but nothing in particular that's strictly server managed. I don't see a strong reason to throw "placeholder" information into the spec -- if there are applications that are working with that, let's see them and document common patterns. We could come back to this issue.

There is one other possibility that's already spec'd that is a good candidate for server managed: as:CollectionPage or ldp:Page as mentioned in https://github.com/solid/specification/issues/230#issuecomment-774791386

kjetilk commented 3 years ago

I think there are plenty of patterns that have already emerged, enough to have a general idea of how to make an extensible system of auxiliary resources that have many different properties, where server-managed is one of those properties. That isn't the question; the question is whether we should allow ourselves that time before having the first version of the protocol specification, and we should possibly not do that.

I think this issue has conflated quite a few issues, we need to drill down to the essence of server managed resources. This is apart from the containment triples, which are indeed server managed, but they are a part of the compound state of the container representation.

A server managed resource is simply a resource that a client cannot be authorized to write to. It may have all kinds of different properties in addition to that, but that is the essence.

If we state as a principle that no state change should be unauthorized, one way to think about this is that the server is also an agent, and has privileges accordingly. Since it is the server, it shouldn't need to authenticate (but it could, as a security-in-depth measure). It may also reject clients' use of control privileges, if the client attempts to gain write permission to a server-managed resource.

With that, we can define an auxiliary resource for server managed metadata about a container's children, that is tied to the container's lifecycle, has its own access control but where only the server has write privilege, and rejects clients' attempts to gain write privilege by responding 403 to such requests, even in the case where the client has Control privilege. That should resolve this issue and #227 .

Then, we could also define an audit log, which is an auxiliary resource which is not tied to the resource's lifecycle, where the server has append, but not write, and rejects clients' attempts to gain write privilege.
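A minimal sketch of these two cases, assuming the server is modelled as an agent with its own privileges. The `Resource` type, the `SERVER` identifier, the mode names and both functions are hypothetical and not part of any specification; lifecycle coupling is left out.

```python
from dataclasses import dataclass

SERVER = "server"  # hypothetical identifier for the server acting as an agent


@dataclass(frozen=True)
class Resource:
    uri: str
    server_managed: bool = False
    append_only: bool = False  # e.g. an audit log


def authorize(agent, mode, resource, granted):
    """Return an HTTP status for `agent` requesting `mode` on `resource`.

    `granted` is the set of modes the access control rules give the agent.
    """
    if resource.server_managed:
        if agent == SERVER:
            # The server may append to an audit log but not rewrite it.
            if resource.append_only and mode == "write":
                return 403
            return 200 if mode in {"read", "append", "write"} else 403
        if mode != "read":
            return 403  # clients can never write to or append to it
    return 200 if mode in granted else 403


def request_grant(resource, agent_modes, requested_modes):
    """Status for a client asking to grant `requested_modes` on `resource`."""
    if "control" not in agent_modes:
        return 403
    # Control is not enough to gain write or append on a server-managed resource.
    if resource.server_managed and requested_modes & {"write", "append"}:
        return 403
    return 200
```

For example, a client holding Control on a server-managed metadata resource still gets 403 from `request_grant` when asking for write, while the same request on an ordinary resource succeeds.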

kjetilk commented 2 years ago

Any findings, @justinwb ? We can bump it from the milestone, right?

justinwb commented 2 years ago

My feeling here is that the key to resolve this issue is to forward a more general understanding of the role of auxiliary resources than is now in the spec. Do people agree to that?

Partially, though I think it's more specifically in regards to how server-managed information is handled in regular and/or auxiliary resources.

I think this issue has conflated quite a few issues

Agree!

we need to drill down to the essence of server-managed resources. This is apart from the containment triples, which are indeed server-managed, but they are a part of the compound state of the container representation.

I think we have to start by zeroing in more on server-managed data, because (like containment triples) it may not necessarily live only in an auxiliary resource. I'm not convinced that we can be constructive when talking about server-managed data in general. I think we need to get more specific and deal with each type of server-managed data in context, because we may treat different kinds of server-managed data differently.

Any findings, @justinwb ? We can bump it from the milestone, right?

Yes - I think that this can be bumped from the milestone. I'm not sure that this is directly actionable, so I don't know that it should be moved to a later milestone. I think that it touches on actions that we need to undertake specifically related to server-managed metadata, but we may be better-served creating specific tickets for each class of metadata that needs specification.

solid / specification

Clarify the notion and mechanisms for server-managed information #177