solid / specification

Solid Technical Reports
https://solidproject.org/TR/

Alternative solutions to container HATEOAS #525

Open CxRes opened 1 year ago

CxRes commented 1 year ago

I was recently trying to figure out how to store arbitrary data in a Solid-Pod (see my comments on Gitter over the past month on the specification channel) such that my app can follow its way around looking at the data starting from some root container (i.e. preserve HATEOAS). I was shocked and concerned to see that with the Solid specification, as it is defined currently, this is not possible. See #69 #134.

  1. Not having containers with support for arbitrary hypermedia is a DEAL BREAKER. Without a fix, I will even be forced to reconsider using Solid.
  2. I HATE the idea of managing the state of a resource using different mechanisms for different parts of the state (#198). The resource is not in the state one PUTs on the server, but in some compound state that combines the results of PUT and POST (slug) on the resource. I am forced to rely completely on the server to ensure that I have full knowledge of the resource state. This is just unnecessary complexity for both the server and the client. EDIT (22-05-23): It has come to my attention that this problem has been discussed in other places as well, e.g. #solid/data-interoperability-panel/issues/225

However, I want to see this issue resolved and off the top of my head I can come up with 6 different solutions (presented in the descending order of controversy it might likely evoke):

Solution 1

Automatically also create and destroy a resource with the URI sans the trailing slash, when a container is created and destroyed, respectively.

Pros:

Cons:

Passive Solution 1

Allow clients to manually create a resource with a URI sans the trailing slash alongside a container. The server will add a link relation to the container (even if the container is created later).

Pros:

Cons:

Solution 2

Reserve routes ending with keywords like index.?* to be interpreted specially. index.?* can be used to store state and if one exists, link relations for the container can point to it.

Pros:

Cons:

Aggressive Solution 2

Depending on the type of resource at index.?* automatically redirect to it. Link relations point to where you can get container's children sent with index.?*

Pros:

Cons:

Solution 3

Allow any arbitrary resource to be PUT on the container itself. Add a link relation (or another mechanism) to point a route, say, example.org/parent/$ or example.org/parent/.container, that serves container's children.

Pros:

Cons:

Solution 4

Define with the Prefer header [RFC7240] a new parameter, say, container-content, which has the following values:

q values can be used to indicate preference, like Accept header.

When no container-content on Prefer header is specified, the resource can revert to default behaviour. The description resource can advertise the container-content Prefer header, with default and possible values on the resource. This default behaviour can be set by, say, writing to the description resource.
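A minimal sketch of how a server might resolve such a preference. The `container-content` parameter is only a proposal in this thread, so everything here is an assumption made for illustration: the value names (`client`, `children`, `all`), the quoted comma-separated syntax with `q` weights, and the function name.

```python
import re


def pick_container_content(prefer, supported, default):
    """Return the supported value with the highest q weight, else the default."""
    # Hypothetical syntax: container-content="children;q=0.8, client;q=0.3"
    match = re.search(r'container-content="([^"]*)"', prefer)
    if match is None:
        return default  # no preference sent: fall back to default behaviour
    ranked = []
    for item in match.group(1).split(","):
        name, _, weight = item.strip().partition(";q=")
        if name in supported:
            # A value without an explicit q weight counts as 1.0, as with Accept.
            ranked.append((float(weight) if weight else 1.0, name))
    return max(ranked)[1] if ranked else default


header = 'container-content="children;q=0.8, client;q=0.3"'
choice = pick_container_content(header, {"client", "children"}, "all")
```

Falling back to `default` when nothing matches mirrors the "revert to default behaviour" rule described above.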

Pros:

Cons:

Solution 5

Return a multipart response. Send a response with the header Content-Type: multipart/related [RFC2387], [RFC1341], carrying the container state and the child state in two parts. Use the type parameter to define the content type for the children RDF. Preserve existing ways to write data.
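A rough sketch of what such a two-part body could look like, built with Python's standard `email` package. The Turtle snippets, the part order, and the function name are assumptions. Note that RFC 2387 defines `type` as the media type of the root part; both parts are Turtle here, so the distinction does not matter in this sketch.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_container_response(container_turtle, children_turtle):
    """Bundle client-written state and containment state as multipart/related."""
    message = MIMEMultipart("related", type="text/turtle")
    message.attach(MIMEText(container_turtle, "turtle"))  # part 1: container state
    message.attach(MIMEText(children_turtle, "turtle"))   # part 2: child state
    return message


response = build_container_response(
    '<> <http://purl.org/dc/terms/title> "Notes" .',
    "<> <http://www.w3.org/ns/ldp#contains> <note-1> .",
)
```

Calling `response.as_string()` yields the headers and both parts separated by a generated boundary.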

Pros:

Cons:

Solution 6

Replace creation using POST slug with PATCH RDF. If the RDF sent to a resource (PUT or PATCH) contains an appropriate triple, the server also creates a container.

EDIT (22-05-23): This solution is cleanest from an architectural perspective, because there is one container representation, which is what the client acts on to manage everything (writing arbitrary RDF and managing containers).

Pros:

Cons:

Passive Solution 6

Keep the container and the resource segregated, but use RDF rather than slug to create containers. This is generally more consistent anyway, since child URIs are not related to parent URIs in HTTP conventions; that is a rule Solid adds. It also gives us the possibility of adding RDF related to the container alongside it, and not outside it, in the resource state.

Pros:

Cons:


These suggestions are meant to restart the discussion, which has hit an impasse. I hope people use them as inspiration for ideation and not just pick a side to fight over till exhaustion. Or try to explain to me why I am being stupid!

With folded hands, I beg the Solid community to please deliberate on this with urgency. A platform that aims to build a better web cannot afford to be missing one of its core features. 4 years is just way too long for something as basic as this to remain unresolved.

PS: If you point modifications out, I will edit this post so that everyone can see all the options upfront, rather than having to scroll through an endless stream of discussions.

elf-pavlik commented 1 year ago

We discussed it briefly during the interop panel meeting today. I would also like to propose following it up during the CG call on Wednesday.

I would prefer that we focus on the description of the container, with emphasis on client-managed and server-managed descriptions. Repeating one prior issue and adding other references:

Once we arrive at a rough consensus, if it goes in the direction of having distinct resources which need to be discoverable, we can dive into the details of using link relations or some URI templates/conventions.

When it comes to auxiliary resources, I think we can also handle, as a separate topic, servers using IRIs for them which, based on current / semantics, could be considered contained but are not listed with ldp:contains since they are auxiliary. The same goes for any recommendation that servers allocate IRIs for auxiliary resources in a way that will not collide with a client trying to create resources via PUT. While I think those topics should be addressed, it might be beneficial to divide the problem into smaller parts.

Keeping all above in mind, I assume that there is a rough consensus on two requirements:

I think there are a couple of possible approaches; they mainly differ in combining vs. separating client-asserted and server-asserted statements. The main difference seems to be how this impacts read vs. write complexity. Combining seems to simplify reading but adds a lot of complexity to writing. Separating seems to add slight complexity to reading while allowing much simpler writing.

I just wanted to add that there are some in-between approaches, for example starting with separated client-asserted and server-asserted statements, while providing a quad-based representation (GET only) that provides both in separate graphs in a single response.
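The in-between approach can be sketched as a read-only N-Quads view in which each group of statements lands in its own named graph. The graph IRIs and the function name below are hypothetical, and terms are assumed to be already serialized (`<iri>` or `"literal"`).

```python
def to_nquads(client_triples, server_triples, client_graph, server_graph):
    """Serialize both groups of statements into one N-Quads document."""
    lines = []
    for graph, triples in ((client_graph, client_triples), (server_graph, server_triples)):
        for subject, predicate, obj in triples:
            # The fourth term names the graph, keeping the two groups apart.
            lines.append(f"{subject} {predicate} {obj} <{graph}> .")
    return "\n".join(lines)


quads = to_nquads(
    [("<https://pod.example/c/>", "<http://purl.org/dc/terms/title>", '"Notes"')],
    [("<https://pod.example/c/>", "<http://www.w3.org/ns/ldp#contains>", "<https://pod.example/c/n1>")],
    "https://pod.example/c/#client",
    "https://pod.example/c/#server",
)
```

A client that only cares about its own statements can filter on the graph term, while writes still go to the separate resources.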

@CxRes do you think that the direction I propose above could be a constructive way to approach the problem in a way that allows us to break it up into a set of smaller problems?

CxRes commented 1 year ago

Having mulled over our chat some more, let me step back further to sharpen my thoughts:

The following URIs according to the web standards are independent and unrelated resources:

  1. <example.org/foo>
  2. <example.org/foo/>
  3. <example.org/foo/bar>

There is an exception for example.org == example.org/, but I am ignoring it temporarily.

It is the Solid specification that maps a hierarchical relationship (or at least a notion of container) on them (I acknowledge that similar and related conventions exist in the wild before Solid). The complexity of state management for any resource that is also ldp:Container emerges from that additional constraint of containership imposed by Solid. We, for example, arbitrarily assign slash semantics to manage this additional constraint. This constraint also affects the auxiliary resources (both security and description resources).
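To make the added constraint concrete, here is a small sketch of the slash semantics as a client might apply them: HTTP itself treats the URIs as unrelated strings, while the helpers below layer Solid's convention on top. The function names are assumptions, not part of any Solid library.

```python
from urllib.parse import urlsplit, urlunsplit


def is_container(uri):
    """Under Solid's slash semantics, a trailing slash marks a container."""
    return uri.endswith("/")


def parent_container(uri):
    """Derive the parent container URI purely from the path hierarchy."""
    parts = urlsplit(uri)
    path = parts.path
    if path in ("", "/"):
        return None  # the root container has no parent
    trimmed = path[:-1] if path.endswith("/") else path
    parent_path = trimmed[: trimmed.rfind("/") + 1]
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
```

For example, `parent_container("https://example.org/foo/bar")` and `parent_container("https://example.org/foo/")` give `https://example.org/foo/` and `https://example.org/` respectively, even though nothing in HTTP relates those resources.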

While I would generally favour focusing on the smallest possible problem whose solution resolves an issue, I fear (as I expressed in the meeting) that focusing on just server/client managed representations may lead us to ignore the bigger picture and/or create sub-optimal solutions. We need to take into account all the consequences of hierarchical relations and all the mechanisms needed to deal with them. So to your question @elf-pavlik, to my mind the question would be: how would you even implement containers within the bounds of REST and RDF, down to every last detail? IMHO the Solid spec does not do that yet. I know that is a can of worms, but if I am to take a 20-30 year view, that is the question to address.

(@elf-pavlik I know I am being vague with a Yes/No question; I feel that taking a "systems" view is a better answer. To the extent that your proposal helps fill in the details of the working of containers on the pod storage system, it is a yes and to the extent it does not, it is a no).

elf-pavlik commented 1 year ago

Linked Data Platform Containers specify that the server is responsible for managing ldp:contains statements.

5.2.3.2 When a successful HTTP POST request to a LDPC results in the creation of a LDPR, a containment triple MUST be added to the state of the LDPC whose subject is the LDPC URI, whose predicate is ldp:contains and whose object is the URI for the newly created document (LDPR). Other triples may be added as well. The newly created LDPR appears as a contained resource of the LDPC until the newly created document is deleted or removed by other methods.

If at some point Solid supports other storage types #377 which don't impose / semantics the containment statements can still work the same way. LDP doesn't have any / semantics.
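The server-managed bookkeeping described in LDP 5.2.3.2 can be sketched with an in-memory container. The class and method names are illustrative, and minting the child URI as container URI plus slug reflects Solid's current convention rather than anything LDP requires.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


class Container:
    """Toy LDPC whose containment triples are managed by the server only."""

    def __init__(self, uri):
        self.uri = uri
        self.triples = set()

    def post(self, slug):
        """Create a child and add the matching containment triple."""
        child = self.uri + slug  # assumption: URI minted from the slug
        self.triples.add((self.uri, LDP_CONTAINS, child))
        return child

    def delete(self, child):
        """Remove the child together with its containment triple."""
        self.triples.discard((self.uri, LDP_CONTAINS, child))


chat = Container("https://pod.example/chat/")
first = chat.post("msg-1")
```

The containment triple appears and disappears with the child; the client never writes it directly.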

Not having containers with support for arbitrary hypermedia is a DEAL BREAKER. Without a fix, I will even be forced to reconsider using Solid.

I understand that this is the main problem that you want to address, for me it sounds like client-asserted statements describing the container. Issue #96 by itself has 94 comments, #198 has 45 and #227 has 134. I'm honestly worried that if we don't narrow down the problem which we want to address in this issue we may just add another perma-thread issue to the collection.

I could try to write down issues I see with how currently CSS implements it, based on my experience of implementing a client which manages client-asserted statement describing container and works with CSS. This would give us a clear starting point grounded in an existing implementation. From there we could dive into which parts are based on requirements from the spec and which are an implementation choice. From there possible improvements to the specification could be evaluated with consideration of the impact on CSS and other existing implementations.

CxRes commented 1 year ago

(@elf-pavlik You can minimize this if you like as this is not central to the issue but about process)

I'm honestly worried that if we don't narrow down the problem which we want to address in this issue we may just add another perma-thread issue to the collection.

  1. The problem is pretty specific IMHO, every resource needs to specify its hypermedia (using Fielding's definition) completely. Clients with write access should be able to specify all the hypermedia for a resource. The way clients create hypermedia should be consistent and should not ideally depend on resource type. The hypermedia that one client uploads (or creates another way) should be what other clients download. Data is the API. There should be no magic, i.e. no amalgamated states.
  2. That this requirement is fundamental to the web cannot be overstated. That this came up as an issue upon implementation is itself a problem. This is why everyone needs to adopt a systems engineering approach, where you document the how and the why alongside the what.
  3. GitHub Issues are good for small bug fixes; they are not the place to resolve fundamental issues. Neither are one-hour meetings sufficient. There has to be a more structured discussion (and changes to the format of how specs are being made) followed by time to reflect. I have already spoken about it elsewhere, but I don't know whom to approach to be taken seriously.
  4. I want to take the opposite approach to you. Take it as my bias as a scientist that prioritizes the theory/design over what I claim is your bias as an engineer (I was one too in a past life) to experiment/implementation. I am worried about missing out on solutions that cannot be arrived at with incremental changes to the CSS implementation, there has to be some breaking change.
  5. Having said that, let me not discourage you from your approach. If there is an incremental solution, then your way is faster.

CxRes commented 1 year ago

I had read the LDP spec many months ago, and I re-read it today. While container triples are server managed, it puts no restriction on clients creating resources, including containers, with custom RDF. See 5.3.3.4 Bullet 2 and 6.3.2. Similarly, there are no restrictions on writing triples to existing containers afaict (other than overwriting container triples).

bblfish commented 1 year ago

@CxRes wrote:

Not having containers with support for arbitrary hypermedia is a DEAL BREAKER. Without a fix, I will even be forced to reconsider using Solid.

All the containers in Reactive-Solid can accept any MIME type (https://github.com/co-operating-systems/Reactive-SoLiD). Solid should be an extension of the web not a restriction to it.

CxRes commented 1 year ago

Solid should be an extension of the web not a restriction to it.

@bblfish Exactly! And that has to be well-defined in Solid to the quality expected of a specification document.

I was not aware of your implementation. The issue at hand is not the ability of the server to accept a resource, but what should be sent back to the client when it requests the container. What, for example, is the response in Reactive-SoLiD to a GET on a container resource that has both client state (i.e. PUT/PATCH on it) and contained resources (i.e. POST with Slug header)? Neither the Solid spec nor the LDP spec specifies this! Hence, the container part of #198.

Also, say, can you PUT a non-RDF resource on the container? Should this be restricted? How is the GET handled then?

These cases are not precisely dealt with in the Solid specification, as evidenced by all the above-mentioned open issues and endless stream of comments. As I have been saying and you also rightly observe, this is a fundamental web-architecture issue (one that should have been addressed when Solid decided to use LDPC).

In your capacity as an implementer @bblfish we could really use your help in resolving this!

TallTed commented 1 year ago

@CxRes — I believe Solid (not LDP, and certainly not HTTP) specifies that if there exists a container /x/, attempts to create a document /x MUST fail, and likewise if there exists a document /x, attempts to create a container /x/ MUST fail. In other words, /x/ and /x cannot coexist on the same Solid server. These restrictions are specific to Solid servers, and do not impact Apache, Nginx, nor other HTTP or LDP servers.

The above restrictions also do not have anything to do with the content of documents stored on a Solid server, which should accept resources of any media type for storage, whether those resources contain pure RDF data, RDF with other data, or entirely non-RDF data.
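The coexistence rule can be expressed as a single check before creation. This is a sketch over an in-memory set of paths; the function name is an assumption, and a real server would run the check against its storage backend.

```python
def conflicts_with_sibling(existing, new_path):
    """True if the slash/no-slash sibling of new_path already exists."""
    if new_path.endswith("/"):
        sibling = new_path[:-1]   # container /x/ clashes with document /x
    else:
        sibling = new_path + "/"  # document /x clashes with container /x/
    return sibling in existing


stored = {"/x/", "/notes"}
```

With this store, creating `/x` or `/notes/` would be rejected, while `/y` or `/y/` would be accepted. The check looks only at the path, never at the media type of the body.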

CxRes commented 1 year ago

@TallTed I am aware of the slash semantics adopted by Solid and have been speaking about it with every stakeholder when I get a chance. I can safely say that pretty much everyone I have spoken to in private dislikes it (Please share your opinion with me, especially if you think otherwise). Why the eventual choice was made is also not clearly documented afaik. That some of my proposed solutions here choose to violate it, is just a design choice on my part, in search of the best possible solution without prejudice to the existing spec (since the spec is not 1.0 yet).

Solid, however, does (effectively, not explicitly) restrict what one might PUT in the container by not clearly specifying what should be returned in a GET thereafter. To the extent it does specify this, or it is implemented by various parties, the handling is convoluted and unintuitive. All of this is well documented in the cited issues.

elf-pavlik commented 1 year ago

I think we should take @CxRes's suggestion, which he made a couple of times during Solid Notifications Panel meetings, and have a working meeting fully dedicated to this issue. IMO we should find and reserve a 2-hour timeslot for the first meeting and see how much progress we can make this way.

During one of the recent calls with low participation, @CxRes and I took the opportunity to have an initial discussion. This is my understanding of the approach which we find preferable:

I think one of the use cases we will need to discuss is the intended use of HTML, which seems to be a special case. I believe that supporting Non-RDFSources as a container can help with that. By HTML I mean any of the below:

Hopefully, with a dedicated meeting, we can efficiently start triaging any previously discussed concerns and move forward without creating another perma-thread, which honestly is my biggest concern here.

pchampin commented 1 year ago

@CxRes I have not been closely following the discussions on Gitter or on the other issues you link to. For the sake of making this discussion a little more self-sufficient, could you explain your problem through one (or a few) step-by-step descriptions of what you would like to do, and where the Solid spec (or current implementations) fails to deliver? This would be really helpful.

Also, you mention here Section 5.3.3.4 of LDP, I assume you mean 5.2.3.4.

elf-pavlik commented 1 year ago

I recall Linked Data Platform 1.0 - 5. Linked Data Platform Paging Clients being mentioned during today's call. It could possibly be adopted as is, or at least serve as a stable reference to compare other possible approaches with.

Prefer: return=representation; include="http://www.w3.org/ns/ldp#PreferMinimalContainer"

https://www.w3.org/TR/ldp/#dfn-minimal-container-triples

The portion of an LDPC's triples that would be present when the container is empty. Currently, this definition is equivalent to all the LDPC's triples minus its containment triples, and minus its membership triples (if either are considered part of its state), but if future versions of LDP define additional classes of triples then this definition would expand to subtract out those classes as well.

This seems pretty close to the client-managed statements, currently everything except ldp:contains and Contained Resource Metadata. I think supporting that with the current state of the Solid Protocol draft would at least allow clients to PUT the container description without the need to filter out the server-managed statements.
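A sketch of the filtering a server would do for `PreferMinimalContainer` (or that a client must do today before a PUT): drop the containment triples and any statements about the contained resources. Treating every triple whose subject is a contained resource as Contained Resource Metadata is a simplifying assumption, as are the names used.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def minimal_container_triples(container, triples):
    """Keep only statements that would remain if the container were empty."""
    contained = {o for s, p, o in triples if s == container and p == LDP_CONTAINS}
    return {
        (s, p, o)
        for s, p, o in triples
        if p != LDP_CONTAINS and s not in contained
    }


box = "https://pod.example/box/"
state = {
    (box, "http://purl.org/dc/terms/title", "Inbox"),
    (box, LDP_CONTAINS, box + "a"),
    (box + "a", "http://purl.org/dc/terms/modified", "2023-07-01"),
}
minimal = minimal_container_triples(box, state)
```

Only the title statement survives, which is the part a client could safely send back in a PUT.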

It gets a little more complicated when we get to Linked Data Platform Paging 1.0 (only a WG NOTE).

Linked Data Platform Paging 1.0 - 5.2 Client preferences defines:

| hint | description |
| --- | --- |
| max-triple-count | The maximum decimal number of triples the client wishes to appear on each page. |
| max-kbyte-count | The maximum decimal number of kilobytes (1024 byte units) the client wishes to receive as the page's representation. |
| max-member-count | The maximum decimal number of members the client wishes to appear on each page. This parameter is only meaningful for paged LDPCs. |

If someone only wants to page the ldp:contains statements, they could probably combine hints:

Prefer: return=representation; include="http://www.w3.org/ns/ldp#PreferContainment"; max-member-count="100"
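A small client-side helper can assemble such a header value. The function name and the keyword-argument convention (underscores mapped to hyphens) are assumptions; the output simply reproduces the header form shown above.

```python
def build_prefer(include, **hints):
    """Combine an LDP include preference with optional paging hints."""
    parts = ["return=representation", f'include="{include}"']
    for name, value in hints.items():
        # max_member_count=100 becomes max-member-count="100"
        parts.append(f'{name.replace("_", "-")}="{value}"')
    return "; ".join(parts)


prefer = build_prefer(
    "http://www.w3.org/ns/ldp#PreferContainment",
    max_member_count=100,
)
```

The result would be sent as the value of the `Prefer` request header on a GET to the container.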

While the above seem to be available approaches for requesting specific parts of the container description, I think there are a few other aspects that we should take into account, especially authorization. For example, we have discussed the requirement of acl:Write on the container when only adding or deleting container resources should be permitted:

Separating client-managed and server-managed statements into distinct resources would allow for preserving resource-level access control. Clients can never change server-managed resources, and clients can fully change client-managed resources (provided the change passes constraints like shape validation). It also makes it simple to allow creation and deletion of contained resources without allowing updates to the client-managed description itself.

elf-pavlik commented 1 year ago

Capturing feedback from CSS team shared on: https://app.gitter.im/#/room/#CommunitySolidServer_community:gitter.im

One of the core issues was how to interpret a container description resource vs the representation of a container. We decided to just return all of the description resource contents as part of the container representation as it seemed to make sense that when someone does a GET on a container, they want to have its descriptive information

we then decided to also only allow changing that data through the description resource, so the container itself could be seen more as just "a container of resources" and not something that has data itself, besides what is in its description resource

the reason we only allow PATCH and not PUT is because the description resource contains server-managed triples, and a PUT seems weird then as in general PUT means "put an exact copy of this data at that location", which doesn't work if the server then adds triples to it

but all of this is just what seemed to make sense to us at that time, everything can change if it turns out other solutions are preferred. E.g., allowing PUT on descriptions resources, or splitting the "data" of a container and its description resource, etc.

For the record, there has been some prior conversation in:

elf-pavlik commented 1 year ago

Another issue can surface in certain scenarios with the current approach of mixed client and server statements:

  1. Client always uses If-Match with ETag when updating the container (client-managed statements)
  2. Container represents an active chat room, where contained resources are added every second, which leads to ETag changing frequently.
  3. Client ends up with a very small time window to update resources before the ETag used in If-Match changes. It will most likely force it to implement automated conflict resolution, which checks if any statements it tries to change have changed before auto-retrying with newer ETag

Having client-managed statements separate eliminates issues caused by changing ETag with frequent updates to the server-managed statements (especially containment triples).

EDIT: The ETag will also change if any contained resource changes, since the container description includes server-managed dc:modified statements.
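The effect can be simulated by treating an ETag as a hash over a canonical serialization of the triples. That is an assumption for illustration; servers are free to compute ETags differently, and all URIs below are made up.

```python
import hashlib

LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def etag(triples):
    """Hash a canonical (sorted) serialization of the triples."""
    canonical = "\n".join(sorted(" ".join(triple) for triple in triples))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


room = "https://pod.example/chat/"
client_managed = {(room, "http://purl.org/dc/terms/title", "Team chat")}
server_managed = {(room, LDP_CONTAINS, room + "msg-1")}

mixed_before = etag(client_managed | server_managed)
separate_before = etag(client_managed)

# A new chat message arrives: only server-managed state changes.
server_managed.add((room, LDP_CONTAINS, room + "msg-2"))

mixed_after = etag(client_managed | server_managed)
separate_after = etag(client_managed)
```

Here `mixed_before != mixed_after`, so an `If-Match` built from the mixed representation would fail, while `separate_before == separate_after` and a conditional update of the client-managed resource would still succeed.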

elf-pavlik commented 1 year ago

We had a good conversation during today's call.

Next week I would like to discuss the access control aspects for both keeping the server and client managed together or separately. I'm going to post some notes here beforehand to serve as a reference.

We also discussed the index.ttl as implemented by NSS. It would be great if someone familiar with it could compare it to the `rel="describedby"` link used by CSS. I would expect that it works very similarly, with the main difference being URL pattern vs. Link Relation for getting to the client-managed resource. Both seem consistent in having a dedicated resource for client-managed statements.
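Discovery through a link relation can be sketched as a lookup in the `Link` response header. The header value below is illustrative rather than captured from a real server, and the parser is deliberately simplified (it does not handle commas inside quoted parameter values).

```python
import re


def find_link(link_header, rel):
    """Return the first link target carrying the given relation type, if any."""
    for target, params in re.findall(r"<([^>]*)>([^,]*)", link_header):
        match = re.search(r'rel="?([^";]*)"?', params)
        # A rel value may hold several space-separated relation types.
        if match and rel in match.group(1).split():
            return target
    return None


header = (
    '<http://www.w3.org/ns/ldp#Container>; rel="type", '
    '<https://pod.example/chat/.meta>; rel="describedby"'
)
description_url = find_link(header, "describedby")
```

With a URL pattern such as index.ttl, the client would instead append a fixed name to the container URI, so the two approaches differ mainly in who decides the location.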

woutermont commented 1 year ago

As agreed on in this week's CG meeting, we should have a dedicated call to discuss this. Until then, I'll just add my thoughts here for reference.

Reading the related discussions, I believe the problem with containers follows from a number of strong intuitions. I suggest that shifting (at least) one of those intuitions makes the problem go away.

As an aside: in all cases, a PUT request with an empty body could be interpreted as deleting the client-managed part, and reinstating default serving of server-managed triples.


Re ETags, I don't see much of a problem. They are representation-bound, and so should be generated separately for the client-managed representation(s) and the server-managed ones. Frequent updates to the server-managed part of the compound state thus have no impact on the ETag of the client-managed representation.


Re the mutual exclusion of url and url/, I do not have a strong opinion. Purely theoretically they are unrelated resources, but the exclusion prevents easy mistakes; besides, it is not clear to me what their relation would/should be when the exclusion is lifted. In any case, this is orthogonal to the interaction issue at hand here.

elf-pavlik commented 1 year ago

I see @csarven proposed Aug 8th. To maximize participation I created a date & time selection poll for the 2nd week of August: https://doodle.com/meeting/participate/id/dyX50YVb

woutermont commented 1 year ago

There seem to be very few participants in the doodle. If we want to have a 2h slot this week, tomorrow Tue 8th 1pm-3pm UTC seems to work best. @CxRes, can you make it? @bblfish @pchampin @csarven

If not, please fill in the doodle to have a better view on meeting slots for Thu/Fri, or whether we should postpone.

CxRes commented 1 year ago

I'll be there (barring any emergency)!

TallTed commented 1 year ago

@woutermont -- The doodle shows "meeting time has been chosen", so neither new responses nor adjustments of previous responses are possible.

All -- Here's a link to the World Clock for the chosen time.

(Maybe this could be added to the CG calendar?)

elf-pavlik commented 1 year ago

We can scribble various details on https://hackmd.io/@solid/rkhahTWd3 Afterward, final outcomes of the meeting should be captured directly in this issue.

damooo commented 1 year ago

Manas currently does the following:

This allows container HATEOAS, with a single obvious resource as the target.

I myself prefer having the container-index as an auxiliary resource of the container, and the container being any rdf-source/non-rdf-source resource fully controllable by the user.