Open kjetilk opened 3 years ago
There are a few tests for this amongst the converted tests, and I have reviewed the results of these tests for NSS and CSS.
Generally, CSS seems to agree with these tests, but uses 400
rather than 409
to indicate a failure.
CSS fails in a few corner cases that are certainly arguable, notably it accepts the case where:
Content-Type: text/turtle; charset=UTF-8
Link: <http://www.w3.org/ns/ldp#NonRDFSource>; rel="type"
but that seems pretty meaningless to accept Turtle as a NonRDFSource.
NSS fails a lot more tests, in general it seems to default to disregard the LDP Interaction Models. It will for example allow the creation of a non-container resource even if the Link
header declares it to be a basic container. In other cases, it will create a container if it ends with a slash but is declared to be a NonRDFResource, and give it a text/plain
media type. It will also allow a client to change interaction models.
Even though LDP isn't a central part of Solid, there is some use of such data and such inconsistency can create significant confusion in user expectation that can lead to attacks.
In the LDP sense, any Turtle document that includes comments or human-important whitespace is a NonRDFSource
, because, even though it does include RDF as at least part of its contents, its state is not 100% represented by the RDF it contains ... and there's no way to know by examining the Turtle document's metadata whether it contains comments or human-important whitespace ... so I believe CSS is doing the right thing in your cited corner case.
OK, that's an interesting rabbit hole, because there are hardly any RDF serializations where you can distinguish between different interaction models without doing an examination of the representation. Perhaps NSS behavior of ignoring them entirely are actually quite appropriate?
I think two more things should be considered:
Here I've listed what I believe to be combinations of parameters that result in failed requests.
Empty cells denote parameters irrelevant for said case.
For example read row 1 as "If there is an existing container at the URI of a PUT request with an ldp:NonRDFSource
link header then fail" (because cannot change the type of an existing resource).
# | Existing resource type | Method | URI | Link | MIME | Body |
---|---|---|---|---|---|---|
1 | Container | PUT | NonRDF | |||
2 | Container | PUT | NonRDF | |||
3 | Container | PUT | NonRDF | |||
4 | NonRDF | PUT | Container | |||
5 | NonRDF | PUT | RDF | |||
6 | NonRDF | PUT | RDF | |||
7 | RDF | PUT | Container | |||
8 | RDF | PUT | NonRDF | |||
9 | RDF | PUT | NonRDF | |||
10 | RDF | PUT | NonRDF | |||
11 | POST | Resource | ||||
12 | PUT | Resource | Container | |||
13 | PUT | Container | NonRDF | |||
14 | PUT | Container | NonRDF | |||
15 | PUT | Container | NonRDF | |||
16 | Container | NonRDF | ||||
17 | Container | NonRDF | ||||
18 | NonRDF | RDF | ||||
19 | RDF | NonRDF | ||||
20 | RDF | NonRDF | ||||
21 | RDF | NonRDF |
There is certainly some tension here, because it could be argued that "any representation goes", that is, you could have an image with nodes and arrows and say that it represents the same data as some RDF. With such an interpretation, interaction models wouldn't even make sense. And yet, there are cases where such metadata is useful.
Let me reiterate that my agenda here is to ensure that users aren't confused by apps that are making assumptions based on a too liberal interpretation of metadata like ldp:RDFSource
. Being quite restrictive now will pay off in the long run, I think.
@langsamu thanks a lot! I agree with all of these, possibly save for one. A few of them are in my tests too. I purposely left the body out, because that opens another can of worms, since the data may also be a factor there as I argued in https://github.com/solid/data-interoperability-panel/issues/154 . I feel that in the first iteration, it makes sense to only make restrictions based on message headers and request URI, but I think we should return to validating the body with some urgency.
The one case where we may differ is row 11. My main missing feature in Solid is POST
-to-append behavior, see #305. Usually, you'd do a PUT
to the resource or POST
to its container to create a resource, and then when it exists, a POST
to append, but there's nothing in RFC7231 that prevents POST
to create the target resource either, so I don't think that should be a failure.
@RubenVerborgh :
Earlier, I advocated a simple
400
, but then, the use of the term inconsistency in HTTP, as quoted above, has made me think that409
is appropriate.I disagree. Status codes generally give an indication of what kind of solution is required.
For 409, we read:
The 409 (Conflict) status code indicates that the request could not be completed due to a conflict with the current state of the target resource.
—https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.8
That's not the case: the conflict is independent of current resource state, and happens because of the request itself.
Yeah, from that definition, I agree it seems like stretching it a bit. However, my instinct is to always look for more specific codes than the x00-codes, as that may come in useful for the client in resolving the problem. But I agree it is pointless to stretch the definition of an error code if it does not actually help with that.
Now, I'll cite the things I think are most relevant from RFC7231 that I think pertains to this, it is an excerpt from the first comment, so you can see if I take it too much out of context:
An origin server SHOULD verify that the PUT representation is consistent with any constraints the server has for the target resource that cannot or will not be changed by the PUT. [...] When a PUT representation is inconsistent with the target resource, the origin server SHOULD [...] respond with an appropriate error message containing sufficient information to explain why the representation is unsuitable. The 409 (Conflict) or 415 (Unsupported Media Type) status codes are suggested
I'll argue that this is (mostly) the case. Such a constraint is that a container has a URI that ends with a /
, so any representation that is inconsistent with this (and you could split hairs over headers, but eventually that is also manifested in the body), falls under this formulation.
I would also argue that there is a current state of the target resource, even if it doesn't yet exist, for example the state of the resource whose URI ends with /
is that it is a container... But yeah, the definition of 409
does not capture all cases where the rest of the RFC suggests its use, but that suggestion is clearly there.
A 400
will certainly not give a client enough information to resolve the issue, that error would probably end up with the end user. With a 409
, the client may examine the headers it sent and try something different, or we could include some RDF in the body that points to the right URI.
[@RubenVerborgh] I think RDF graph equality is enough.
Any interaction that may result in loss of data which the humans (who really are the focus of all interactions, even when they aren't involved except at the fringes) might consider important, should be avoided.
RDF graph equality is NOT enough with data files which include data which is not RDF encoded, including syntactically irrelevant comments and whitespace, such as Turtle or RDFa may contain. RDF graph equality when comparing Turtle to RDFa to N-triples to N-quads does not preserve all data therein, and this should not be acceptable.
I am rather weary of explaining this over and over, and hearing back, "we don't care about your idea of what data is." That is a path certain to kill Solid, quite possibly quite messily, though it may be a slow death.
The LDP WG made specific decisions, and wrote them up in a way that we thought at the time would be clear to readers, especially implementers but also including users. IF Solid is to continue to claim LDP compliance at all, and UNLESSS Solid writes up and warns of the significant data-losing departures from that compliance, with those warnings being presented to the user at every point they arise, then non-RDF contents of Turtle and RDFa files MUST be preserved and these files MUST be treated as ldp:NonRDFSource
.
IF Solid is to continue to claim LDP compliance at all,
Currently, Solid does not claim LDP compliance in this regard, it only references LDP normatively for Basic Containers and Accept-Post
. I think we carefully need to write up text around graph equality and how that relates to the representations of a resource, but that's not a part of this issue.
A
400
will certainly not give a client enough information to resolve the issue, that error would probably end up with the end user.Yes; we might want to talk about error bodies at some point. (CSS offers structured data.)
Great stuff! I think that would be great to fast track for 1.0.
With a
409
, the client may examine the headers it sent and try something differentSame with
400
.
Question is whether 400
would be too generic to trigger that in a client.
But my instinct as a client, upon a
409
, would be goGET
/HEAD
the resource in question to find what state it is in, as a409
hints at an inconsistency with what exists and what I am requesting, not an inconsistency in my own request.400
would.
Right. The problem is that HTTP isn't entirely consistent at this point...
I don't have particularly strong feelings here. In fact, I think I would have preferred a 3xx error saying "this is not the resource you're looking for, the server's guess is that you're looking for an entirely different resource over there" :-)
LDP describes server behaviour when a client includes the type
link parameter in the header of a POST
request targeting a container. The behaviour may be generalised or interpreted for PUT
and PATCH
however it is neither described by the LDP spec or in its test suite.
Currently, the SP only describes server behaviour when a client includes type
link parameter in the header of a POST
request targeting a container to allow clients to create a container where the server assigns a URI. If the SP were to remove container creation with POST
, then there would be no need for a Solid client to use the link parameter in the request. (Alternatively, an equivalent request to create a container with POST
can be achieved by requiring the Slug
header with a slugtext
value trailing a slash (/
).)
The type
link parameter "does not override the Content-Type header field" (RFC 8288) and so it'd be useful to understand whether anything significant can be used towards consistency when the request-target
and Content-Type
(RFC 7230, RFC 7231) is already available. With concrete RDF syntaxes and their corresponding media-types, consistency can be determined with higher specificity using Content-Type
than the abstract semantic type (RFC 6903) in the Link
header.
I suggest that SP's approach for consistency should not be motivated through the lens of LDP. Consistency in SP should rather reflect the normative requirements, e.g., if an abstract semantic type is of interest to consider as one dimension of the request semantics, then we first need to establish the required type
s in SP. We need to resolve issue: https://github.com/solid/specification/issues/191 .
It depends on the role of Link
header - with type
link relation, for instance - in a request and response; processing, and the expected representation metadata and data. When a specific requirement is not provided, the simplest (and possibly safest) behaviour for a server is to ignore the request's representation metadata when it encounters them. Returning a 4xx is sensible when it corresponds to normative requirements that are not fulfilled or conflicting.
Consistency between request-target
and Content-Type
is "simple". When the effective request URI's path component ends with a slash (/
) targeting a resource without a current representation and the Content-Type
of the request includes the media-type of a concrete RDF syntax (RDF 1.1 Concepts and Abstract Syntax), the server MUST create a container.
I suggest that we either put more emphasis on requiring specific type
link relations or clarify the parts that currently relate to it in SP. That's at least one aspect of consistency we can clarify.
The tables in this issue are interesting from the point of implementing a server that tries to conform to the SP and LDP. However, that's advisory, not requirement.
With that understanding, I'm in favour of factoring it under a non-normative "Relation to LDP" section in SP (or elsewhere). See https://github.com/solid/specification/issues/224 . (Along with https://github.com/solid/specification/issues/194#issuecomment-694828342 which describes compatibility between LDP clients and Solid servers, and LDP servers and Solid clients.)
Currently, the SP is not requiring RDF validation. As servers MUST HTTP/1.1 (RFC 7230, RFC 7231), they can return 4xx as usual when they process request's representation data. When processed, the IANA media-type used in representation metadata is integral for validation, as opposed to the abstract semantic type.
@csarven
I suggest that SP's approach for consistency should not be motivated through the lens of LDP. Consistency in SP should rather reflect the normative requirements, e.g., if an abstract semantic type is of interest to consider as one dimension of the request semantics, then we first need to establish the required
type
s in SP. We need to resolve issue: #191 .
They may be motivated through the lens of LDP, but I don't think that's where the disagreement is, it is rather a symptom of that we view this differently. I feel you're making this too much of theoretical exercise, @csarven . It shouldn't be. I am trying prevent a likely and relatively easily foreseeable class of non-technical security problems. That's the point, not LDP.
Also, importantly, there is an open world, so people may introduce new types of resources, possibly not telling us. Therefore, I think it is very important that the protocol contains generic language that legitimizes tests that fails when there is an understood inconsistency, as well as prompting people to think about it.
So, exhaustively finding and documenting inconsistencies cannot happen, thus it should not depend on #191 , rather a part of solving #191 should be to document new possible conflicts. Generic language still needs to be in for the open world, and for reducing the risk of vulnerabilities.
so, jumping to
With that understanding, I'm in favour of factoring it under a non-normative "Relation to LDP" section in SP (or elsewhere).
It would be nice to have it in a "Relation to LDP" section in addition, but that won't allow testing it rigorously, and so, does not address the actual problem, prevention of a class of possible attacks. There needs to be normative general language in the spec that encompasses the possible inconsistencies so that tests can check that implementations aren't doing anything dangerous. Then, we can have non-normative language that provides guidance, but that's of less value than tests anyway, so we don't need that.
It depends on the role of
Link
header - withtype
link relation, for instance - in a request and response; processing, and the expected representation metadata and data. When a specific requirement is not provided, the simplest (and possibly safest) behaviour for a server is to ignore the request's representation metadata when it encounters them. Returning a 4xx is sensible when it corresponds to normative requirements that are not fulfilled or conflicting.
I interprete this in two ways, one is a desire to close the world. I can see why, but I don't think it is necessary or the right thing to do.
The other is the old debate on how lenient you should be in error handling. That's not an easy debate. To me, if something is just clearly an inconsequential mistake, then yes, I can easily advocate for an ignore approach. If something is clearly an attempted attack, then, it really has to be an error. The line between the two is a bit blurry, but I think in this case, it is clear enough: If something sets a header despite the very clear semantics we have around containers and non-containers, then it is most likely an attempt at doing something nasty. It should be an error. This is unlikely to happen as a simple mistake, this is not constraining the good path in any way, it should not ever be visible when running legitimate software. This is a line of defense against attacks exploiting semantic inconsistencies.
The response payload can provide the reason for the client error. Issue: https://github.com/solid/specification/issues/28 using RDF messages.
URI Slash Semantics (SP).
When the request's Content-Type
is inconsistent with the path component of the effective request URI, the server MUST respond with 415 (RFC 7231).
URI Persistence (SP).
When the request's Content-Type
is inconsistent with the current state of the target resource, the server MUST respond with 409 (RFC 7231).
When the request's Content-Type
is inconsistent with the current state of the target resource, the server MUST respond with 200 only when committing to preserve the identity of the target resource (RFC 7231).
When the encapsulating container of the HTTP message is invalid, the server MUST responds with 400 (RFC 7231).
The consistency between request target, representation metadata and data is categorically different than "bad" requests.
I feel you're making this too much of theoretical exercise
I suggested that for consistency to be useful in the SP, it needs to work with normative requirements. If that's not prescribed in the SP, the existing RFCs already cover the general requirements in addition to good practices.
OK, lets bump it off the milestone. This problem arose as extensive testing shows that the implementations did not agree the slightest around what the actual behavior should be. If it is then adequately covered by the RFCs, I think we should clarify it, but we can also just go with it and hope no-one finds a way to exploit it in the wild.
Can you refer to the tests corresponding to the requirements in the SP?
We're working on converting tests for the new harness, and without going into too much detail on that work-in-progress, we have noted that there is a lot of divergence on how the implementations deal with the tension between slash semantics and LDP Interaction Models. I think this means that the heuristics from #128 is not sufficiently clear. My main fear is that since there is quite a lot of assumptions connected to containers and resources in the access control, lack of clarity in this area could result in opening attack vectors that we can't imagine right now. So, the intention isn't to change the good path (i.e. these are errors an end-user should never see), but to define error behaviour in the other cases, e.g. when something is declared a Non-RDF resource but doesn't have a
/
.In https://github.com/solid/specification/issues/40#issuecomment-566995240 I tried to introduce the two concepts of consistency and exactness. I still think they are useful and that we need them.
By consistency, I mean that different parts of the request carries the same semantics. If something has a
Link
header that says something different from the request URI, there are a conflicting statements and so, there is an error. There are several sources of information that could cause these conflicts, but to advance this issue, I think we should consider three, namely the possible conflict between LDP Interaction Models as given by aLink
header, theContent-Type
header that can distinguish between RDF Sources and Non-RDF Sources and the request URI, which can determine if the resource is a container.As previously mentioned, by exactness I mean to constrain the freedom the server has to adapt the request. Exactness is a criterion that applies differently to
POST
than to other HTTP verbs. WithPUT
request, the HTTP spec affords very little freedom, from RFC7231:So, it is not appropriate to change a request URI to fit a different interaction model when
PUT
is used, I think, as that implies it is a different resource than the UA intended to make.It is different with
POST
, but my opinion is that the server should nevertheless not make changes that changes the interaction models. If that needs to happen, it is most certainly a mistake by the programmer, and in my experience, it is much better to learn of such errors early, than far down the development cycle when the mistake has had noticable end-user consequences. Therefore, I think there are cases where exactness should be enforced, I'll return to the exact cases.HTTP also uses the term "consistent":
In this case, I think the suggested transformation does not imply that it is legitimate to create a different type of resource (e.g. change the interaction model), the transformation applies when you can merely change some aspects of the representation. For these interaction-model-changing situations, I think this makes it very clear that an error is appropriate.
Again, I think it is better to have an error early than learn of a mistake later. Thus, my opinion is that we should specify consistency and exactness requirements, and that we should specify that errors should be thrown when these requirements aren't met.
I'll try to tabulate various combinations of situations that could cause consistency and/or exactness failures when creating resources (note that the use of
text/turtle
is just an example of a resource represented by an RDF serialization, andtext/plain
a generic Non-RDF resource):foo.ttl
PUT
BasicContainer
text/turtle
foo.ttl
PUT
NonRDFSource
text/turtle
foo.ttl
PUT
RDFSource
text/turtle
foo.ttl
PUT
text/turtle
foo/
PUT
BasicContainer
text/turtle
foo/
PUT
NonRDFSource
text/turtle
foo/
PUT
RDFSource
text/turtle
foo/
PUT
text/turtle
foo.txt
PUT
BasicContainer
text/plain
foo.txt
PUT
NonRDFSource
text/plain
foo.txt
PUT
RDFSource
text/plain
foo.txt
PUT
text/plain
foo/
PUT
BasicContainer
text/plain
foo/
PUT
NonRDFSource
text/plain
foo/
PUT
RDFSource
text/plain
foo/
PUT
text/plain
foo/
POST
BasicContainer
text/turtle
foo/
containerfoo/
POST
NonRDFSource
text/turtle
foo/
POST
RDFSource
text/turtle
Slug
foo/
POST
text/turtle
Slug
foo/
POST
BasicContainer
text/plain
foo/
POST
NonRDFSource
text/plain
foo/
containerfoo/
POST
RDFSource
text/plain
foo/
POST
text/plain
Slug
I believe
PATCH
for resource creation is identical toPUT
. We really need to definePOST
on non-containers as an append operation, but I left that out because that's an orthogonal issue.Finally, we need to decide what a failure should result in. Earlier, I advocated a simple
400
, but then, the use of the term inconsistency in HTTP, as quoted above, has made me think that409
is appropriate. In all these cases, there is a conflict either between different parts of the request, or between the request and the slash semantics of Solid (which could count as configuration information in HTTP's terms). There are also cases where415
is appropriate, so I think we should allow that when appropriate, but generally, the fails in the table should result in a409
.There are considerations around this in several older issues that I think was closed a bit prematurely. #121 deals with conflicting interaction models. There's also some discussion in #105 and others.