solid / specification

Solid Technical Reports
https://solidproject.org/TR/
MIT License
486 stars 46 forks source link

Clarify the heuristics to determine the interaction model if none is specified #128

Closed RubenVerborgh closed 4 years ago

kjetilk commented 4 years ago

Since we don't have the actual clarification here, I'll return this to TODO.

kjetilk commented 4 years ago

I think there are two parts to this, one is the specific heuristics, which is simple. The other thing is consistency, which is probably harder. The latter was more the intention of #121 , but it really is connected.

My proposal for the heuristics:

  1. Iff the request-URI path component ends with the character / is an LDP-C. Since only LDP-BC is supported by Solid, it is permissible to interprete it as that. Note, however, that in Solid, LDP-C doesn't necessarily imply LDP-RS, since it may be represented by an RDF-bearing document.
  2. If the request-URI path component does not end with /, then the resource is an LDP-RS iff it has a media type that implies that it state is fully represented in RDF, corresponding to an RDF graph.
  3. Any other resources are assumed to be LDP-NR.

I'm thinking it is good to define it in terms of media type in the second point, so that the interaction model can always be inferred from headers (since #70), the alternative is to inspect the body to ensure that. We could probably enumerate the media types that are LDP-RS at this point.

I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS, (#69), but I'm willing to go with that if it makes stuff simpler.

TallTed commented 4 years ago

[@kjetilk] I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS

There is no between. "Fully represented in RDF" (LDP-RS) and "not fully represented in RDF" (LDP-NR) add up to 100% of all possibilities (LDPR). HTML+RDFa docs are not fully represented in RDF, so they are LDP-NR.

Otherwise, I think I agree with your last.

acoburn commented 4 years ago

Would it make sense to require the response headers to include an indication of how the server understood the request? That is, if the server understands the new resource to be an LDP-BC, then the server MUST respond with a corresponding Link header, indicating an LDP-BC was created (the same would apply to any other LDP interaction model).

kjetilk commented 4 years ago

[@kjetilk] I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS

There is no between. "Fully represented in RDF" (LDP-RS) and "not fully represented in RDF" (LDP-NR) add up to 100% of all possibilities (LDPR).

But that's not what it says, it says: "An LDPR whose state is fully represented in RDF," vs "An LDPR whose state is not represented in RDF.", which leaves open a class of things that are partially represented in RDF.

HTML+RDFa docs are not fully represented in RDF, so they are LDP-NR.

I think I can go with that though, since I can see that it may be just inaccuracy, which happens.

csarven commented 4 years ago
  1. / is a RDF Source (RDF 1.1). Should apply containment behaviour and resource lifecycle like LDP-BC.
  2. re "media type that implies", you can't go from there exists a possibility of a representation with non-RDF emitting content to all representations of a media type. That doesn't rule out text/html, if that was what you were aiming at. It is permitted to have a representation in HTML+RDFa without any non-RDF emitting content corresponding to an RDF graph. If you want to relate to LDP, it qualifies for LDP-RS.
csarven commented 4 years ago

Categorically pinning text/html, application/xhtml+xml, image/svg+xml etc to LDP-NR conflicts with the possibility of having an HTML(+RDFa) representation available from /. In #69 we have already acknowledged and accepted the use cases which requires HTML(+RDFa).

kjetilk commented 4 years ago

Categorically pinning text/html, application/xhtml+xml, image/svg+xml etc to LDP-NR conflicts with the possibility of having an HTML(+RDFa) representation available from /. In #69 we have already acknowledged and accepted the use cases which requires HTML(+RDFa).

That's a practical question, I think. As a matter of definition, it was what I was referring to when I said that LDP-C does not imply LDP-RS. That may, or may not take care of it, depending on the practical implications of the assumptions made by pure LDP implementations.

We could also say that HTML+RDFa etc are just LDPR...

kjetilk commented 4 years ago

Or... We could say that "these media types must be inspected to determine if the document is fully represented in RDF and thus LDP-RS", and then leave the rest to be LDP-NR. But it won't solve the conflict of LDP-C not necessarily being LDP-RS.

csarven commented 4 years ago

If / is supposed to be handled as a LDP-BC, we shouldn't venture from LDP-RS. I don't particularly like the idea of Solid having a different take on LDP eg. / is LDP-BC but not necessarily LDP-RS.. or even suggesting that a resource has a (default?) interaction model but will differ depending on the representation. That would be internally inconsistent and not a simple design. Just because LDP works with vague notions attempting to cover spectrum of things with eg "fully", "partial", or "not have useful", doesn't mean that Solid has to go on that path. If it is really about that, I feel much more comfortable to default to RDF 1.1's notion of what RDF Source entails and not get sucked into pedantic over-engineering.

/ is intended to be a LDP-BC and LDP-RS and so must be handled as a RDF graph. Everything else is secondary or a non-issue... out of scope.

To answer @acoburn 's question:

Would it make sense to require the response headers to include an indication of how the server understood the request?

Yes, I agree that makes sense (as I've already proposed elsewhere). We note that LDP doesn't specify how requests without an interaction model should be handled. So, specifying it for Solid is not about compatibility with LDP but just us striving for normalisation and uniform handling.

kjetilk commented 4 years ago

/ is intended to be a LDP-BC and LDP-RS and so must be handled as a RDF graph. Everything else is secondary or a non-issue... out of scope.

We'd have to take that up with @timbl , basically, it is his design that / can return a representation where the state is not fully represented in RDF that is incompatible with LDP's notion that LDP-C isa LDP-RS.

My opinion is, and has always been, that this notion (LDP-C isa LDP-RS) is a Really Bad Idea[tm], so, I don't feel the tension personally, but indeed, it is important for Solid to have a clear relationship to LDP.

kjetilk commented 4 years ago

We need to define heuristics with the use of a Slug, BTW. It may be better to define that in #108 , though.

csarven commented 4 years ago

https://github.com/solid/specification/issues/96

kjetilk commented 4 years ago

96

Good catch. Forgot that one, since it wasn't on the project board... I'm starting to think that we may need to elevate that from the implementation detail status. I envision that the presence of a / will influence whether it will be classified as a Container or not, which would not be just an implementation detail.

csarven commented 4 years ago

Yes, I think our perspective on that evolved (through other issues).

I suggest to apply the same heuristics from the resolution of https://github.com/solid/specification/issues/107 on https://github.com/solid/specification/issues/96 .

kjetilk commented 4 years ago

Actually, rereading, I found that the description #96 to be quite different from what needs to be resolved here. Also, I think it should be orthogonoal to #107 , since I think the use of a Slug is about the client making a suggestion, the server will do it damnedest to find a reasonable solution. If the client wants to be exact, they should use PUT, which I defined in terms of exactness in #40.

So, I'll guess I'll go with my proposal right here. :-) It seems there are six possibilities, combinations between whether the Slug ends with a / or not and whether the request has an LDP Link rel="type" header.

Here's my suggestion:

Given Slug Given Link Result
foo/ no Link header Create a container with foo/ appended to the request-URI.
foo/ Link header setting LDP-C Create a container with foo/ appended to the request-URI.
foo/ Link header setting some other LDP interaction model Create a resource with foo appended to the request-URI, but check for consistency (as discussed in #40)
foo no Link header Create a resource with foo appended to the request-URI, check for consistency
foo Link header setting LDP-C Create a container with foo/ appended to the request-URI.
foo Link header setting some other LDP interaction model Create a resource with foo appended to the request-URI, check for consistency

[Removed updated suggestion]

csarven commented 4 years ago

If Slug is provided, the slash semantics is used for the intended URI.

If the Link header accompanies Slug, Link has a higher specificity than Slug for the algorithm generating the URI.

kjetilk commented 4 years ago

Right! You can put it that way, but it has to say something more concrete about what to do with the slash if it conflicts with the expectation set by the Link, like in the third row in the table.

Can we formulate a consensus around this?

I think we first need to construct the Resource URL. With some methods, (i.e. PUT, PATCH), the URL is the Request-URI, while with POST), the Resource URL is constructed by the server, using the heuristics based on Slug and Link as discussed above, or assigned a string by the server if Slug is not present.

Then, the Resource URL is checked for consistency, e.g. a Resource URL not ending with / cannot be an LDP-C.

Then,

  1. Iff the Resource URL path component ends with the character / the resource can be assumed to behave like an LDP-BC. However, in Solid, LDP-C doesn't necessarily imply LDP-RS.
  2. If the Resource URL path component does not end with /, then the resource is an LDP-RS iff it has a media type listed in the a list TBD.
  3. Other resources may be treated as LDP-NR.
csarven commented 4 years ago

I agree with the steps involving the construction of the effective request URI and setting the interaction model.

kjetilk commented 4 years ago

Good! Anyone want to express discomfort? Can we say we have rough consensus?

TallTed commented 4 years ago

(Probably worth renaming this issue from "Clarify the heuristics do determine" to "Clarify the heuristics to determine")

kjetilk commented 4 years ago

I have updated the table with a suggestion for when no Slug is set, since this also needs to be explicit.

kjetilk commented 4 years ago

Updated suggestion, since we need to be explicit when there's no Slug.

Given Slug Given Link Result
foo/ no Link header Create a container with foo/ appended to the request-URI.
foo/ Link header setting LDP-C Create a container with foo/ appended to the request-URI.
foo/ Link header setting some other LDP interaction model Create a resource with foo appended to the request-URI, but check for consistency (as discussed in #40)
foo no Link header Create a resource with foo appended to the request-URI, check for consistency
foo Link header setting LDP-C Create a container with foo/ appended to the request-URI.
foo Link header setting some other LDP interaction model Create a resource with foo appended to the request-URI, check for consistency
none no Link header Create a resource with server determined identifier appended to the request-URI, check for consistency
none Link header setting LDP-C Create a container with server determined identifier appended to the request-URI.
none Link header setting some other LDP interaction model Create a resource with server determined identifier appended to the request-URI, check for consistency
csarven commented 4 years ago

To be clear, the results also include updating existing resources, and not only create, right?

kjetilk commented 4 years ago

In LDP, you can specialize interaction models, but does it make sense in Solid to change them at all? To me, it seems like the interaction model MUST be fixed once it is created?

kjetilk commented 4 years ago

If yes, then we should formulate something more specific in #121 .

csarven commented 4 years ago

I've excluded the scenario of changing interaction models. It was really whether the effective request URIs above (Slug at some point factored in or not) is only for create. I don't see why it doesn't include append. Reusing the same Link header (as the one previously assigned) is not a violation after all.

While the table is useful and goes beyond what this issue was set out to do, it can be misleading if taken at face value as it seems to be only about create. I've noted several scenarios and variability related to Slug in https://github.com/solid/specification/issues/108#issuecomment-574154189 where the update/append became more apparent.

kjetilk commented 4 years ago

Ah, right, but I feel strongly that once created, we should only manipulate a resource by its own URI. Slug: . is interesting because it preserves that. Going beyond that and start involving Slug in further update operations, I think would be Pure Evil. :-)

kjetilk commented 4 years ago

Just to note, NSS says:

The name of new file POSTed may not contain : | or /

in the case where the Slug ends with a /.

I don't think Slug should be restricted like that, it only serves to limit the client for no gain, so I think we should stay with the proposed consensus.

csarven commented 4 years ago

I've encountered that error on NSS as well when I tested Slug. Noted it as a bug because as far as I can tell from https://tools.ietf.org/html/rfc5023#section-9.7.1 , / is a permitted reserved character and doesn't need to be percent-encoded. If you can confirm this as well, that'd be great. / can appear anywhere and any number of times in slugtext.

RubenVerborgh commented 4 years ago

/ can appear anywhere and any number of times in slugtext

But then slugs can be used to create folders recursively, given that slashes do have a meaning in Solid? And this would create complex locking problems?

Not to mention the attack vector for security problems.

I'm against.

csarven commented 4 years ago

Recursive container creation is an atomic operation and not unique to Slug use. Applying the slash semantics to the Slug value keeps things consistent ( https://github.com/solid/specification/issues/96#issuecomment-570086753 ).

Can you elaborate on how the attack vector for security problems that you are thinking of is unique to Slug use?

RubenVerborgh commented 4 years ago

Recursive container creation is an atomic operation

Understood; could you add a reference for that (might provide context for the next question)?

Applying the slash semantics to the Slug value keeps things consistent ( #96 (comment) ).

But then I think we need a MUST there (will comment).

Can you elaborate on how the attack vector for security problems that you are thinking of is unique to Slug use?

Depending on how recursive resource creation is defined, the problems would be the same. I.e., we should in both cases warn against things such as:

However, the real danger is that implementers overlook the fact that it is recursive container creation, as they might expect the slug to be just a filename.

A such, I would ask the question differently: what is the use case for recursive container creation via slug?

kjetilk commented 4 years ago

I'm thinking that we do need to restrict the acceptable characters or even what the server may use to construct a resource identifier (with relevance to #142), but that the server should otherwise have the liberty to construct identifiers based on the Slug, rejecting something should be a rare occurance.

That said, I say we defer recursive container creation with Slug until after FPWD. When using PUT, the URI resolution is already well-defined. Later, we can port that definition into our understanding of Slug, but for now, lets not do that.

csarven commented 4 years ago

Understood; could you add a reference for that

https://github.com/solid/specification/issues/68#issuecomment-561690124 , https://github.com/solid/specification/issues/118#issuecomment-569648485 , https://github.com/solid/specification/issues/126#issuecomment-569920473 , https://github.com/solid/specification/issues/137#issue-548570804 .. there is more out there I'm sure. Resolving security follow up https://github.com/solid/specification/issues/129 can help further. Requirements like https://github.com/solid/specification/issues/107#issuecomment-567482817 can help towards halting if a particular segment (container) can't be created.

I think we need a MUST there

IIRC, the rationale for SHOULD was because of what Slug entails and that it would still be within the confines of client suggestion as well as server choosing to ignore or use as it prefers. I also understand the rationale for MUST and that would be equally valid but it may be overstepping Slug.

we should in both cases warn against things such as

These would be fine as non-normative information in general but I don't see it being specific to Slug.

However, the real danger is that implementers overlook the fact that it is recursive container creation, as they might expect the slug to be just a filename.

Servers that use the Slug (with a particular algorithm eg. slash semantics) will have to generate an effective request URI and perform the request based on that as usual. Again I'm not sure if there is anything unique here for Slug but I'd be content to clarify these further as non-normative information because we are really not defining anything new other than suggesting to apply the slash semantics algorithm for the sake of consistent path segment handling.

what is the use case for recursive container creation via slug?

The same use cases that would require PUT or PATCH to create recursive containers. Slug just happens to be a hint or a suggestion to the server on the naming.

RubenVerborgh commented 4 years ago

I'm thinking that we do need to restrict the acceptable characters or even what the server may use to construct a resource identifier (with relevance to #142),

And you might even want to limit length, as per https://github.com/whatwg/fetch/issues/862#issuecomment-579118131, to avoid that people ask for 10MB slugs. (Also note for PUT.)

I also understand the rationale for MUST and that would be equally valid but it may be overstepping Slug.

Security seems key here. If Slug has it as a SHOULD, but you create it, and the slash interpretation is a SHOULD or MUST on GET, we might get in trouble.

non-normative information in general

The danger is huge. Why allow .acl slugs and ../ sequences? Open attack vector IMHO.

The same use cases that would require PUT or PATCH to create recursive containers. Slug just happens to be a hint or a suggestion to the server on the naming.

Then use those, I'd say. This is the kind of security risk where it's hard to imagine all implementations to get it right. I'd strongly ask the security experts to weigh in.

csarven commented 4 years ago

By mere definition of Slug, server can ignore or change it to what it deems to be useful or safe. ".acl" has no special meaning or expectation as far as the spec is concerned. If an implementation considers that to be a pattern that it best avoids, it will strip it off or use something else. As for ".." , these fall under how the effective URI is handled. No different than request-target being "..". Server still has to construct the intended target. In some cases it will be justifiably useful (relative paths) and in other cases it may be a malicious or nonsensical attempt that the server can ignore. Server choosing what to do also goes for client asking to create many nested containers and not getting it.

You asked for use cases that would entail recursive container creation which happens to route through Slug (because of slash semantics for consistency).

RubenVerborgh commented 4 years ago

That, or we can avoid the whole attack vector by not allowing these things in slug, such that there is no expectation of them to work at all. The risk is that Slug could follow different code paths from regular request URI handling, and be way less tested because of it being more rare and having more edge cases because of it being a relative URI.

Anyway, not my call; just bringing in arguments of why a security expert should really look at this.

csarven commented 4 years ago

All features in all specs will be reviewed by security experts.

There are indeed many attack vectors but the ones that you've raised can either be ignored or has no more specialty than relative request-targets on their way to the effective request URI. Nevertheless, I do agree that potential risks due to different code paths would be useful to note in the spec. Thanks for raising that.

Edit: "attack vectors" not "vendors" =)

kjetilk commented 4 years ago

We could surely define in terms of request URI handling and detail how to construct an effective request URI that would need to validated. We could put some common attacks in the test suite, but lets just consider it later :-) For now, lets just redefine slugtext to exclude recursive container creation. We could be even more restrictive and for example say that we allow only the regexp \w+/?$. We could also remove any non-matching character.

Edit: Oh, I didn't read what it said about Unicode, that last part would exclude that. So, what I wrote was correct, just much more restrictive than I intended :-)

kjetilk commented 4 years ago

Another note on NSS, as a response to the case of foo with no Link header (edit: remove attempt to make simple table)

NSS5 responds with

Location: /foo.ttl

rather than just foo. Is this what we expect? For now, I'll assume yes.

RubenVerborgh commented 4 years ago

Is this what we expect?

It does not seem correct.

csarven commented 4 years ago

That would be a valid response that would pass the tests. I think the primary concern was about classifying the request as a container or a non-container resource. I don't think NSS's response is good because it should probably stick to /foo and do /foo$.ttl on disk. It is not entirely clear what requesting JSON-LD on /foo.ttl is supposed to do.. not to mention the slight confusion. Any way, this bit is an implementation detail.

csarven commented 4 years ago

Noting here that the rough consensus in this issue: https://github.com/solid/specification/issues/128#issuecomment-573033297 was taken up in PR https://github.com/solid/specification/pull/160 , reviewed, and approved.

However, the issue was revisited in https://github.com/solid/specification/pull/160#issuecomment-635932724 highlighting the functional requirements, variability, and complexity to implement the heuristics.

https://github.com/solid/specification/pull/160#issuecomment-636822687 proposed an alternative approach that was deemed to simplify implementation and remove variability. It was reviewed and approved. https://github.com/solid/specification/commit/c1aac426f859b5c7d5f920822abb56ea56220e32 includes the criteria based on the proposal.