httpwg / http-core

Core HTTP Specifications
https://httpwg.org/http-core/
467 stars 43 forks source link

Clarify range validity/satisfiability #1011

Closed amichair closed 2 years ago

amichair commented 2 years ago

A couple of related issues: First, from section 14.1.1:

An int-range is invalid if the last-pos value is present and less than the first-pos.

and from 14.1.2:

If a valid bytes range-set includes at least one range-spec with a first-pos that is less than the current length of the representation, or at least one suffix-range with a non-zero suffix-length, then the bytes range-set is satisfiable. Otherwise, the bytes range-set is unsatisfiable.

So the first bug is that by this definition the invalid range-set "6-2" for a representation of length 10 would be considered satisfiable since the first-pos is less than the current length, even though the range-spec is invalid. It should probably say "at least one valid range-spec".

In the latter quote, what does "valid bytes range-set" mean? A range-set's validity or invalidity is not explicitly defined, unless it refers here only to the syntactic correctness (no illegal chars etc.) regardless of whether any range-specs in it are invalid.

More generally, it is unclear what the relationship between an invalid range-spec and invalid range-set is: is the set invalid if at least one spec is invalid, or should invalid specs just be ignored as long as there are other valid ones in the set?

This is similarly confusing in section 14.2:

If all of the preconditions are true, the server supports the Range header field for the target resource, and the specified range(s) are valid and satisfiable (as defined in Section 14.1.2), the server SHOULD send a 206 (Partial Content) response with a content containing one or more partial representations that correspond to the satisfiable ranges requested.

...

If all of the preconditions are true, the server supports the Range header field for the target resource, and the specified range(s) are invalid or unsatisfiable, the server SHOULD send a 416 (Range Not Satisfiable) response.

it mentions range(s) being valid/invalid or satisfiable/unsatisfiable, but valid/invalid is only defined for a range-spec, and satisfiable/unsatisfiable is only defined for a range-set (and not individual ranges). btw there are later references to a single range being satisfiable/unsatisfiable, although this is undefined.

Here too there is ambiguousness - it's unclear what happens if e.g. there is one invalid range-spec in the set (such as "1-5,9-7") - does that mean the "the specified range(s) are valid and satisfiable" (not all of them are) or "the specified range(s) are invalid or unsatisfiable" (not all of them are).

I think this should be clarified - either explicitly define valid/invalid/satisfiable/unsatisfiable separately for range-spec and for range-set, and then use the appropriate terminology in section 14.2 (range-spec or range-set rather than "range(s)"), or just explicitly define how an invalid range-spec affects the range-set (ignore it or make the whole set invalid), and/or qualify in both paragraphs in 14.2 whether it is 'all ranges' or 'at least one range' or whatever is deemed correct.

royfielding commented 2 years ago

The adjective valid isn't being defined here -- it is just being used as defined in English (i.e., properly formed or acceptable). It follows that an invalid range-spec is not a range spec, as defined by the prose, and thus wouldn't be acceptable as a range-set (because that would make the range-set invalid). This doesn't need to be specified further.

The reason range-spec validity is further specified in prose is because we can't define (first-pos <= last-pos) in ABNF.

The response in this case SHOULD be 416; regardless, the server will decide "what happens" and send 416, 200, some other error, or a 206 that contains only the valid ranges. Any of those responses are self-descriptive, and thus interoperable.

In any case, this spec has already been approved and is awaiting final publication.

amichair commented 2 years ago

Ok, so I understand from your answer that if at least one of the range-specs within a range-set is invalid, then the whole range-set is invalid. That's what I think should be clarified in the spec (even if in a future version of it, or errata, etc.), since the current definitions and wording leave a lot of room for interpretation.

At the very least, the second paragraph in 14.2 should say "if at least one of the ranges is invalid, or none of them are satisfiable" or "if the set of ranges is invalid or unsatisfiable" or something like that. The current wording "the specified range(s) are invalid or unsatisfiable" can be understood to mean all of them, which leaves the case of mixed valid/invalid specs within the same set undefined in terms of what status to return.

amichair commented 2 years ago

A related follow-up: if a resource length is 1000, and the range-set is "0-99,9998-9999". On the one hand, the second range-spec is valid but unsatisfiable, but the first one is satisfiable, so the whole set is valid and satisfiable and a 206 containing the range 0-99 should be returned. On the other hand, the spec says if the last-pos is beyond the length, it should be replaced with length-1, so this would be replaced with "0-99,9998-999". Now the second range-spec is invalid, which makes the whole range-set invalid, which should result in a 416 (according to your previous answer). So which one is it, 206 or 416? Same goes for "0-99,9998-". Are unsatisfiable range-specs always treated as invalid range-specs because of this replacement required by the spec (thus making the whole range-set invalid)?

royfielding commented 2 years ago

In my experience, which code the server responds with is implementation and order-dependent. Any of them are interoperable, even when all range-specs are valid and satisfiable. While it is theoretically possible for someone to be confused about this, the spec is only pointing out the obvious: there is no need to respond successfully to bad ranges.

I'll reopen this to see if the other editors want to make a small editorial change here before the RFC goes out.

amichair commented 2 years ago

Ok, thanks for the explanations and consideration :-)

fwiw, in my case the confusion is not theoretical ;-) I've written an HTTP server which aims to be RFC compliant, I have a unit test for the range functionality with lots of different examples of range-sets with edge cases and the like as one does in unit tests, and I'm quite confused as to what the expected result should be in some of them... it's a bit frustrating reading this section of the RFC - it starts off with well-defined ABNF and some clear explanations of how each type of range-spec should be parsed, but then abandons the terminology and moves to more loose terms and non-mutually-exclusive options when it comes to combining them, which status to return etc. I think a tad more strictness of terms and logic would do it good...

reschke commented 2 years ago

FWIW, lack of strictness may be because of implementations do vary, and we can't "break" them now just because we'd prefer things to be simpler.

amichair commented 2 years ago

True, but variations can also be well-defined, and are often addressed directly in the RFCs with examples, MAYs, notes on existing or early implementations, alternatives, etc. If that is the reason here that's fine, but should also be mentioned in the spec, rather than leaving implementers scratching their heads trying to guess what the authors may or may not have intended, or overlooked, or left ambiguous for historic reasons, or whether something just fell between the cracks because we're all human. In a nutshell, if an implementation can't determine whether it is compliant with the spec or violating it, I vote for clarifying the spec...

royfielding commented 2 years ago

Rigid interpretation of protocol design is why most distributed systems fail.

Range is an optional feature that the server can turn off at any time or simply choose to ignore. The server can even decide to reinterpret what the client is requesting based on a known bug in the client. Hence, requiring a particular response for every particular input is not the goal.

The goal is to respond in ways that are self-descriptive and interoperable, regardless of input.

amichair commented 2 years ago

In the interest of of avoiding a philosophical discussion and sticking to the specifics of this issue, I'd like to clarify that I was not advocating any specific changes to what the servers should or shouldn't do, but simply to editing the RFC to make it clear. Specifically:

  1. range-set is defined in ABNF, and then there is a full paragraph defining what a satisfiable range-set and unsatisfiable range-set are. Following that, the terms satisfiable range-set and unsatisfiable range-set are never referenced again. In fact there are exactly zero appearances of the term range-set at all anywhere again. Would you be open to considering that precisely defining terms and then never using them anywhere might be an issue worth fixing?

  2. "...and the specified range(s) are valid and satisfiable" - regardless of what the response is, what exactly is this sentence referring to? If "range(s)" can be plural, it's probably not referring to a range-set of which there is only one, but individual range-specs. But "satisfiable" is only ever defined for a range-set as a whole and not individual range-specs. Why does the spec go through the trouble of precisely defining the terms up front, but then using vague phrasing such as this instead of those well-defined terms? I think using the defined terms (or defining missing terms such as a satisfiable range-spec) would be an improvement.

  3. The two sentences from 14.2 quoted above: one says "If... range(s) are valid and satisfiable... SHOULD send a 206", the second "If... range(s) are invalid or unsatisfiable... SHOULD send a 416". As stated before, e.g. in the case of a range-set with one valid range-spec and one invalid range-spec, the non-precise predicates are confusing what the correct behavior is supposed to be. The two "if"s somewhat contradict each other in this case, or at the very least are open to interpretation (all? some? at least one?) and/or it is not clear if they should apply. If I understand the argument in the answers above, it is to allow for flexibility in how the server can respond - but it seems that flexibility is somewhat conflated with vagueness in these sentences. If flexibility is the design goal, the wording can still be very clear about it: it can use the proper terms, non-conflicting predicates, etc. to cover the recommended statuses, and add a third sentence stating that since the responses are self-descriptive, in certain (or all) cases servers can return either status as it sees fit. This would make the intention (flexibility) and rational (self-descriptive) clear - you get all the backwards compatibility and flexible design you want, by simply stating what it means, and not by having implementors trying to guess what it means, then giving up trying to figure it out and just doing whatever they want and hoping it'll be ok.

  4. btw, while I get the flexibility argument, note that e.g. in RFC7232 section 6 (Precedence of evaluating conditonal headers) the confusion in the earlier spec was replaced with clear, rigid and well-define rules, even though it broke compatibility with existing implementations (some statuses returned in certain circumstances were explicitly changed). As a server implementor moving from RFC2616 to RFC7232, although it required changing some code, I consider this change an improvement (or even fix) to the spec and not a hindrance to its success... just sayin', choosing clarity and stricter logic instead of confusing phrasing is not unprecedented even in the history of this very RFC :-) (but if you decide on leaving it all open, I'm fine with that - not necessarily advocating the specific solution, as long as the spec is clearly worded.)

Thanks for your consideration...