IESG ballot on draft-ietf-httpbis-p1-messaging-25

Benoit Claise

Comment (2013-12-17)

Thanks Tom Nadewu for your OPS-DIR review. I know how much time you spent on this one!

I see the HEAD request, which I didn't know about:

Transfer-Encoding MAY be sent in a response to a HEAD request or in
a 304 (Not Modified) response (Section 4.1 of [Part4]) to a GET request

I was wondering: where is the list of valid HTTP operations defined?

As defined in section 3.1.1, any string that matches the grammar (method = token) is a valid method. Section 4 of p2 describes the methods defined by this specification. There are many other methods defined by other specifications.
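To illustrate the point that any token is a syntactically valid method, here is a minimal Python sketch. It assumes the `tchar` set as published in RFC 7230, Section 3.2.6, which may differ from the draft under review; the function name `is_valid_method` is hypothetical.

```python
import re

# tchar as published in RFC 7230, Section 3.2.6 (assumed to match the
# draft under review): token = 1*tchar
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_valid_method(name: str) -> bool:
    """Return True if name matches the grammar method = token."""
    return TOKEN_RE.fullmatch(name) is not None


# Methods defined elsewhere are just as valid syntactically as GET.
for candidate in ("GET", "PROPFIND", "M-SEARCH", "GET /", ""):
    print(repr(candidate), is_valid_method(candidate))
```

The check is purely syntactic: it says nothing about whether a given server implements the method, which is the p1/p2 split described above.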

I finally found it (my mistake was that I was searching for "operation" in the document while the correct term is "method"):

The request methods defined by this specification can be found in
Section 4 of [Part2], along with information regarding the HTTP method
registry and considerations for defining new methods.

This points to: http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-25

   4.3.  Method Definitions
   4.3.1.  GET
   4.3.2.  HEAD
   4.3.3.  POST
   4.3.4.  PUT
   4.3.5.  DELETE
   4.3.6.  CONNECT
   4.3.7.  OPTIONS
   4.3.8.  TRACE

Anyway, an extra sentence, such as the following one, would have helped me:

"Existing methods are GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE"

That would not be true. Those are only the methods defined by this specification, and they are adequately summarized in p2. There are many other methods defined by other specifications, which is also described in p2. Repeating all of this in p1 would be counter-productive -- the split is specifically intended to emphasize where parsing is (or is not) sensitive to the semantics.

Also change "HEAD request" to "HEAD request method". Similar remark for "GET request".

Those phrases are in common use as short-hand for "a request that contains the HEAD method" and "a request that contains the GET method", respectively. I have added some section xrefs in [2561] where the phrase is first used and to help clarify where methods are defined.

Jari Arkko

Comment (2013-12-19)

Discussion with Meral's Gen-ART review comment seems to continue on some sub-items. Maybe worthwhile completing before sending off to the RFC Editor.

The review has been discussed and minor editorial concerns addressed. Several larger suggestions have not been applied because they fundamentally disagree with the editorial direction adopted by the working group, which has been based on the opinions of many individuals who have implemented the protocol. We cannot expect everyone to agree on editorial choices.

Joel Jaeggli

Comment (2013-12-16)

The ops-dir review noted (consistent with respect to usage in the document) but otherwise inconsistent employment of the term coding vs encoding in HTTP 1.0 vs 1.1 vs here. I guess it's too much to ask that the name of the header field and the term of art employed to describe it be consistent.

The term "coding" is consistently used to refer to the algorithms used for both encoding and decoding. The word encoding is used consistently where we are talking about a condition after one or more algorithms have been applied (i.e., the data is encoded after a sequence of codings have been applied). As such, the different terms are being used to disambiguate the two different meanings. The header fields are *-Encoding because they provide the metadata that describes how the data has been encoded.

Pete Resnick

Comment (2013-12-18)

Throughout the document (and the other documents in the series): I now understand that you intend a two stage parse for header fields and have that represented in the ABNF as a separate overall message syntax and a header field value syntax. That's fine, but I would ask that you make this clearer somewhere in section 3 of the p1 document. You talk about the parsing, but I think it is well worth describing that there are two levels of ABNF, and that the ABNF rule name corresponds to the header field name. It is fine to do it this way, but it's not the way that ABNF has been used in the past, so best to make it crystal clear.

Fixed, see #540

Specific comments:

This HTTP/1.1 specification obsoletes and moves to historic status RFC 2616, its predecessor RFC 2068, and RFC 2145 (on HTTP versioning).

Please, no, it doesn't (and shouldn't) move any of these documents to Historic (even if it were capitalized correctly ;-) ). It obsoletes them. Please strike "and moves to historic status". (I'm happy to give you the long explanation of why moving to Historic is not the right thing if you like.)

Fixed, see #544

Also, an editorial nit: I find the "we" affectation distracting. Sounds like an academic paper. 11 occurrences in this document. "This document" or "this specification" (or simply switching to the passive voice) are much more IETF-like.

Fixed, see [2563].

2.3 (Editorial nit) "...upstream to downstream. Likewise, we use...". It's not "Likewise" here. "Upstream" and "downstream" are only about the direction of the message and don't have anything to do with who sent/received it. "Inbound" and "outbound" refer to direction based on who it's coming from (UA or OS). Strike "Likewise".

Fixed, see [2563].

2.5, para 8 (and ff): I'm not a fan of the "MUST..., unless..." construct. People get into stupid conformance arguments over such things. I prefer "MUST either..., or ..." or "SHOULD..., the primary exception being...".

Won't fix. Neither "MUST either" nor "SHOULD except" mean the same thing, since we are not talking about a single requirement where a specific alternative exists. I would rather have the conformance argument than change the wording here.

2.7.1, para 6: Why "MAY"? What else could it do? Is this a protocol option of some sort?

It could choose to use an internal cache, a trusted proxy, or disregard the context.

para 7: The concept of "establishing authority" is not well explained here. What's the import of it?

Fixed, see [2609].

para 8: Why "ought to"? That seems like a fine candidate for a "SHOULD": You're giving implementation advice to avoid damage.

Fixed, see [2582].

3.2.4:

A proxy MUST remove any such whitespace from a response message
before forwarding the message downstream.

Really? Wouldn't that cause the aforementioned "security vulnerability"?

No, because it removes the variability; it forces downstream recipients to interpret it as a repeated header field instead of two different fields. This is considered by the WG to be better than forcing the proxy to discard-and-error (won't be implemented because of common bugs in IIS-based sites) or forward an invalid message (would be interpreted as a bug in the proxy).

A field value is preceded by optional whitespace (OWS)...

"...and/or followed...", right?

Fixed, see [2583], [2584], and [2585].

3.2.6: This is your only use of the term "escape". A bit imprecise. I suggest reusing the quoted-pair text for quoted-cpair.

Fixed, see [2519].

3.3.1: "encoding parameters MAY be provided by other header fields". I think MAY is wrong there. "Can"?

Fixed, see [2586].

3.3.2:

A sender MUST NOT send a Content-Length header field in any message
that contains a Transfer-Encoding header field.

Why not? Can there not ever be a Transfer-Encoding that has no implicit length? I read 3.3.3 sub 3 and I still don't get it.

"If any transfer coding other than chunked is applied to a request payload body, the sender MUST apply chunked as the final transfer coding to ensure that the message is properly framed." -- so if there is a Transfer-Encoding, it will always be wrapped by "chunked"
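A minimal Python sketch of how a request sender might pick its framing header fields under the two rules quoted above. The helper name `framing_headers` and its parameters are hypothetical, and the sketch covers requests only.

```python
def framing_headers(transfer_codings, body_length=None):
    """Choose framing header fields for a request (hypothetical helper).

    transfer_codings: coding names in the order they are applied.
    body_length: payload length in octets, if known.
    """
    codings = [coding.lower() for coding in transfer_codings]
    if codings:
        if "chunked" in codings[:-1]:
            raise ValueError("chunked can only be the final transfer coding")
        if codings[-1] != "chunked":
            # Any other coding gets wrapped by chunked so the message
            # is self-delimiting.
            codings.append("chunked")
        # Transfer-Encoding is present, so Content-Length is not sent.
        return {"Transfer-Encoding": ", ".join(codings)}
    if body_length is not None:
        return {"Content-Length": str(body_length)}
    return {}


print(framing_headers(["gzip"], body_length=42))
print(framing_headers([], body_length=42))
```

Because chunked always ends up as the outermost coding, the length of the message is carried by the chunk framing itself, which is why a Content-Length alongside it would be redundant at best and contradictory at worst.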

4.1: I presume chunk-size can't be "0" even though the ABNF allows it?

Indeed. You parse the chunk-size, and if it is 0 you know you have reached the last chunk.
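The role of a zero chunk-size can be seen in a small decoder. This is a Python sketch, not the pseudo-code from the draft: the name `decode_chunked` is hypothetical, chunk extensions are ignored, and trailer fields are read and discarded.

```python
import io

HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(stream):
    """Decode a chunked body from a binary stream; trailers are discarded."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("truncated chunk-size line")
        # Ignore any chunk extensions after ';' and parse the hex size.
        size_field = size_line.split(b";", 1)[0].strip()
        if not size_field or any(byte not in HEX_DIGITS for byte in size_field):
            raise ValueError("invalid chunk-size")
        size = int(size_field, 16)
        if size == 0:
            break  # a size of zero marks the last chunk
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("malformed chunk")
        body += data
    # Consume the trailer section up to the terminating empty line.
    while True:
        line = stream.readline()
        if line == b"\r\n":
            break
        if not line:
            raise ValueError("truncated trailer")
    return bytes(body)


wire = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
print(decode_chunked(wire))
```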

4.1.1: quoted-string doesn't allow folding, does it? Why do you need a new quoted-str-nf?

Fixed, see #528

5.5, first paragraph: Why do you have "MUST reconstruct" instead of "reconstructs", or simply reversing the sense of the whole paragraph and say, "An 'effective request URI' is a reconstruction of the user agent's original target URI"? I haven't found anything in the documents that says that effective request URIs are going to be passed as protocol parameters, but rather they are for local processing and comparison. Given that, the "MUST reconstruct" seems inappropriate.

Fixed, see [2587].

5.7.1: s/The received-by field/The received-by token OR The received-by portion of the Via header field

Fixed, see [2551].

5.7.2:

A non-transforming proxy MUST NOT modify the message payload
(Section 3.3 of [Part2]).  A transforming proxy MUST NOT modify the
payload of a message that contains the no-transform cache-control
directive.

I get the second sentence. But isn't the first a definition of a non-transforming proxy? If so, I think you should change "MUST NOT" to "will not" or "does not".

Fixed, see [2563] and related discussion at http://lists.w3.org/Archives/Public/ietf-http-wg/2014JanMar/0181.html

6.3:

A server MAY assume that an HTTP/1.1 client intends to maintain a
persistent connection until a close connection option is received in
a request.
[...]
Clients and servers SHOULD NOT assume that a persistent connection is
maintained for HTTP versions less than 1.1 unless it is explicitly
signaled.

I'm not sure how to implement the option/requirement of "assume". :-) What is it that you want/expect/permit the implementation to do/not do?

Fixed, both were redundant and have been deleted, see [2591].

6.4: A "SHOULD" should not be used to "encourage" something. This seems like an utterly empty piece of normative text. "Be nice" without other guidance doesn't seem to lead to any useful interoperability.

Demoted to an ought, see [2592].

Thus, a sender MUST expand the list construct as follows:
[...]
a recipient MUST expand the list construct as follows:

The two MUSTs here strike me as goofy. Implementations of senders and recipients do not "expand" ABNF rules; they produce and parse text. Saying things like the following would make sense to me:

In any production that uses the list construct, a sender MUST NOT
produce empty list elements. In other words, senders MUST produce
lists that satisfy the following syntax: [...]

In other words, a recipient MUST accept lists that satisfy the
following syntax: [...]

Fixed, see [2593].
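The sender/recipient asymmetry in the suggested wording can be sketched in Python as below. Both function names are hypothetical, and the naive split on commas is an assumption that ignores quoted-strings containing commas, so this is only suitable for simple token lists.

```python
def parse_list(field_value):
    """Recipient side: accept and skip empty elements such as 'a, , b,'."""
    elements = (item.strip(" \t") for item in field_value.split(","))
    return [item for item in elements if item]


def generate_list(items):
    """Sender side: never produce empty list elements."""
    cleaned = (item.strip(" \t") for item in items)
    return ", ".join(item for item in cleaned if item)


print(parse_list("foo, ,bar,"))
print(generate_list(["foo", "", "bar"]))
```

Neither function "expands" an ABNF rule; they produce and parse text, which is the framing the comment asks for.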

Richard Barnes

Comment (2013-12-18)

In general, this is a very nice introduction to the HTTP architecture. Thanks!

COMMENT 1:

The above categories ... are indistinguishable from a man-in-the-middle
attack.

It seems worth noting that "captive portal" is not equivalent to the other two terms; it's a special case.

I don't think that matters to the protocol.

I would also expand the last sentence to explain a little more why the two are equivalent, and to clarify that the distinction is technical (since one could make moral distinctions between a MitM and a porn-filtering proxy):

OLD: "They are indistinguishable from a man-in-the-middle attack."

NEW: "Because these entities intercept and modify packets without the consent of either endpoint, these entities are indistinguishable at a protocol level from a man-in-the-middle attack."

That would substantially duplicate the second sentence in the same paragraph. What I have done instead is move that sentence up, before the examples, and split the paragraph in two. See [2594].

COMMENT 2:

A sender MUST NOT generate protocol elements that convey a meaning that is known by that sender to be false.

This seems optimistic.

No more optimistic than the syntax requirements. This is what interop means for an application-level protocol.

COMMENT 3:

In Section 3.2.2, are the scare quotes around "good practice" necessary?

Fixed, see [2595].

COMMENT 4:

In Section 3.2.3, "to white-out invalid or unwanted protocol elements" -- what does it mean to "white out" protocol elements? To replace them with whitespace? Why not just remove them?

Replacing instead of removing avoids moving bytes around.
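A small Python sketch of the difference, assuming the message sits in a mutable buffer. The helper name `white_out` and the sample field are hypothetical.

```python
def white_out(buf: bytearray, start: int, end: int) -> None:
    """Overwrite buf[start:end] with spaces, keeping the buffer length."""
    buf[start:end] = b" " * (end - start)


buf = bytearray(b"A: keep, drop, keep")
start = buf.index(b"drop")
white_out(buf, start, start + len(b"drop"))
# The buffer has the same length and every later byte keeps its offset.
print(bytes(buf), len(buf))
```

Deleting the element instead would shift every subsequent byte and invalidate any offsets already computed into the buffer.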

COMMENT 5 (almost a DISCUSS):

In Section 4.1.2: Suppose you have an intermediary that decodes the chunked encoding of an inbound message and generates a new message with known length (Content-Length present). It seems like you need to specify what happens to trailer fields in this case. The answer seems to be that they're just appended to the header, but AFAICT, that's not specified in the text.

Fixed, see #551

Sean Turner

Discuss (2013-12-19)

Edited to remove process nits...

OMG this was well written! Kudos to all involved!

0) In reference to OWS in the ABNF, isn't the correct ABNF syntax to include optional fields in [] - See s3.8 of RFC 5234? Sure the text says it's optional but aren't you mixing formal syntax and informal text. I guess this is sometimes done in ABNF for omitting fields but if you've got a mechanism to indicate a field is optional I don't understand why you're not using it.

Both [option] and *option indicate optional syntax -- see #537
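To make the point concrete: OWS is defined as `*( SP / HTAB )`, and a repetition of zero or more already matches nothing at all. The Python sketch below maps that directly onto a regular expression; the simplified field-name pattern and the name `split_field` are assumptions made for illustration.

```python
import re

# OWS = *( SP / HTAB ): the repetition operator already permits zero
# occurrences, so no [ ] brackets are needed to make it optional.
OWS = r"[ \t]*"

# Simplified field line: field-name ":" OWS field-value OWS
# (the field-name pattern here is an approximation, not the token rule).
HEADER_FIELD = re.compile(r"(?P<name>[^\s:]+):" + OWS + r"(?P<value>.*?)" + OWS)


def split_field(line: str):
    """Split a header field line into (name, value), trimming OWS."""
    match = HEADER_FIELD.fullmatch(line)
    if match is None:
        raise ValueError("not a header field line")
    return match.group("name"), match.group("value")


print(split_field("Host: example.org"))
print(split_field("Host:example.org"))
```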

This might be a parsing error on my part:

1) s4.1.2: WRT to the server generating an empty trailer, I don't follow bullet 2. Isn't it trying to say a server generates an empty trailer unless the trailer field consists of metadata the server requires be present (i.e., the server doesn't want the metadata dropped):

OLD:

A server MUST generate an empty trailer with the chunked transfer
coding unless at least one of the following is true:

1. ....

2.  the trailer fields consist entirely of optional metadata and the
   recipient could use the message (in a manner acceptable to the
   generating server) without receiving that metadata.  In other
   words, the generating server is willing to accept the possibility
   that the trailer fields might be silently discarded along the
   path to the client.

NEW:

A server MUST generate an empty trailer with the chunked transfer
coding unless at least one of the following is true:

...

2.  the trailer fields contain metadata that the
   recipient needs to use the message (in a manner acceptable to the
   generating server).  In other
   words, the generating server isn't willing to accept the possibility
   that the trailer fields might be silently discarded along the
   path to the client.

What it tries to say is that the trailer MUST ONLY be used if the request indicated that trailers are understood or if all you put into the trailers is optional metadata (so you don't care if it's lost). It would if RFC2119 defined a MUST ONLY. Fixed as a larger rewrite, see #551

Comment (2013-12-19)

Caveat: I know this is a bis draft but since you hacked it up for clarity, I figured I'd give you both barrels when reading it (i.e., it goes to "11" on the nits scale). With that said, I would not like to see any of my comments hold progression of this draft up for a microsecond. Feel free to consider these if you're making other changes before progressing to Approved or during AUTH48.

*) Support Stephen's discuss.

0) abstract: The WWW global initiative is a reference to this: http://www.w3.org/Summary.html , which hasn't been updated since ~1991/2? Maybe we can drop the reference to that and just say:

HTTP has been very widely used since 1990.

Or:

HTTP is the foundation of [this thing you might have heard of called] the World Wide Web architecture.

Fixed, see [2557].

2) Abstract & s1: to match s2.1:

 r/an application-level request/response protocol
 /an application-level stateless request/response protocol

Fixed, see #538

3) (no action required) Thanks for the collected ABNF in Appendix B.

4) s2.1: What's the difference between a native application and a mobile app? Isn't a mobile app on a mobile phone a native application for that mobile phone?

Yes, changed "native application" to "custom application" in [2563].

5) s2.3: Maybe worth explaining what a public network access point might be by adding: (e.g., accessing the Internet from a hotel).

TMI, I think. Yes, that is a common case, but not all hotels are the same.

6) s2.3: Mentions proxies are done through local configuration rules: Should we note these might be set by an administrator and that users should be aware of these settings?

This is in the security considerations regarding intermediaries.

7) s2.3: Would it be better to say proprietary:

 r/Some non-standard HTTP extensions (e.g., [RFC4559])
 /Some proprietary HTTP extensions (e.g., [RFC4559])

No, we explicitly meant non-standard, as in not standards track and never will be because it doesn't work outside closed networks.

8) s2.6: Shouldn't we be future proofing this protocol to address the two digit version bug :) Never mind I got to A.2 and find out folks can't handle two digit versions consistently.

not minded

9) s2.7: Does there need to be a statement that all entities MUST support URIs as defined in RFC 3986? There's some language earlier about relying upon URIs, etc., but there isn't a specific MUST support.

It is required by definition -- no need for a MUST because it is encompassed by conformance section.

A) s2.7: Maybe add the following before the list:

The following provide references for the URI syntax used in this document:

I think that is clear from the context. They also provide ABNF terminals and are used in the prose.

B) s2.7.1, 2nd para: r/optional/OPTIONAL in reference to the query. It would make the text match the ABNF syntax ;)

Those are referring to the names of the whitespace; this will be clearer once I move the section defining OWS above this one.

C) s2.7.1, 2nd para last sentence: Mentions the path and query component but omits the fragment component - but the fragment is in the http-URI exhibit above. Maybe worth including in the sentence for completeness.

Fixed, see [2596].

D) s2.7.1: Please expand WWW on first use ;)

[Actually, I'd ask the RFC editor to include it in their list of abbreviations: http://www.rfc-editor.org/rfc-style-guide/abbrev.expansion.txt so that no one will ever see this comment ever again.]

Fixed elsewhere, see [2555].

E) s2.7.1: Is Internet Name = registered or domain names? and is Internet address = IP number? Those terms are used later so maybe: its registered name or IP address

The
"http" URI scheme makes use of the delegated nature of Internet domain names
and IP addresses to establish a naming authority (whatever entity has
the ability to place an HTTP server at that Internet domain name or IP
address) and allows that authority to determine which domain names are valid
and how they might be used.

Then again this might all have been carefully crafted to avoid some long running debate that I am unaware of, in which case this should be ignored.

Not crafted carefully enough. Hopefully fixed by [2597]. A long running debate would be an understatement.

F) 2.7.2: Personally, I'd drop the [RFC0793] reference for the TCP port, it's already in the http schema and you reference that scheme from this scheme.

Fixed, see [2598].

10) s3, 1st para: r/optional/OPTIONAL - would make the text match the ABNF.

We only use the capitalized 2119 terms for targeted requirements.

11) s3.1.2: If clients are going to be ignoring the reason-phrase, should p1 or p2 say something about not emitting it? I mean what with all the need for speed from web search engines/browsers shouldn't we be trying to not send stuff that's going to promptly be ignored?

see http://lists.w3.org/Archives/Public/ietf-http-wg/2014JanMar/0016.html

12) s3.2: r/optional/OPTIONAL x2 - would make the text match the ABNF

We only use the capitalized 2119 terms for targeted requirements.

13) s3.2.3: should optional and required be replaced by their 2119 keywords?

Those are descriptive names for the ABNF rules.

14) s3.3.1: (see #2) What does a client do if the server chunks more than once, if a server sends a Transfer-Encoding header when it shouldn't have?

See last paragraph of section 2.5.

15) s4.1.2: "The above requirement" is a little vague is that the MUST immediately preceding the last paragraph? Maybe:

The requirement to generate an empty trailer prevents .....

Fixed as part of #551

16) s4.1.3: Pseudo-code needs error conditions for handling the MUST NOTs ;)

Pseudo-code relies on pseudo-magic exception handling, though one check was added for #551

17) s4.2.2: r/incorrect/non-standard or non-conformant

Fixed, see [2599].

18) s5.3: Is the origin-form before the 2nd paragraph supposed to be there? Oh wait those are supposed to be subsections? Can't you just call it 5.3.1 origin-form, 5.3.2 absolute-form, etc.?

Fixed, see [2600].

19) s5.7.2: might be worth putting a reference in to Part6 after the first use of non-transform cache-control ... granted I did figure out where it was defined after remembering the

Fixed, see [2552]

1A) s5.7.2 and s2.3: s2.3 mentions privacy proxies and s5.7.2 says the following about proxies without qualifying the type of proxy:

A proxy MUST NOT modify header fields that provide information about the end points of the communication chain, the resource state, or the selected representation.

So does that essentially mean privacy filter proxies are non-conformant?

See #552

1B) s6.7: OPTIONAL?:

its acceptance
and use by the server is optional

Fixed by moving the factual statement up to definition and removing this bit, see [2603].

1C) Seems like you should just provide the form. I'm wondering whether the POC includes an actual method of contact or not? Having seen this done in the past, it's probably worth being pedantic and saying that they can change the registration but they need to tell IANA they're doing so.

I believe this was dropped after discussion with Julian.

Stephen Farrell

Discuss (2013-12-19)

There was originally supposed to be a separate deliverable to describe the security properties of HTTP, but that's not happening. I think it's fair to say that the security considerations here (or across the entire set) don't really do all of that as well. I think that does leave a gap. However, I'm not sure what to do about that, since I don't believe there's any real chance of getting anyone to address this gap - it's been tried and apparently failed, and with lots of security work in HTTP/2.0, it's extremely unlikely that a victim will be found for this un-fun task.

That said, I do think it'd be worthwhile if the authors made an attempt to fill that gap by spending some cycles on finding a good set of references to HTTP security topics and adding those to the security considerations sections of p1 and/or p2.

Now, I'm sure that the authors won't want to do that (who ever wants to do a state-of-the-art study? even a tiny one like this) so the point I want to DISCUSS with the IESG initially and then with the chair and authors is whether or not that's a reasonable ask. (So, authors, no need to chime in just yet.)

See #549

Comment (2014-01-07)

p16 says you additionally define an absolute-path but the "it" in "in that it allows" is ambiguous - do you mean the additional thing allows or that 3986 allows but the additional thing doesn't? (I think the latter, am I right?:-)

Fixed, see [2604]

2.7.3 doesn't say whether http://example.com/home.html is the same as or different from http://example.com/HOME.html. Wouldn't it be good to explicitly tell people that you're not saying they are the same, but that in some cases they might be treated as being the same? That is, I don't get why it's better to just make that implicit. If the reason is just that this is a rathole you wanted to avoid, I'll buy that.

It says "all other components are compared in a case-sensitive manner" (i.e., they are not considered to be the same).

3.1.1: did 2.5 say "unbounded length"? I don't recall it saying that, maybe what you mean is "longer than is supported"?

Fixed, see [2605].

3.2.1: nit: this mentions the "core standard" - I wondered if you meant p1 only or the whole set of RFCs we're reviewing now.

Changed to "outside this document set", see [2606]

3.2.2 says a server MUST not interpret a request until all headers are received - does that also mean that a server MUST NOT barf on a bad request until the end of the headers? That'd seem wrong, or does "interpret" not include the server concluding that the request is really crap?

Fixed, see [2607].

3.3: I just didn't get the last para, not sure if that implies it's unclear or I didn't read carefully enough:-)

It is complicated due to the ad hoc nature of early HTTP design and the later addition of persistent connections.

5.4: I like the Host header being a MUST but would be a bit sad if I have to type that when I telnet to port 80 to check a server. You probably don't want to, but I'd be happy if you added some kind of text saying that servers might want to keep allowing that exceptional case.

This requirement was mandated by the IESG in 1995, IIRC, to ensure deployment of Host. We can't change it without changing HTTP-version.

5.7.1: I should check the syntax of "token" but is the SHOULD NOT in the last para here something you can do algorithmically or would you likely need a list of pseudonyms? If the latter, that might be worth noting.

The expectation is manual configuration, unrelated to the protocol.

section 7: I didn't get that one either at first glance, but then I did:-) Could do with some more text saying why it is there if you had some, or an example.

There are some examples where it is used, like in Upgrade (just above that section).

Ted Lemon

Discuss (2013-12-18)

Point 1:

In 2.7.1, end of last paragraph:

Before making use of an "http" URI
reference received from an untrusted source, a recipient ought to
parse for userinfo and treat its presence as an error; it is likely
being used to obscure the authority for the sake of phishing attacks.

Why no normative language here? I'm assuming this was deliberate, but it seems like the wrong call. Why not propose that the recipient reject this out of hand, unless there's some strong reason not to? I expect that you will explain why you made this decision and it will make sense to me, in which case that will resolve this DISCUSS point; otherwise, changing "ought to" to "SHOULD" would also satisfy. The referenced section of RFC 3986 has some good text on why this is important, but this document doesn't repeat much of it, so I'm concerned that a new reader wouldn't really get the significance of this advice.

Changed to a SHOULD in [2582].

Point 2:

In 5.5, suppose I connect to foo.example.org on port 80, and send the following:

GET / HTTP/1.1
Host: foo.example.org:8080

This produces an effective URI of http://foo.example.org:8080/. What is the server supposed to do at this point? The obvious way to resolve this DISCUSS point is to update the text to address this problem. I think this example has the same property that leads you to require a 301 or 400 status in section 3.1.1.

See #550
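For reference, the reconstruction behind this example can be sketched in Python as follows. The function name `effective_request_uri` and its parameters are hypothetical, authority-form (CONNECT) is left out, and the sketch only shows how the URI is assembled; what the server ought to do with a mismatched port is the subject of #550 and is not answered here.

```python
def effective_request_uri(request_target: str, host: str, secure: bool = False) -> str:
    """Rebuild the effective request URI from request-target and Host.

    Covers origin-form, asterisk-form and absolute-form only.
    """
    scheme = "https" if secure else "http"
    if request_target.startswith("/"):
        # origin-form: scheme comes from the connection, authority from Host.
        return f"{scheme}://{host}{request_target}"
    if request_target == "*":
        # asterisk-form: authority only, with an empty path and query.
        return f"{scheme}://{host}"
    if "://" in request_target:
        # absolute-form: the request-target already is the URI.
        return request_target
    raise ValueError("unsupported request-target form")


# The Host field value, not the port the connection arrived on,
# supplies the authority:
print(effective_request_uri("/", "foo.example.org:8080"))
```

Nothing in this reconstruction consults the local port of the connection, which is why a request received on port 80 can still yield a URI naming port 8080.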

Point 3:

In 3.2.4, paragraph 1:

server MUST reject any received request message that contains
whitespace between a header field-name and colon with a response code
of 400 (Bad Request).  A proxy MUST remove any such whitespace from a
response message before forwarding the message downstream.

Why the different handling in the two cases? Is it really less bad (and hence salvageable) in the response? What if a user agent receives such whitespace? I expect you'll address this point by explaining why this is an issue in requests and not in responses, or else by at least adding text about how user agents should deal with this situation. I am asking this question based on the inconsistency I see here, not any special insight I have into the problem, so I'm assuming there's a straightforward explanation.

The asymmetry of the requirements is more about what's possible in a request-response protocol. If a client sends a bad request, the server can refuse it (and that's our requirement); however, if we prohibit a client from using a response that has whitespace in the header, we'll likely get ignored by UA implementers, since this would break existing content (badly), and they can safely handle it.

The problem we're trying to address is when intermediaries handle this extra whitespace in a manner that's different to UAs; by having proxies strip it, we disrupt the attack channel. UAs don't have to handle the whitespace specially.
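The asymmetry described above can be sketched in Python as two small checks on a raw header line. The function names are hypothetical, and the pattern is a simplification that only looks for SP or HTAB between the field-name and the colon.

```python
import re

# One or more SP/HTAB between the field-name and the colon.
WS_BEFORE_COLON = re.compile(rb"^([^:\s]+)[ \t]+:")


def server_accepts_request_field(line: bytes) -> bool:
    """Server side: False means the request is rejected with 400."""
    return WS_BEFORE_COLON.match(line) is None


def proxy_clean_response_field(line: bytes) -> bytes:
    """Proxy side: strip the whitespace before forwarding downstream."""
    return WS_BEFORE_COLON.sub(rb"\1:", line, count=1)


print(server_accepts_request_field(b"Host : example.org"))
print(proxy_clean_response_field(b"Content-Type : text/html"))
```

After the proxy's rewrite, every downstream recipient sees the same field name, so there is no longer a message that an intermediary and a user agent could interpret differently.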

Comment (2013-12-18)

In 2.7.3:

Characters other
than those in the "reserved" set are equivalent to their percent-
encoded octets (see [RFC3986], Section 2.1): the normal form is to
not encode them.

There's no explicit reference for the definition of a reserved set; this could be easily fixed thusly:

Characters other
than those in the "reserved" set  (see [RFC3986], Section 2.2)
are equivalent to their percent-
encoded octets (see [RFC3986], Section 2.1): the normal form is to
not encode them.

Given that they follow each other, the reader will probably find the information either way, but it might be better to include both references.

Fixed in [2550].
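A Python sketch of the 2.7.3 comparison rules may help readers of this thread. It assumes the rules as published in RFC 7230 (scheme and host are case-insensitive, an omitted port equals the default, an empty path equals "/", everything else is case-sensitive) and, as a deliberate narrowing, only decodes percent-escapes of RFC 3986 "unreserved" characters. The function names are hypothetical, and userinfo, fragments and IPv6 literals are not handled.

```python
import re
from urllib.parse import urlsplit

# RFC 3986 "unreserved" characters; escapes of these are decoded.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def _normalize_escapes(text):
    """Decode escapes of unreserved characters; uppercase the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def normalize_http_uri(uri):
    """Return a normal form suitable for comparing http(s) URIs."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    port = "" if parts.port in (None, default_port) else f":{parts.port}"
    path = _normalize_escapes(parts.path) or "/"
    query = f"?{_normalize_escapes(parts.query)}" if parts.query else ""
    return f"{scheme}://{host}{port}{path}{query}"


print(normalize_http_uri("HTTP://EXAMPLE.com:80/%7Esmith/home.html"))
```

Under this normal form, the two URIs from the earlier 2.7.3 question (`home.html` versus `HOME.html`) remain distinct, because the path is compared case-sensitively.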

Section 3, Page 20, second paragraph:

A recipient MUST parse an HTTP message as a sequence of octets in an
encoding that is a superset of US-ASCII [USASCII].  Parsing an HTTP
message as a stream of Unicode characters, without regard for the
specific encoding, creates security vulnerabilities due to the
varying ways that string processing libraries handle invalid
multibyte character sequences that contain the octet LF (%x0A).

I don't understand what this means. I think I can guess what it means, but that's probably dangerous. What I think it means is that my reader should process the stream as UTF-8, storing it in a normalized Unicode format, either failing to process the request or doing something "sensible" when bad UTF-8 sequences are encountered, and the normalized Unicode should then be passed to the parser that parses the header lines. Is that roughly what's meant? If so, I think it could be more clearly stated.

The known concern is parsers that attempt to parse HTTP as a UTF-16 string, which resulted in at least one security hole within Java. It is stated more generally because the same might be true of other non-octet character parsing algorithms.

The list rule exception mentioned in section 1.2 confused the hell out of me until I got to section 7. Why is section 7 not a subsection of section 2? I assume the answer is "because it's long, and would suck the wind out of the document if it were at the beginning," which is fine, but if so, it would be nice if the text in 1.2 did a bit more foreshadowing. E.g.:

This specification uses the Augmented Backus-Naur Form (ABNF)
notation of [RFC5234] with an extension defined in
Section 7 that adds compact support for comma-separated lists
with the addition of a # token to the usual ABNF token set, similar
to the * token.  Appendix B shows the collected ABNF with the list
rule expanded.

Fixed, see #542

BTW, none of the hassling I have done here in these DISCUSSes and comments should be construed as a lack of enthusiasm for this document. It's really obvious that a lot of care went into this document—I'm seeing all kinds of really good advice based on practice in terms of how not to do an implementation that will be vulnerable to a variety of issues. I am very enthusiastic about this document. Thank you very much for doing it.

Thanks for all of the detailed reviews.

Reported by julian.reschke@gmx.de; manually migrated from https://trac.ietf.org/trac/httpbis/ticket/531
