httpwg / httpbis-issues


p2 editorial feedback 2 #426

Closed: mnot closed this 4 years ago

mnot commented 11 years ago

3.1. Representation Metadata

| Expires | Section 7.3 of [Part6] |

If "Expires" is considered "representation metadata", then it seems like "ETag" and "Last-Modified" should be as well. But I think it would make more sense to just remove "Expires" from the list; it's clearly the odd man out here.

3.1.1.2. Character Encodings (charset)

Implementers need to be aware of IETF character set requirements [RFC3629] [RFC2277].

It's not clear what requirements this is referring to; RFC 2277 places requirements on protocol authors, not on implementors, and RFC 3629 is just the definition of UTF-8. If the requirement is "implementations MUST support UTF-8" then we should say that.

3.1.1.4. Multipart Types

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. HTTP does not use the multipart boundary as an indicator of message body length. In all other respects, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type.

That last part seems completely wrong; a web browser is not expected to handle multipart/alternative or multipart/related in the way a mail reader would. (This requirement came from RFC 2616, but... it was wrong then too.)

The MIME header fields within each body-part of a multipart message body do not have any significance to HTTP beyond that defined by their MIME semantics.

This is not true of multipart/byteranges; in RFC 2616 that was explained separately, but that explanation got lost in httpbis rewrites at some point.

Suggested rewrite for the second and third paragraphs:

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. The one exception is the "multipart/byteranges" type (Appendix A of [Part5]) when it appears in a 206 (Partial Content) response. In all other cases, the MIME header fields within each body-part of a multipart message body do not have any significance at the HTTP level; they are just part of the representation data.

(This drops the newly-added "HTTP does not use the multipart boundary as an indicator of message body length", but that is already implied by the removal of 2616's prohibition on epilogue data; if the multipart is allowed to have an epilogue, then the final boundary doesn't indicate the end of the body anyway. It also drops the "unrecognized multipart subtype" text, which was already irrelevant given the "strictly as payload" rule anyway.)
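To illustrate the one exception the suggested text carves out, here is a minimal Python sketch of a client pulling the per-part Content-Range fields out of a 206 multipart/byteranges payload. The function name `split_byteranges` is hypothetical. Borrowing the stdlib `email` parser for an HTTP body is a convenience assumed here, not something the spec prescribes.

```python
from email import policy
from email.parser import BytesParser


def split_byteranges(content_type: str, body: bytes) -> list[tuple[str, bytes]]:
    """Return (Content-Range, data) pairs from a multipart/byteranges payload.

    Hypothetical helper: only meaningful for a 206 (Partial Content) response,
    the one case where body-part header fields carry HTTP-level meaning.
    """
    # Reuse the stdlib MIME parser by prefixing the HTTP Content-Type as a MIME header.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.HTTP).parsebytes(raw)
    if message.get_content_type() != "multipart/byteranges":
        raise ValueError("not a multipart/byteranges payload")
    ranges = []
    for part in message.iter_parts():
        # Each part's Content-Range says which bytes of the representation it holds.
        ranges.append((str(part["Content-Range"]), part.get_payload(decode=True)))
    return ranges
```

For any other multipart type the body stays opaque payload, so no such helper is needed at the HTTP level.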

3.1.3.1. Language Tags

In summary, a language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags:

 language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

Kinda weird... the text sets you up to expect an actual grammar for language-tag, but then you just get a cross-reference. I'd rearrange stuff to:

... HTTP uses language tags within the Accept-Language and Content-Language fields.

 language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

A language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags. White space is not allowed within the tag and all tags are case-insensitive. Example tags include:

 en, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN

See [RFC5646] for further information.

(also dropping the language-subtag-registry ref, since that's covered by the "See [RFC5646]")
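A rough Python sketch of the two rules the rearranged text states: no white space inside a tag, and case-insensitive comparison. It only checks the coarse shape of each subtag (1 to 8 ASCII letters or digits), not the full RFC 5646 grammar, and the helper names are hypothetical.

```python
def language_subtags(tag: str) -> list[str]:
    """Split a language tag into lower-cased subtags (coarse shape check only)."""
    if any(ch.isspace() for ch in tag):
        raise ValueError(f"white space is not allowed in a language tag: {tag!r}")
    subtags = tag.split("-")
    for subtag in subtags:
        # RFC 5646 subtags are 1 to 8 ASCII letters or digits.
        if not (1 <= len(subtag) <= 8 and subtag.isascii() and subtag.isalnum()):
            raise ValueError(f"malformed subtag {subtag!r} in {tag!r}")
    return [s.lower() for s in subtags]


def same_language_tag(a: str, b: str) -> bool:
    """Language tags are case-insensitive, so compare the normalized forms."""
    return language_subtags(a) == language_subtags(b)
```

For example, `language_subtags("man-Nkoo-GN")` gives `["man", "nkoo", "gn"]`, and `same_language_tag("en-US", "EN-us")` is true.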

3.4. Content Negotiation

(such as when many different formats are supported by a user-agent),

no hyphen

3.4.1. Proactive Negotiation

If the selection of the best representation for a response is made by an algorithm located at the server, it is called proactive negotiation.

That text doesn't motivate the new name. How about:

If the selection of the best representation for a response is made by the server based on preferences indicated by the user agent in its initial request for the resource, it is called proactive negotiation.

  1. It might limit a public cache's ability to use the same response for multiple user's requests.

users' not user's

For example, the origin server might not implement proactive negotiation, or it might decide that sending a response that doesn't conform to them is better than sending a 406 (Not Acceptable) response.

Not clear what "them" is. "...that doesn't conform to the user agent's preferences..."

3.4.2. Reactive Negotiation

This specification defines the 300 (Multiple Choices) and 406 (Not Acceptable) status codes for enabling reactive negotiation when the server is unwilling or unable to provide a varying response using proactive negotiation.

406 doesn't really "enable reactive negotiation". It just fails to do proactive negotiation.

Also, should we mention how reactive negotiation is actually done?

This specification defines the 300 (Multiple Choices) status code for enabling reactive negotiation. However, in practice, Web sites wanting to do reactive negotiation will just return a successful response containing a "default" (or proactively negotiated) representation of the resource, which includes within it links that the user can follow to reach other representations.

4. Product Tokens

By convention, the products are listed in order of their significance for identifying the application.

"...in decreasing order of...", or something like that. (likewise in the description of User-Agent in 6.5.3 and Server in 8.4.2)

5.2.2. Idempotent Methods

Section 6.2.2.1 of Part1 implies that the concept of "idempotent sequences of request methods" (as opposed to merely "idempotent methods") will be discussed here, but it's not. I'm not sure if it should be added here or there.
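For reference, a minimal sketch of the per-request check a client can make before automatically retrying after a connection failure, using the methods this part defines as idempotent (GET, HEAD, OPTIONS, TRACE, PUT, DELETE). The `known_idempotent` flag is a hypothetical stand-in for out-of-band knowledge that a particular request is safe to repeat.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def may_retry_automatically(method: str, known_idempotent: bool = False) -> bool:
    """Decide whether a client may resend a request after the connection failed.

    Method names are case-sensitive, so "get" is not GET.
    ``known_idempotent`` is a hypothetical parameter for out-of-band knowledge
    that this particular non-idempotent request is safe to repeat.
    """
    return method in IDEMPOTENT_METHODS or known_idempotent
```

The decision is made per request, not per sequence of requests.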

5.3.1. GET

The semantics of the GET method change to a "partial GET" if the request message includes a Range header field ([Part5]).

"a Range or If-Range header field"

5.3.6. CONNECT

Though obvious, it seems like for consistency's sake, this should end with:

Responses to the CONNECT method are not cacheable.

5.3.7. OPTIONS

If no payload body is included, the response MUST include a Content-Length field with a field-value of "0".

Does this actually mean to prohibit servers from using chunked encoding (or "Connection: close" with no Content-Length) in that case? Or is it just supposed to be a reminder that "empty message body" is different from "no message body"?

(Section 9.1.2 has basically the same text.)

If no Max-Forwards field is present in the request, then the forwarded request MUST NOT include a Max-Forwards field.

"If no Max-Forwards field is present in the upstream request, then the downstream request MUST NOT include a Max-Forwards field."

6.2. Conditionals

The HTTP/1.1 conditional request mechanisms are defined in [Part4].

"and [Part5]" (If-Range)

6.3. Content Negotiation

6.1 and 6.2 had some introductory text before the table, and it seems weird to not have that here.

(6.4 and 6.5 have the same problem)

6.3.1. Quality Values

Should this section be called "Weight" now?

6.3.5. Accept-Language

would mean: "I prefer Danish, but will accept British English and other types of English". (see also Section 2.3 of [RFC4647])

Capitalize "See"
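The quoted sentence describes the example field value `Accept-Language: da, en-gb;q=0.8, en;q=0.7`. Below is a hedged Python sketch of parsing such a value into a preference list. It keeps the listed order for equal weights, treating that order as descending priority, and uses RFC 4647 basic filtering for matching. The function names are hypothetical; the weight check follows the qvalue grammar (0 to 1, at most three decimals).

```python
import re

QVALUE = re.compile(r"0(\.[0-9]{0,3})?|1(\.0{0,3})?")


def parse_accept_language(value: str) -> list[tuple[str, float]]:
    """Return (language-range, weight) pairs, most preferred first."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # the #rule list syntax tolerates empty elements
        language_range, _, param = item.partition(";")
        weight = 1.0  # no q parameter means weight 1
        if param:
            name, _, q = param.strip().partition("=")
            if name.strip().lower() != "q" or not QVALUE.fullmatch(q.strip()):
                raise ValueError(f"bad weight in {item!r}")
            weight = float(q)
        if weight > 0:  # q=0 means "not acceptable", so it is not a preference
            ranges.append((language_range.strip().lower(), weight))
    # sorted() is stable: ranges with equal weight keep their original order.
    return sorted(ranges, key=lambda r: -r[1])


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: "en" matches "en" and "en-GB"; "*" matches any tag."""
    language_range, tag = language_range.lower(), tag.lower()
    return language_range == "*" or tag == language_range or tag.startswith(language_range + "-")
```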

7. Response Status Codes

The status-code element is a 3-digit integer result code of the attempt to understand and satisfy the request.

"...a 3-digit integer code giving the result of the attempt..."

o 2xx (Successful): The action was successfully received, understood, and accepted

"The request was successfully..."

7.1. Overview of Status Codes

The reason phrases listed here are only recommendations -- they can be replaced by local equivalents without affecting the protocol.

That suggests you can/should translate them into other languages, which isn't really what they're for and kind of contradicts p1 3.1.2's "A client SHOULD ignore the reason-phrase content."

| 415 | Unsupported Media Type          | Section 7.5.13         |
| 416 | Requested range not satisfiable | Section 3.2 of [Part5] |
| 417 | Expectation Failed              | Section 7.5.14         |

The capitalization of "Requested range not satisfiable" is inconsistent with the rest of the table.

7.2. Informational 1xx

A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect a 100 (Continue) status message.

No reason to call out 100 Continue specifically here... "A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect one."
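A minimal Python sketch of what the reworded requirement asks of a client: skip any number of interim 1xx responses, expected or not, until a final one arrives. It assumes responses have already been parsed into (status, headers) pairs; the function name is hypothetical.

```python
from collections.abc import Iterable, Mapping


def first_final_response(
    responses: Iterable[tuple[int, Mapping[str, str]]],
) -> tuple[int, Mapping[str, str]]:
    """Return the first non-interim response from a sequence of parsed responses."""
    for status, headers in responses:
        if 100 <= status < 200 and status != 101:
            # Any 1xx (100 Continue, or one this client has never heard of) is
            # interim: skip it and keep reading, whether or not it was expected.
            continue
        # 101 (Switching Protocols) is also the last HTTP/1.1 response on the connection.
        return status, headers
    raise ValueError("connection ended before a final response")
```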

7.3.2. 201 Created

If the newly created resource's URI is the same as the Effective Request URI, this information can be omitted

"effective request URI" is not capitalized like that anywhere else. (Well, except for once more later on in this section which should also be fixed.)

If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.

"with a 202 (Accepted) response"

8.1.1.2. Date

  1. If the response status code is 100 (Continue) or 101 (Switching Protocols), the response MAY include a Date header field, at the server's option.

Is that really supposed to be limited to 100 and 101, and not other 1xx codes?
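As eventually published in RFC 7231 (Section 7.1.1.2), the rule covers every 1xx code: an origin server with a clock MAY send Date in any 1xx or 5xx response and MUST send it otherwise, and one without a clock MUST NOT send it. A small sketch of that decision plus IMF-fixdate formatting with the stdlib; the helper names are hypothetical.

```python
from email.utils import formatdate


def date_requirement(status: int, has_clock: bool = True) -> str:
    """Requirement level for an origin server sending Date (RFC 7231, Section 7.1.1.2)."""
    if not has_clock:
        return "MUST NOT"
    if 100 <= status < 200 or 500 <= status < 600:
        return "MAY"  # all 1xx codes, not just 100 and 101; also 5xx
    return "MUST"


def date_value(timestamp: float) -> str:
    """Format a POSIX timestamp as an IMF-fixdate for a Date header field."""
    return formatdate(timestamp, usegmt=True)
```

For instance, `date_value(784111777)` yields `"Sun, 06 Nov 1994 08:49:37 GMT"`.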

8.1.3. Retry-After

This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked to wait

No hyphen in "user agent"
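Because a Retry-After value is either delta-seconds or an HTTP-date, a user agent has to handle both forms. A minimal sketch using the stdlib `email.utils.parsedate_to_datetime`. The function name, and clamping past dates to zero, are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value: str, now: datetime) -> int:
    """Convert a Retry-After value (delta-seconds or HTTP-date) into seconds to wait.

    ``now`` must be a timezone-aware datetime.
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)  # delta-seconds form, e.g. "120"
    when = parsedate_to_datetime(value)  # HTTP-date form
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always in GMT
    return max(0, int((when - now).total_seconds()))  # a past date means "now"
```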

8.4.1. Allow

 Allow = #method

Should that be 1#method? If not, it should explain what an empty "Allow" header means.
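With `Allow = #method`, an empty field value is syntactically valid. RFC 7231 later explains it as the resource allowing no methods, which might occur in a 405 response if the resource has been temporarily disabled by configuration. A hedged parsing sketch follows; method names are case-sensitive, so they are not case-folded. The helper name is hypothetical.

```python
def parse_allow(value: str) -> set[str]:
    """Parse Allow = #method into a set of method names.

    An empty value yields an empty set: the resource currently allows no methods.
    """
    # Method names are case-sensitive tokens, so they are not case-folded.
    return {method.strip() for method in value.split(",") if method.strip()}
```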

9.1.1. Procedure

HTTP method registrations MUST include the following fields:

Should "cacheability" be an explicit field (rather than just a required part of the specification text)?

9.3. Header Field Registry

It seems weird to have this in p2 since p1 defines headers too...

9.3.1. Considerations for New Header Fields

o Whether it is appropriate to list the field-name in the Connection header field (i.e., if the header field is to be hop-by-hop, see Section 6.1 of [Part1]).

should have a semicolon rather than comma after "hop-by-hop". (So that it doesn't read like it's telling you to only follow the xref if the header field is hop-by-hop.)
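Listing a field-name in Connection has a concrete consequence for an intermediary: before forwarding, it removes Connection itself and every field named in it. A minimal sketch, assuming header fields are held in a plain dict; the helper name is hypothetical, and names are matched case-insensitively.

```python
def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    """Drop Connection and every header field it lists before forwarding a message."""
    listed = {"connection"}
    for name, value in headers.items():
        if name.lower() == "connection":
            # Each connection-option names a hop-by-hop field (or is an option like "close").
            listed |= {option.strip().lower() for option in value.split(",") if option.strip()}
    return {name: value for name, value in headers.items() if name.lower() not in listed}
```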

10.1. Transfer of Sensitive Information

Four header fields are worth special mention in this context: Server, Via, Referer and From.

"Via" is in p1 though, so the Via bits should be moved to p1's Security Considerations? (Or maybe if we end up with a p0, all of the security considerations should be consolidated there.)

The information sent in the From field might conflict with the user's privacy interests or their site's security policy, and hence it SHOULD NOT be transmitted without the user being able to disable, enable, and modify the contents of the field. The user MUST be able to set the contents of this field within a user preference or application defaults configuration.

Do any browsers actually ever send the "From" header? If not, should we just say "From is for robots, not browsers"?

Appendix C. Changes from RFC 2616

Remove base URI setting semantics for "Content-Location" due to poor implementation support, which was caused by too many broken servers emitting bogus Content-Location header fields, and also the potentially undesirable effect of potentially breaking relative links in content-negotiated resources. (Section 3.1.4.2)

That would parse better if the "which was..." clause was parenthesized rather than just set off by commas.

Failed to consider that there are many other request methods that are safe to automatically redirect, and further that the user agent is able to make that determination based on the request method semantics.

This is written in the opposite style from the rest of the list (it describes the problem with 2616 rather than the solution in httpbis). Should be something like:

Allow automatic redirection of all "safe" methods, not just GET and HEAD, and give the user agent more latitude in redirecting unsafe methods. (Section 7.4)
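A sketch of the redirect behaviour the suggested changelog entry describes: safe methods are followed automatically, and a 303 (See Other) is followed with GET (or HEAD for a HEAD request). Asking the user before resending an unsafe method is only one possible policy, assumed here for illustration; the names are hypothetical.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def redirect_plan(method: str, status: int) -> tuple[str, bool]:
    """Return (method for the follow-up request, whether to ask the user first)."""
    if status == 303:
        # 303 points at a different resource to retrieve: GET, or HEAD for a HEAD request.
        return ("HEAD" if method == "HEAD" else "GET"), False
    # Safe methods can be redirected automatically; this sketch asks first for the rest.
    return method, method not in SAFE_METHODS
```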

Reported by @mnot, migrated from https://trac.ietf.org/trac/httpbis/ticket/426

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2113:

(editorial) make section on language tags more concise, since we already delegate the definition to RFC5646; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2114:

(editorial) improve description of 300 and 406 in reactive negotiation; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2115:

(editorial) product tokens listed in decreasing order; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

3.1. Representation Metadata

| Expires | Section 7.3 of [Part6] |

If "Expires" is considered "representation metadata", then it seems like "ETag" and "Last-Modified" should be as well. But I think it would make more sense to just remove "Expires" from the list; it's clearly the odd man out here.

Moved to control data in 2092.

3.1.1.2. Character Encodings (charset)

Implementers need to be aware of IETF character set requirements [RFC3629] [RFC2277].

It's not clear what requirements this is referring to; RFC 2277 places requirements on protocol authors, not on implementors, and RFC 3629 is just the definition of UTF-8. If the requirement is "implementations MUST support UTF-8" then we should say that.

Removed in 1975.

3.1.1.4. Multipart Types

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. HTTP does not use the multipart boundary as an indicator of message body length. In all other respects, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type.

That last part seems completely wrong; a web browser is not expected to handle multipart/alternative or multipart/related in the way a mail reader would. (This requirement came from RFC 2616, but... it was wrong then too.)

It was right back in the days of Mosaic for X. It isn't implemented by browsers today. Removed in 2050.

The MIME header fields within each body-part of a multipart message body do not have any significance to HTTP beyond that defined by their MIME semantics.

This is not true of multipart/byteranges; in RFC 2616 that was explained separately, but that explanation got lost in httpbis rewrites at some point.

Suggested rewrite for the second and third paragraphs:

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. The one exception is the "multipart/byteranges" type (Appendix A of [Part5]) when it appears in a 206 (Partial Content) response. In all other cases, the MIME header fields within each body-part of a multipart message body do not have any significance at the HTTP level; they are just part of the representation data.

(This drops the newly-added "HTTP does not use the multipart boundary as an indicator of message body length", but that is already implied by the removal of 2616's prohibition on epilogue data; if the multipart is allowed to have an epilogue, then the final boundary doesn't indicate the end of the body anyway. It also drops the "unrecognized multipart subtype" text, which was already irrelevant given the "strictly as payload" rule anyway.)

A similar rewrite was done in 2050.

3.1.3.1. Language Tags

In summary, a language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags:

language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

Kinda weird... the text sets you up to expect an actual grammar for language-tag, but then you just get a cross-reference. I'd rearrange stuff to:

... HTTP uses language tags within the Accept-Language and Content-Language fields.

language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

A language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags. White space is not allowed within the tag and all tags are case-insensitive. Example tags include:

en, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN

See [RFC5646] for further information.

(also dropping the language-subtag-registry ref, since that's covered by the "See [RFC5646]")

Done in 2113.

3.4. Content Negotiation

(such as when many different formats are supported by a user-agent),

no hyphen

Fixed already (and then rewritten later in 2050).

3.4.1. Proactive Negotiation

If the selection of the best representation for a response is made by an algorithm located at the server, it is called proactive negotiation.

That text doesn't motivate the new name. How about:

If the selection of the best representation for a response is made by the server based on preferences indicated by the user agent in its initial request for the resource, it is called proactive negotiation.

Rewritten in 2050.

  1. It might limit a public cache's ability to use the same response for multiple user's requests.

users' not user's

Rewritten in 2050.

For example, the origin server might not implement proactive negotiation, or it might decide that sending a response that doesn't conform to them is better than sending a 406 (Not Acceptable) response.

Not clear what "them" is. "...that doesn't conform to the user agent's preferences..."

Done in 2050.

3.4.2. Reactive Negotiation

This specification defines the 300 (Multiple Choices) and 406 (Not Acceptable) status codes for enabling reactive negotiation when the server is unwilling or unable to provide a varying response using proactive negotiation.

406 doesn't really "enable reactive negotiation". It just fails to do proactive negotiation.

Fixed in 2114.

Also, should we mention how reactive negotiation is actually done?

This specification defines the 300 (Multiple Choices) status code for enabling reactive negotiation. However, in practice, Web sites wanting to do reactive negotiation will just return a successful response containing a "default" (or proactively negotiated) representation of the resource, which includes within it links that the user can follow to reach other representations.

I have mentioned other patterns in the parent section and within the 300 code.

4. Product Tokens

By convention, the products are listed in order of their significance for identifying the application.

"...in decreasing order of...", or something like that. (likewise in the description of User-Agent in 6.5.3 and Server in 8.4.2)

Fixed in 2115.
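Since product tokens are listed in decreasing order of significance, a recipient can treat the first token as the most significant one. A minimal Python sketch that extracts product/version pairs from a User-Agent or Server value and skips parenthesized comments; the function name is hypothetical and it is not a full validator.

```python
import re
from typing import Optional

TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
PRODUCT = re.compile(f"({TOKEN})(?:/({TOKEN}))?")


def product_tokens(value: str) -> list[tuple[str, Optional[str]]]:
    """Return (product, version) pairs in the order given, i.e. most significant first."""
    products = []
    i = 0
    while i < len(value):
        if value[i] in " \t":
            i += 1
        elif value[i] == "(":
            depth = 0  # skip a comment, which may nest and contain quoted-pairs
            while i < len(value):
                ch = value[i]
                if ch == "\\":
                    i += 2  # quoted-pair: skip the escaped character
                    continue
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                i += 1
                if depth == 0:
                    break
        else:
            match = PRODUCT.match(value, i)
            if match is None:
                raise ValueError(f"unexpected character at offset {i} in {value!r}")
            products.append((match.group(1), match.group(2)))
            i = match.end()
    return products
```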

... more later ...

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2116:

rewrite the sections on retrying requests and pipelining to resolve nonsense about non-idempotent sequences; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2118:

reorder paragraphs in method descriptions for consistency; note that CONNECT is not cacheable; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2119:

Accept-Language: clean up prose and note descending order of priority for equal weights (as defined in RFC4647 and original HTTP); partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2120:

(editorial) add section intros; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

5.2.2. Idempotent Methods

Section 6.2.2.1 of Part1 implies that the concept of "idempotent sequences of request methods" (as opposed to merely "idempotent methods") will be discussed here, but it's not. I'm not sure if it should be added here or there.

Rewritten there in p1 2116.

5.3.1. GET

The semantics of the GET method change to a "partial GET" if the request message includes a Range header field ([Part5]).

"a Range or If-Range header field"

No, If-Range has no meaning without Range.
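To make the point concrete: a server considers a partial response only when Range is present, and an If-Range without Range is ignored. The hedged sketch below handles only the entity-tag form of If-Range, not the HTTP-date form. It assumes header fields are held in a plain dict with canonical capitalization, and the function name is hypothetical.

```python
def wants_partial_content(method: str, headers: dict[str, str], current_etag: str) -> bool:
    """Decide whether to answer with 206 (Partial Content) rather than a full 200."""
    if method != "GET" or "Range" not in headers:
        # No Range: an ordinary GET, and any If-Range is simply ignored.
        return False
    if_range = headers.get("If-Range")
    if if_range is None:
        return True
    # Entity-tag form only: If-Range needs a strong match, so a weak tag
    # (W/"...") on either side never matches and the full representation is sent.
    return (not if_range.startswith("W/") and not current_etag.startswith("W/")
            and if_range == current_etag)
```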

5.3.6. CONNECT

Though obvious, it seems like for consistency's sake, this should end with:

Responses to the CONNECT method are not cacheable.

Sigh. Done in 2118.

5.3.7. OPTIONS

If no payload body is included, the response MUST include a Content-Length field with a field-value of "0".

Does this actually mean to prohibit servers from using chunked encoding (or "Connection: close" with no Content-Length) in that case? Or is it just supposed to be a reminder that "empty message body" is different from "no message body"?

(Section 9.1.2 has basically the same text.)

Yes, they were designed to require a specific indicator of no body for the sake of persistent connections.
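A small sketch of the response head this requirement produces for a body-less OPTIONS response. The explicit `Content-Length: 0` tells the client exactly where the response ends, so the persistent connection can be reused. The function name and the fixed 200 status are assumptions.

```python
def options_response_head(allowed_methods: list[str]) -> bytes:
    """Serialize a 200 response to OPTIONS that has no payload body."""
    lines = [
        "HTTP/1.1 200 OK",
        "Allow: " + ", ".join(allowed_methods),
        # An explicit zero length marks where this response ends, so the client can
        # reuse the connection instead of waiting for it to close.
        "Content-Length: 0",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```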

If no Max-Forwards field is present in the request, then the forwarded request MUST NOT include a Max-Forwards field.

"If no Max-Forwards field is present in the upstream request, then the downstream request MUST NOT include a Max-Forwards field."

Already rephrased in 2064.
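The behaviour described by that requirement, sketched in Python for an intermediary handling TRACE or OPTIONS: a zero value means respond locally, a positive value is decremented, and an absent field stays absent. Header fields are assumed to be held in a plain dict keyed with canonical capitalization, and `prepare_forward` is a hypothetical name.

```python
from typing import Optional


def prepare_forward(method: str, headers: dict[str, str]) -> Optional[dict[str, str]]:
    """Return the header fields for the downstream request, or None to respond locally."""
    if method not in ("TRACE", "OPTIONS") or "Max-Forwards" not in headers:
        # Max-Forwards only constrains TRACE and OPTIONS (others MAY ignore it), and if
        # the upstream request had no Max-Forwards, the downstream request MUST NOT gain one.
        return dict(headers)
    remaining = int(headers["Max-Forwards"])
    if remaining == 0:
        return None  # act as the final recipient instead of forwarding
    forwarded = dict(headers)
    forwarded["Max-Forwards"] = str(remaining - 1)
    return forwarded
```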

6.2. Conditionals

The HTTP/1.1 conditional request mechanisms are defined in [Part4].

"and [Part5]" (If-Range)

That is noted in Part4.

6.3. Content Negotiation

6.1 and 6.2 had some introductory text before the table, and it seems weird to not have that here.

(6.4 and 6.5 have the same problem)

Fixed in prior edits and 2020.

6.3.1. Quality Values

Should this section be called "Weight" now?

I don't think so, mostly for historical reasons.

6.3.5. Accept-Language

would mean: "I prefer Danish, but will accept British English and other types of English". (see also Section 2.3 of [RFC4647])

Capitalize "See"

Led to a larger rewrite in 2119.

... more later ...

mnot commented 11 years ago

fielding@gbiv.com commented:

From 2122:

(editorial) explain empty Allow field for 405; misc typos; partly addresses #426

mnot commented 11 years ago

fielding@gbiv.com commented:

7. Response Status Codes

The status-code element is a 3-digit integer result code of the attempt to understand and satisfy the request.

"...a 3-digit integer code giving the result of the attempt..."

o 2xx (Successful): The action was successfully received, understood, and accepted

"The request was successfully..."

Both fixed by Julian 1964.

7.1. Overview of Status Codes

The reason phrases listed here are only recommendations -- they can be replaced by local equivalents without affecting the protocol.

That suggests you can/should translate them into other languages, which isn't really what they're for and kind of contradicts p1 3.1.2's "A client SHOULD ignore the reason-phrase content."

They can be (and often are) localized in practice. The client SHOULD ignore them, yes, but that doesn't mean servers don't have to respect local requirements regarding their own language use.
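A client-side sketch consistent with both points. It classifies the response by the first digit of status-code and discards the reason-phrase, so a localized phrase makes no difference. The function name is hypothetical; the class names are those this part uses for the status code classes.

```python
STATUS_CLASSES = {1: "Informational", 2: "Successful", 3: "Redirection",
                  4: "Client Error", 5: "Server Error"}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Return (HTTP-version, status-code, class name); the reason-phrase is dropped."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or len(parts[1]) != 3 or not (parts[1].isascii() and parts[1].isdigit()):
        raise ValueError(f"malformed status-line: {line!r}")
    code = int(parts[1])
    status_class = STATUS_CLASSES.get(code // 100)
    if status_class is None:
        raise ValueError(f"status-code out of range: {code}")
    # parts[2], if present, is the reason-phrase: it may be localized or empty,
    # so it is deliberately not used for anything.
    return parts[0], code, status_class
```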

| 415 | Unsupported Media Type          | Section 7.5.13         |
| 416 | Requested range not satisfiable | Section 3.2 of [Part5] |
| 417 | Expectation Failed              | Section 7.5.14         |

The capitalization of "Requested range not satisfiable" is inconsistent with the rest of the table.

Fixed by Julian 1964. I've shortened it to Range Not Satisfiable.

7.2. Informational 1xx

A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect a 100 (Continue) status message.

No reason to call out 100 Continue specifically here... "A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect one."

Yep, fixed in 2122.

7.3.2. 201 Created

If the newly created resource's URI is the same as the Effective Request URI, this information can be omitted

"effective request URI" is not capitalized like that anywhere else. (Well, except for once more later on in this section which should also be fixed.)

Fixed in 2105.

If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.

"with a 202 (Accepted) response"

Fixed by Julian 1964.

8.1.1.2. Date

  1. If the response status code is 100 (Continue) or 101 (Switching Protocols), the response MAY include a Date header field, at the server's option.

Is that really supposed to be limited to 100 and 101, and not other 1xx codes?

No, already rewritten to fix that.

8.1.3. Retry-After

This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked to wait

No hyphen in "user agent"

Fixed by Julian.

8.4.1. Allow

 Allow = #method

Should that be 1#method? If not, it should explain what an empty "Allow" header means.

Yes, explained in 2122.

9.1.1. Procedure

HTTP method registrations MUST include the following fields:

Should "cacheability" be an explicit field (rather than just a required part of the specification text)?

We discussed this in another issue and decided that it was too complex an issue for a simple checkmark.

9.3. Header Field Registry

It seems weird to have this in p2 since p1 defines headers too...

A registry is primarily for linking from name to semantics.

9.3.1. Considerations for New Header Fields

o Whether it is appropriate to list the field-name in the Connection header field (i.e., if the header field is to be hop-by-hop, see Section 6.1 of [Part1]).

should have a semicolon rather than comma after "hop-by-hop". (So that it doesn't read like it's telling you to only follow the xref if the header field is hop-by-hop.)

Fixed by Julian 1964.

10.1. Transfer of Sensitive Information

Four header fields are worth special mention in this context: Server, Via, Referer and From.

"Via" is in p1 though, so the Via bits should be moved to p1's Security Considerations? (Or maybe if we end up with a p0, all of the security considerations should be consolidated there.)

I think it belongs here.

The information sent in the From field might conflict with the user's privacy interests or their site's security policy, and hence it SHOULD NOT be transmitted without the user being able to disable, enable, and modify the contents of the field. The user MUST be able to set the contents of this field within a user preference or application defaults configuration.

Do any browsers actually ever send the "From" header? If not, should we just say "From is for robots, not browsers"?

I rewrote this in 2054.

Appendix C. Changes from RFC 2616

Remove base URI setting semantics for "Content-Location" due to poor implementation support, which was caused by too many broken servers emitting bogus Content-Location header fields, and also the potentially undesirable effect of potentially breaking relative links in content-negotiated resources. (Section 3.1.4.2)

That would parse better if the "which was..." clause was parenthesized rather than just set off by commas.

Fixed by Julian and then rewritten again by me in 2083.

Failed to consider that there are many other request methods that are safe to automatically redirect, and further that the user agent is able to make that determination based on the request method semantics.

This is written in the opposite style from the rest of the list (it describes the problem with 2616 rather than the solution in httpbis). Should be something like:

Allow automatic redirection of all "safe" methods, not just GET and HEAD, and give the user agent more latitude in redirecting unsafe methods. (Section 7.4)

Rewritten in 2083.

mnot commented 11 years ago

Thanks for your detailed comments; all have been addressed or explained above.

mnot commented 11 years ago

fielding@gbiv.com changed milestone from unassigned to 22

mnot commented 11 years ago

@mnot changed summary from p2 editorial feedback to p2 editorial feedback 2