subconsciousnetwork / subtext

Markup for note taking
Apache License 2.0

Line folding #40

Closed: gordonbrander closed this issue 1 year ago

gordonbrander commented 1 year ago

This issue exists to document background research on header line folding, related to #19. No action is necessary.

Line folding

Email and HTTP header syntaxes are line-oriented: each line is one header, and the key and value are separated by a colon (:). Email headers also support "line folding", which allows a header value to span multiple lines as long as each continuation line begins with at least one space or horizontal tab.

Folded-Header: some value
  which continues on next line
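To make the folding rule concrete, here is a minimal Python sketch of a header parser that unfolds continuation lines. The function name unfold_headers is hypothetical, and the sketch assumes a simplified grammar: "Key: value" lines that end at the first blank line. It collapses each fold into a single space, which RFC 7230 permits an HTTP recipient to do; RFC 5322 email unfolding instead removes only the line break and keeps the leading whitespace.

```python
import re

# A continuation line begins with at least one space or horizontal tab.
_CONTINUATION = re.compile(r"^[ \t]+")


def unfold_headers(text: str) -> list[tuple[str, str]]:
    """Parse 'Key: value' header lines, joining folded continuation lines.

    Hypothetical helper. Assumption: each fold is collapsed to a single
    space (RFC 5322 email unfolding would keep the original whitespace).
    """
    headers: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line:
            # A blank line ends the header block.
            break
        if _CONTINUATION.match(line):
            if not headers:
                raise ValueError("continuation line before any header")
            # Append the continuation text to the previous header's value.
            key, value = headers[-1]
            headers[-1] = (key, value + " " + line.strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((key.strip(), value.strip()))
    return headers


print(unfold_headers("Folded-Header: some value\n  which continues on next line\n"))
# → [('Folded-Header', 'some value which continues on next line')]
```

Returning a list of pairs rather than a dict keeps repeated header names and their original order, which header formats generally allow.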

HTTP also used to support line folding, but RFC 7230 obsoleted it (https://www.rfc-editor.org/rfc/rfc7230#section-3.2.4).

Benefits/costs of line folding

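Line folding has some useful aspects: long or multi-line values can be broken across lines and stay readable. YAML's indented multi-line strings are a closely related feature, and one that users value: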

The absolute best thing about YAML is the support for indented multi-line strings - so useful for embedding other languages inside of a YAML document, e.g. the giant hunk of JavaScript in my shots.yml file here (https://twitter.com/simonw/status/1501762626941947909)

However, it comes at the cost of more complex parsing. If a parser does not implement folding, or implements it incorrectly, headers end up broken or malformed. This happened in practice with HTTP and led to line folding being obsoleted (see below).
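The parsing cost shows up as soon as one side ignores folding. As a hypothetical illustration, the Python sketch below treats every line as its own header, as a parser without folding support might. The name naive_parse is illustrative and not taken from any real implementation.

```python
def naive_parse(text: str) -> dict[str, str]:
    """A deliberately naive header parser that does NOT implement folding.

    Every line is treated as its own 'Key: value' header; lines without a
    colon are silently skipped.
    """
    headers: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip()] = value.strip()
    return headers


message = (
    "Folded-Header: some value\n"
    "  which continues on next line\n"  # folded continuation, no colon
    "Subject: quarterly report\n"
    "  Note: see attachment\n"  # folded continuation that contains a colon
)

print(naive_parse(message))
# → {'Folded-Header': 'some value', 'Subject': 'quarterly report', 'Note': 'see attachment'}
```

Both failure modes appear at once: the continuation without a colon is silently dropped, and the continuation with a colon becomes a spurious Note header while Subject is truncated. A folding-aware parser would read Subject as "quarterly report Note: see attachment".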

Why was line folding obsoleted in HTTP headers?

Some popular servers did not implement line folding, or implemented it incorrectly, which caused interoperability problems. These practical problems led to line folding being obsoleted. Simplifying the spec was also cited as a reason.

A few relevant passages from the mailing-list archives and specs:

On 23 Aug 2015, at 5:44 am, Mike Bishop <Michael.Bishop@microsoft.com> wrote:

We’ve encountered a site (whom we shall leave nameless, unless they choose otherwise) that included a 0x0A20202020202020 sequence inside an HPACK value –carriage return followed by several spaces, which is to say line-folding. Edge rejects that (as will IIS if clients send it), because that’s an HTTP/1.1 legacy artifact. RFC 7540 just references RFC 7230 section 3.2 for header values, which in turn says in section 3.2.4:

Historically, HTTP header field values could be extended over multiple lines by preceding each extra line with at least one space or horizontal tab (obs-fold). This specification deprecates such line folding except within the message/http media type (Section 8.3.1). A sender MUST NOT generate a message that includes line folding (i.e., that has any field-value that contains a match to the obs-fold rule) unless the message is intended for packaging within the message/http media type.

While there’s no requirement for HTTP/1.1 clients to reject this pattern (for back-compat), it seems like an HTTP/2 implementation might want to hold its peers to that MUST NOT. Osama has additionally pointed out that being tolerant of this could make HTTP/1.1 to HTTP/2 conversion “wreak havoc and bugs.” In an offline thread with a few others, the general feeling seems to be that we should all change to reject this as broken header framing.

(Quoted in https://lists.w3.org/Archives/Public/ietf-http-wg/2015JulSep/0249.html)
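A minimal Python sketch of the kind of check described in the email, in which an HTTP/2 peer rejects a folded value instead of tolerating it. It assumes the HPACK value has already been decoded to bytes; the function name is hypothetical. The forbidden octets follow RFC 9113 section 8.2.1 (the successor to RFC 7540), which disallows NUL, LF and CR anywhere in a field value.

```python
# Octets that RFC 9113 (section 8.2.1) forbids anywhere in an HTTP/2 field value.
_FORBIDDEN = {0x00, 0x0A, 0x0D}  # NUL, LF, CR


def is_valid_h2_field_value(value: bytes) -> bool:
    """Return True if a decoded field value contains no NUL, LF, or CR.

    Hypothetical helper. A value carrying an HTTP/1.1-style fold (LF
    followed by spaces) fails this check, so the header block would be
    treated as malformed rather than silently unfolded.
    """
    # Iterating over a bytes object yields ints, one per octet.
    return not any(octet in _FORBIDDEN for octet in value)


# The sequence from the email: LF (0x0A) followed by seven spaces (0x20).
folded = b"some value" + bytes.fromhex("0A20202020202020") + b"which continues"
print(is_valid_h2_field_value(folded))  # → False
```

This sketch only covers the line-break rule; the same section of RFC 9113 also restricts leading and trailing whitespace, which is not checked here.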

Many implementations do not implement header line folding; should it be deprecated / removed? (https://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i77)

We deprecated folding, but couldn't require implementations to reject it, because (as you've seen) it still happens in the wild. (https://lists.w3.org/Archives/Public/ietf-http-wg/2015JulSep/0250.html)

gordonbrander commented 1 year ago

Considerations:

Design decisions: