python-hyper / h11

A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
https://h11.readthedocs.io/
MIT License
490 stars 62 forks source link

Transfer-Encoding and Content-Length #90

Open pgjones opened 4 years ago

pgjones commented 4 years ago

Currently h11 accepts a message with both Content-Length and Transfer-Encoding with the latter taking precedence. However RFC 7230 states A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.. This seems to be an intentional change from RFC 2616 (Bogus Content-Length header fields are now required to be handled as errors by recipients. (Section 3.3.2)).

Would you accept a PR to change the handling from 2616 to 7230 i.e. raise a ProtocolError if both are present?

sethmlarson commented 4 years ago

Sounds like a good course of action to me. What do you think @njsmith?

njsmith commented 4 years ago

Hmm. This is a tricky issue. I think your interpretation of the RFC isn't quite right, and then separately there's a question of what h11 should do, so I'll discuss those separately.

What does RFC 7230 actually say?

A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.

This is unambiguous about what senders have to do, but technically it doesn't say anything about what receivers should do. And RFC 7230 often makes different rules for the two. For an especially egregious example, consider request targets. They have both an "origin form" (GET /index.html) and an "absolute form" (GET https://example.com/index.html). According to the RFC, clients MUST NOT send the absolute form to origin servers, but origin servers MUST support it anyway! So we can't necessarily say that RFC 7230 requires receivers to reject this, at least based on this part of text.

What requirements does the RFC impose on receivers? There's this bit you quote:

Bogus Content-Length header fields are now required to be handled as errors by recipients. (Section 3.3.2)

...but I don't think that helps us here. This is from a section that's summarizing changes from RFC 2616, so it's not the actual standard; it's just referring to text elsewhere in the RFC for the details. And I think specifically it's referring to this bit from 3.3.3:

4.. If a message is received without Transfer-Encoding and with either multiple Content-Length header fields having differing field-values or a single Content-Length header field having an invalid value, then the message framing is invalid and the recipient MUST treat it as an unrecoverable error

This is one of the few places where RFC 7230 uses that unambiguous "MUST treat it as an unrecoverable error" language. But notice that it specifically goes out of its way to not forbid a Transfer-Encoding: + invalid Content-Length.

The key section here in general is 3.3.3, which is worth reading carefully in its entirety. Basically h11's message framing handling right now is a strict translation of the algorithm given in 3.3.3 into code. And in particular, that section is clear that Transfer-Encoding takes precedence over Content-Length.

But it still manages to leave a fair amount of wiggle room, including this very frustrating paragraph:

If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 9.5) or response splitting (Section 9.4) and ought to be handled as an error. A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.

So it says you "ought" to "handle it as an error", AND ALSO gives instructions on how to handle it as not-an-error. And I have no idea what "ought" or "handled as an error" mean in RFC-ese; they're undefined terms. Fabulous.

So... I think that's all we're getting from RFC 7230.

What should h11 do?

The current behavior is definitely defensible within the RFC. But the RFC isn't totally clear, plus we sometimes intentionally deviate from the RFC when it makes sense, so that doesn't settle things.

I can think of three options that are worth considering (am I missing any?):

  1. What we do now: if Transfer-Encoding is present, ignore Content-Length
  2. If Transfer-Encoding and Content-Length are both present, blow up with a ProtocolError
  3. If Transfer-Encoding and Content-Length are both present, then silently discard the Content-Length header (as in: "A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.")

The trade-off here is that if we're more restrictive, then we're more robust against smuggling/splitting attacks, but we might be less interoperable with other implementations.

Phil found an example of haproxy being vulnerable to smuggling attacks here, but that was fixed a while ago. I'm not sure how serious a concern this is currently; if everyone implements the RFC properly, then this isn't an issue, so the question is how well other code implements it, which is something I don't know.

That's also the issue with interoperability... before making any changes, it would be good to check how some other major HTTP implementations handle this case, like browsers, curl, go's net/http. (My guess is that they all act like h11 does now, i.e. silently ignore the Content-Length, but I don't know.)