python-hyper / h11

A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
https://h11.readthedocs.io/
MIT License

Update header value validation to match WHAT-WG fetch spec #97

Open njsmith opened 4 years ago

njsmith commented 4 years ago

Header values are a mess. Supposedly they're defined by RFC 7230, but in fact it has a bug and its definition is obviously wrong. And, in practice, implementations are substantially more lax than RFC 7230, even after you fix the obvious bug.

In #57/#68, we adjusted our validation rule to allow more characters, based on some intuition and a small amount of new data (e.g. we allow \x01, which is used by Google Analytics cookies, but still disallow \x00).

But, it turns out that the WHAT-WG fetch spec has an actual precise definition for header values: https://fetch.spec.whatwg.org/#concept-header-value

Weird that it's here instead of in some HTTP spec, but I'll take it.
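
For illustration, here is a minimal sketch of the header value rule as I read it in the fetch spec: a byte sequence with no leading or trailing HTTP tab or space (0x09, 0x20), and no 0x00, 0x0A, or 0x0D bytes anywhere. The function name `is_fetch_header_value` is hypothetical and not part of h11's API, and the linked spec remains the authority if its wording differs from this reading.

```python
# Bytes the fetch spec calls "HTTP tab or space".
HTTP_TAB_OR_SPACE = (b"\t", b" ")

# NUL plus the two "HTTP newline bytes" (LF and CR).
FORBIDDEN_BYTES = (0x00, 0x0A, 0x0D)


def is_fetch_header_value(value: bytes) -> bool:
    """Return True if value looks like a fetch-spec header value.

    Hypothetical helper for illustration only; not h11's validation code.
    """
    # No leading or trailing tab/space. Slicing keeps this safe for b"".
    if value[:1] in HTTP_TAB_OR_SPACE or value[-1:] in HTTP_TAB_OR_SPACE:
        return False
    # No NUL, LF, or CR anywhere in the value.
    return not any(byte in value for byte in FORBIDDEN_BYTES)
```

Under this reading, `b"\x01"` is accepted and `b"a\x00b"` is rejected, which lines up with the \x01 and \x00 cases mentioned above.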

I think there are two differences between what h11 does currently and the WHAT-WG spec:

We should probably switch to matching the WHAT-WG behavior exactly.

SyntaxColoring commented 4 years ago

@njsmith Out of curiosity, what exactly is the bug in the RFC 7230 definition, and why is the definition obviously wrong?

njsmith commented 4 years ago

The spec accidentally disallows any header value that contains a single-character word inside it. For example, "this is not a valid header" would be an illegal header value, because the word "a" is only one character long.
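
To make that concrete, here is a rough sketch that transcribes the RFC 7230 section 3.2 grammar as published (`field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]`, `field-value = *( field-content / obs-fold )`) into a regular expression. It ignores `obs-fold` for simplicity, and the names `FIELD_VCHAR`, `FIELD_CONTENT`, and `rfc7230_literal_match` are made up for this example. Every run of whitespace has to sit inside one `field-content`, flanked by two `field-vchar` bytes that belong to that same `field-content`, so a one-character word with whitespace on both sides cannot be matched.

```python
import re

# field-vchar = VCHAR / obs-text
FIELD_VCHAR = rb"[\x21-\x7e\x80-\xff]"

# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
FIELD_CONTENT = FIELD_VCHAR + rb"(?:[ \t]+" + FIELD_VCHAR + rb")?"

# field-value = *( field-content / obs-fold ), with obs-fold left out here.
FIELD_VALUE = re.compile(rb"(?:" + FIELD_CONTENT + rb")*")


def rfc7230_literal_match(value: bytes) -> bool:
    """Return True if value matches the grammar exactly as written."""
    return FIELD_VALUE.fullmatch(value) is not None


# Multi-character words are fine: each gap borrows one byte from each side.
print(rfc7230_literal_match(b"this is not valid"))
# The lone "a" would have to serve as the edge of two field-contents at once.
print(rfc7230_literal_match(b"this is not a valid header"))
```

The first call matches and the second does not, even though both are ordinary-looking header values.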

mnot commented 1 year ago

RFC7230 is obsolete; the specification you want is here.

Regarding single word field values -- how do you come to that conclusion?