Open michaelfm1211 opened 1 year ago
After the full test suite failed on my first PR for this issue (#105531), I looked into this a bit more. I think the change would be better as two PRs:
1. Fix the UnicodeEncodeError by falling back to RFC 2047 encoded-word.
2. Change http.client to parse headers using the default email policy rather than email.policy.compat32 (which is described in more depth in issue #105622), or do it as a standalone change.

This does not seem correct. Can you point to the modern standard from https://httpwg.org/specs/ (or even an old standard) that says that HTTP clients should encode headers like this, or that servers should decode them automatically?
HTTP headers have a few different "common" formats, but each HTTP/1.1 header really needs to be treated on a case-by-case basis, as many have their own quirks. The only common encoding format I've seen and implemented for Werkzeug's header parsing is the one for dict-like headers (Header: key1*=UTF-8''%ab, key2*=...). I would be very surprised if http.server suddenly started returning pre-decoded UTF-8 data, especially for an old email format instead of what's commonly used in HTTP.
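For readers unfamiliar with the key*=UTF-8''%ab form mentioned above, here is a minimal sketch of how such a value can be built and read back. It assumes the extended-parameter notation from RFC 8187 (charset, an optional language tag, then percent-encoded bytes). The helper names encode_ext_value and decode_ext_value and the sample filename are hypothetical, chosen only for illustration; this is not Werkzeug's implementation.

```python
from urllib.parse import quote, unquote


def encode_ext_value(value: str) -> str:
    """Build an extended parameter value: charset, empty language, percent-encoded bytes."""
    # safe="" percent-encodes everything except letters, digits and "_.-~",
    # all of which are allowed unescaped in this notation.
    return "UTF-8''" + quote(value, safe="", encoding="utf-8")


def decode_ext_value(ext_value: str) -> str:
    """Split on the two single quotes, then percent-decode using the declared charset."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)


# Hypothetical parameter, as it might appear in a Content-Disposition header.
param = "filename*=" + encode_ext_value("naïve file.txt")
print(param)  # filename*=UTF-8''na%C3%AFve%20file.txt
print(decode_ext_value(param.split("=", 1)[1]))  # naïve file.txt
```

Note that this encoding applies per parameter inside a header whose grammar allows it, which fits the point that each header has to be handled case by case.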
Bug report
When receiving HTTP headers in MIME encoded-word format (per RFC 2047), the http module does not decode the header's value out of encoded-word.

Additionally, when setting a header to a string containing a non-ISO-8859-1 character, a UnicodeEncodeError exception is raised. This could be solved by just using MIME encoded-word.
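The two behaviours described in this report can be reproduced with a short sketch like the one below. It is not the original example from the report: the header name X-Name, the sample values, and the host example.invalid are placeholders chosen for illustration. No network connection is opened, because putrequest() and putheader() only buffer data.

```python
import io
import http.client
from email.header import Header, decode_header, make_header

# Hypothetical raw header block containing an RFC 2047 encoded-word.
RAW = b"X-Name: =?utf-8?q?caf=C3=A9?=\r\n\r\n"

# 1. Receiving: http.client returns the encoded-word unchanged.
msg = http.client.parse_headers(io.BytesIO(RAW))
received = msg["X-Name"]

# Decoding by hand with the email package, which is what the report
# asks the http module to do automatically.
decoded = str(make_header(decode_header(received)))

# 2. Sending: putheader() encodes str values as ISO-8859-1 (latin-1),
# so a value outside that range raises UnicodeEncodeError.
conn = http.client.HTTPConnection("example.invalid")  # placeholder host, never contacted
conn.putrequest("GET", "/")
send_error = None
try:
    conn.putheader("X-Name", "日本語")
except UnicodeEncodeError as exc:
    send_error = exc

# The fallback proposed in the report: send an RFC 2047 encoded-word instead.
fallback = Header("日本語", "utf-8").encode()
conn.putheader("X-Name", fallback)  # pure ASCII, so it is accepted

print(repr(received), repr(decoded))
print(type(send_error).__name__, fallback)
```

Whether an HTTP peer would understand such an encoded-word is a separate question, which is the objection raised in the discussion above.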