python-hyper / hyper

HTTP/2 for Python.
http://hyper.rtfd.org/en/latest/
MIT License

HTTPHeaderMap splitting on commas can lead to unpredictable request headers #389

Open jmaroeder opened 5 years ago

jmaroeder commented 5 years ago

(Related to #314, but impacts the request side of things)

Because HTTPHeaderMap splits header values on commas into multiple headers, servers may have trouble understanding headers that carry multiple values. Example:

import hyper
conn = hyper.HTTPConnection('nghttp2.org', 443)
headers = {
  'accept': '*/*',
  'accept-encoding': 'gzip, deflate, br',
}
conn.request('GET', '/httpbin/headers', headers=headers)
resp = conn.get_response()
print(resp.read().decode())

Output:

{"headers":{"Accept":"*/*","Accept-Encoding":"gzip,deflate,br","Host":"nghttp2.org","Via":"2 nghttpx"}}

Expected output (note the spaces in Accept-Encoding):

{"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate, br","Host":"nghttp2.org","Via":"2 nghttpx"}}

The following curl command retrieves the expected output:

curl 'https://nghttp2.org/httpbin/headers' \
  -H 'accept: */*' \
  -H 'accept-encoding: gzip, deflate, br' \
  -H 'User-Agent:' \
  --http2

This is a minimal example, but I have run into situations where the server (outside of our control) expects the headers to be in a very specific format.
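To make the mechanism concrete, here is a minimal sketch that runs without hyper installed. `split_on_commas` is a hypothetical, simplified stand-in for the splitting behaviour described above (the real hyper code may differ in details), and `server_view` assumes the receiving side re-joins repeated headers with a bare comma, which is what the httpbin output above suggests.

```python
def split_on_commas(name, value):
    """Hypothetical stand-in: emit one header per comma-separated item."""
    for part in value.split(','):
        yield name, part.strip()


def no_split(name, value):
    """Emit the header exactly once, with its value untouched."""
    yield name, value


def server_view(pairs):
    """Assumption: the server joins repeated headers with a bare ','."""
    merged = {}
    for name, value in pairs:
        merged.setdefault(name, []).append(value)
    return {name: ','.join(values) for name, values in merged.items()}


headers = {'accept': '*/*', 'accept-encoding': 'gzip, deflate, br'}

# Three separate accept-encoding headers go on the wire here.
split_pairs = [p for k, v in headers.items() for p in split_on_commas(k, v)]
# One accept-encoding header goes on the wire here.
whole_pairs = [p for k, v in headers.items() for p in no_split(k, v)]

print(server_view(split_pairs)['accept-encoding'])  # gzip,deflate,br
print(server_view(whole_pairs)['accept-encoding'])  # gzip, deflate, br
```

The original spacing is lost once the value is split and stripped, so a server that compares the header byte for byte sees a different value than the one passed to `request()`.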

gzzo commented 5 years ago

Agreed, this is unacceptable. A more common scenario involves the User-Agent header (a lot of servers filter on it): your requests get denied because the server doesn't know how to deal with multiple headers with the same key.

For anyone looking for a quick fix, you can perform this monkey patch:

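import hyper.common.headers

# Replacement for canonical_form: yield each header once, without splitting on commas.
def hyper_monkey(k, v):
    yield k, v

# Apply the patch before sending any requests.
hyper.common.headers.canonical_form = hyper_monkey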
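If patching the module globally feels too invasive, the same idea can be scoped to a block with `unittest.mock.patch.object`, which restores the original function on exit. The sketch below uses a hypothetical `fake_headers_module` in place of `hyper.common.headers` so that it runs without hyper installed; `splitting_canonical_form` and `emit` are illustrative names, not part of hyper.

```python
import types
from unittest import mock

# Hypothetical stand-in for hyper.common.headers.
fake_headers_module = types.SimpleNamespace()


def splitting_canonical_form(k, v):
    """Simplified stand-in for the comma-splitting behaviour."""
    for part in v.split(','):
        yield k, part.strip()


def passthrough(k, v):
    """Same shape as the monkey patch above: one header, value untouched."""
    yield k, v


fake_headers_module.canonical_form = splitting_canonical_form


def emit(name, value):
    # The function is looked up on the module at call time, which is why
    # rebinding the attribute changes the behaviour.
    return list(fake_headers_module.canonical_form(name, value))


before = emit('accept-encoding', 'gzip, deflate')
with mock.patch.object(fake_headers_module, 'canonical_form', passthrough):
    during = emit('accept-encoding', 'gzip, deflate')
after = emit('accept-encoding', 'gzip, deflate')

print(before)  # two entries, split on the comma
print(during)  # one entry, original spacing kept
print(after)   # split again once the patch is removed
```

With the real library the target would be `hyper.common.headers` instead of the stand-in. Code that imported the function by name before the patch was applied would keep its own reference and would not see the replacement.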