Closed tomchristie closed 4 years ago
Merging #103 into master will increase coverage by
0.01%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #103 +/- ##
==========================================
+ Coverage 99.14% 99.15% +0.01%
==========================================
Files 21 21
Lines 937 950 +13
Branches 173 173
==========================================
+ Hits 929 942 +13
Misses 7 7
Partials 1 1
Impacted Files | Coverage Δ | |
---|---|---|
h11/tests/test_connection.py | 100.00% <ø> (ø) |
|
h11/tests/test_io.py | 100.00% <ø> (ø) |
|
h11/_connection.py | 100.00% <100.00%> (ø) |
|
h11/_headers.py | 100.00% <100.00%> (ø) |
|
h11/_writers.py | 88.57% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 1540843...a0eaaf8. Read the comment docs.
Just as a comment or maybe for additional inspiration: This is how we solved a similar problem in the mitmproxy project: https://github.com/mitmproxy/mitmproxy/blob/master/mitmproxy/net/http/headers.py
Okay, based on the feedback here's a slight retake on this, which I think exposes a neater interface and has a pretty minimal change footprint.
After this pull request...
h11
preserves header casing on the wire.event.headers
changes marginally in that it exposes a list of Header
instances, which are tuple-like (name, value) byte pairs, rather than actual tuples.Header
items can also be accessed explicitly, with header.name
, header.value
. For use-cases where you need the original case-sensitive naming, header.raw_name
is also available.Internally...
set_comma_header
no longer "expects name to be lowercase bytes". It keeps the same case-insensitive behaviour, while preserving whatever casing information is passed.get_comma_header
no longer "expects name to be lowercase bytes". It keeps the same case-insensitive behaviour.set_comma_header
/get_comma_header
.What do you get from PYTHONPATH=. python bench/benchmarks/benchmarks.py
before and after this change?
Ah thanks I'd just been looking into that, but I'd not seen the benchmark.
Just as a rough guideline, I've taken the following to get ideas of comparative performances of the different approaches here...
import h11
import timeit
def send_request():
conn = h11.Connection(our_role=h11.CLIENT)
headers = [
(b'Accept', b'*/*'),
(b'Accept-Encoding', b'gzip, deflate'),
(b'Connection', b'keep-alive'),
(b'Host', b'www.example.org'),
(b'User-Agent', b'HTTPie/2.2.0'),
]
request = h11.Request(method="GET", target="/", headers=headers)
bytes_to_send = conn.send(request)
print(timeit.timeit(send_request, number=100000))
Which comes out with...
.raw_items
interface: ~4.1 seconds.raw_name
interface: ~4.5 secondsI'll take a look at the proper benchmarks now...
Existing...
$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
7389.5 requests/sec
7457.9 requests/sec
7451.1 requests/sec
7445.4 requests/sec
7434.2 requests/sec
7428.8 requests/sec
7447.1 requests/sec
Headers
as a sequence...
$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
6393.3 requests/sec
6404.6 requests/sec
6369.1 requests/sec
6346.9 requests/sec
6372.4 requests/sec
6388.1 requests/sec
6403.9 requests/sec
Header
instances...
$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
5851.9 requests/sec
5897.9 requests/sec
5894.1 requests/sec
5873.7 requests/sec
5875.9 requests/sec
5871.4 requests/sec
5903.8 requests/sec
The benchmark I cited isn't terribly clever, but it does exercise a full request/response cycle with some realistic headers.
Closing this off in favour of #104
A proposal for preserving header casing information in both directions, while still exposing lower-case only to users by default, and keeping careful type separation.
Essentially a take on @Lukasa's comment here https://github.com/python-hyper/h11/issues/31#issuecomment-281013856 and @njsmith's follow up...
Here's how it looks to the end-user...
h11
now preserves header casing on sending headers..headers
becomes aSequence
type, rather than a raw list. Iterating over the sequence continues to return(<lowercased-name>, <value>)
pairs..headers.raw()
is available for usages such as console or debug output that require original casing information, and returns a list of(<lowercased-name>, <raw-name>, <value>)
three-tuples.I've addressed this commit-by-commit, which should help make the approach I've taken here clear...
Sequence
type, rather than a raw mutable list, but continues to store exactly the same information.Headers
type, but don't use it anywhere.get_comma_header
,set_comma_header
. Both functions continue to be case insensitive in their effects, but it will matter in theset_comma_header
case because it'll give us nice header casing on the over-the-wire bytes. Switching both over within the codebase for consistency.Strictly speaking there is an API change here, in that
.headers
on events are now sequences, rather than plain lists. Any user code that is doing grungy stuff by mutating that data-structure in-place, wouldn't function after this. But that's a bit of a hacky broken thing to be doing anyway, so a version bump that tightened up the API spec into ".headers
is an immutable sequence" seems reasonable enough right?Anyway's putting this out there, so we've got something to discuss. 🤔
Thanks so much for maintaining such a fantastically careful & thoroughly designed library. It's a joy to work with. ✨