Closed acfoltzer closed 2 years ago
Marking this as a draft temporarily, it turns out I was premature and we'll need this for request lines too :disappointed:
This is ready for review, but will depend on #111 to actually pass tests etc. It doesn't look like I can change the base branch of the PR to point to my fork, but once #111 is merged it should update automatically.
Sigh, the fuzzer found a bug with this PR after combining with #111. Back to draft for now.
I have rewritten this to eagerly check the flag rather than backtracking after an error, which fixes the bugs mentioned above and obviates the issue where the reason phrase would include leading spaces from the delimiter. Happily this seems to be a performance improvement rather than a regression even when the new flags are disabled (results vs 6f6ff10):
req/req time: [292.13 ns 295.81 ns 301.23 ns]
thrpt: [2.2261 GiB/s 2.2668 GiB/s 2.2954 GiB/s]
change:
time: [-13.390% -12.468% -11.159%] (p = 0.00 < 0.05)
thrpt: [+12.561% +14.244% +15.460%]
Performance has improved.
req_short/req_short time: [62.834 ns 62.992 ns 63.188 ns]
thrpt: [1.0023 GiB/s 1.0054 GiB/s 1.0079 GiB/s]
change:
time: [-11.112% -9.9009% -8.9416%] (p = 0.00 < 0.05)
thrpt: [+9.8196% +10.989% +12.501%]
Performance has improved.
resp/resp time: [303.51 ns 304.23 ns 305.34 ns]
thrpt: [2.1320 GiB/s 2.1398 GiB/s 2.1449 GiB/s]
change:
time: [-12.059% -11.512% -11.016%] (p = 0.00 < 0.05)
thrpt: [+12.379% +13.010% +13.713%]
Performance has improved.
resp_short/resp_short time: [67.521 ns 67.686 ns 67.929 ns]
thrpt: [1.2476 GiB/s 1.2521 GiB/s 1.2552 GiB/s]
change:
time: [-9.1562% -8.7366% -8.3331%] (p = 0.00 < 0.05)
thrpt: [+9.0906% +9.5730% +10.079%]
Performance has improved.
I haven't thrown it into godbolt or anything yet, but I suspect something about this change triggers a different inlining behavior compared to the baseline.
Is there any other parsing changes you expect, or shall we prep a crate release?
@seanmonstar we should be good to prep a release, thanks! I've been fuzzing this against our internal fork of picohttpparser and am not finding any other places for improvement, although I am excluding path parsing from the fuzzing for now. Paths are more difficult because both httparse and pico lean on other libraries to meaningfully interpret them, so future changes that come out of that work are more likely to land in http
anyway.
This broke some stuff: prior to this PR, there was no way to parse a request with a specific parser config, that allowed me to implement allow_spaces_after_header_name_in_responses
and allow_obsolete_multiline_headers_in_responses
only for responses without explicit checks.
Now that there is a ParserConfig::parse_request
method, those flags are applied to both requests and responses.
Depends on #111.
These new flags set whether multiple spaces are allowed as delimiters in request lines and response status lines.
The latest version of the HTTP/1.1 spec (request lines, response status lines) allows implementations to parse multiple whitespace characters in place of the
SP
delimiter in the response status line, including:This option relaxes the parsers to allow for multiple spaces, but does not allow the delimiters to contain the other mentioned whitespace characters. We'd rather wait for someone to have a concrete use case before deciding to support that, as allowing chars like
\r
raises serious security questions as described by the spec.