seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0
569 stars 110 forks source link

Add flags to allow multiple spaces in request and status lines #110

Closed acfoltzer closed 2 years ago

acfoltzer commented 2 years ago

Depends on #111.

These new flags set whether multiple spaces are allowed as delimiters in request lines and response status lines.

The latest version of the HTTP/1.1 spec (request lines, response status lines) allows implementations to parse multiple whitespace characters in place of the SP delimiter in the response status line, including:

SP, HTAB, VT (%x0B), FF (%x0C), or bare CR

This option relaxes the parsers to allow for multiple spaces, but does not allow the delimiters to contain the other mentioned whitespace characters. We'd rather wait for someone to have a concrete use case before deciding to support that, as allowing chars like \r raises serious security questions as described by the spec.

acfoltzer commented 2 years ago

Marking this as a draft temporarily, it turns out I was premature and we'll need this for request lines too :disappointed:

acfoltzer commented 2 years ago

This is ready for review, but will depend on #111 to actually pass tests etc. It doesn't look like I can change the base branch of the PR to point to my fork, but once #111 is merged it should update automatically.

acfoltzer commented 2 years ago

Sigh, the fuzzer found a bug with this PR after combining with #111. Back to draft for now.

acfoltzer commented 2 years ago

I have rewritten this to eagerly check the flag rather than backtracking after an error, which fixes the bugs mentioned above and obviates the issue where the reason phrase would include leading spaces from the delimiter. Happily this seems to be a performance improvement rather than a regression even when the new flags are disabled (results vs 6f6ff10):

req/req                 time:   [292.13 ns 295.81 ns 301.23 ns]
                        thrpt:  [2.2261 GiB/s 2.2668 GiB/s 2.2954 GiB/s]
                 change:
                        time:   [-13.390% -12.468% -11.159%] (p = 0.00 < 0.05)
                        thrpt:  [+12.561% +14.244% +15.460%]
                        Performance has improved.

req_short/req_short     time:   [62.834 ns 62.992 ns 63.188 ns]
                        thrpt:  [1.0023 GiB/s 1.0054 GiB/s 1.0079 GiB/s]
                 change:
                        time:   [-11.112% -9.9009% -8.9416%] (p = 0.00 < 0.05)
                        thrpt:  [+9.8196% +10.989% +12.501%]
                        Performance has improved.

resp/resp               time:   [303.51 ns 304.23 ns 305.34 ns]
                        thrpt:  [2.1320 GiB/s 2.1398 GiB/s 2.1449 GiB/s]
                 change:
                        time:   [-12.059% -11.512% -11.016%] (p = 0.00 < 0.05)
                        thrpt:  [+12.379% +13.010% +13.713%]
                        Performance has improved.

resp_short/resp_short   time:   [67.521 ns 67.686 ns 67.929 ns]
                        thrpt:  [1.2476 GiB/s 1.2521 GiB/s 1.2552 GiB/s]
                 change:
                        time:   [-9.1562% -8.7366% -8.3331%] (p = 0.00 < 0.05)
                        thrpt:  [+9.0906% +9.5730% +10.079%]
                        Performance has improved.

I haven't thrown it into godbolt or anything yet, but I suspect something about this change triggers a different inlining behavior compared to the baseline.

seanmonstar commented 2 years ago

Is there any other parsing changes you expect, or shall we prep a crate release?

acfoltzer commented 2 years ago

@seanmonstar we should be good to prep a release, thanks! I've been fuzzing this against our internal fork of picohttpparser and am not finding any other places for improvement, although I am excluding path parsing from the fuzzing for now. Paths are more difficult because both httparse and pico lean on other libraries to meaningfully interpret them, so future changes that come out of that work are more likely to land in http anyway.

nox commented 2 years ago

This broke some stuff: prior to this PR, there was no way to parse a request with a specific parser config, that allowed me to implement allow_spaces_after_header_name_in_responses and allow_obsolete_multiline_headers_in_responses only for responses without explicit checks.

Now that there is a ParserConfig::parse_request method, those flags are applied to both requests and responses.