seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0
569 stars 110 forks source link

Fix token and uri parsers to disallow empty results #111

Closed acfoltzer closed 2 years ago

acfoltzer commented 2 years ago

This fixes cases where the parser would accept non-compliant request lines including empty methods and paths.

For the token grammar, the spec is:

  token          = 1*tchar

1* is shorthand for one-or-more, so the empty string is not a valid token.

For the path component of the request line, the spec we're concerned with the absolute-path grammar:

  absolute-path = 1*( "/" segment )

While segment might be empty, there must be at least a "/" for it to be syntactically valid.

I've added tests for these cases and their combination, and had to update the expected error of one of the existing URI tests which now fails sooner due to the empty path.

acfoltzer commented 2 years ago

This appears to be performance neutral:

~/src/httparse on  acfoltzer/multiple-space-status-line-delimiters is 📦 v1.6.0 via 🦀 v1.59.0 took 13s
❯ cargo +nightly bench --bench parse
   Compiling httparse v1.6.0 (/home/acfoltzer/src/httparse)
    Finished bench [optimized] target(s) in 7.24s
     Running benches/parse.rs (target/release/deps/parse-80e05693670d5e82)

running 4 tests
test bench_httparse       ... bench:         240 ns/iter (+/- 18) = 3000 MB/s
test bench_httparse_short ... bench:          47 ns/iter (+/- 0) = 1446 MB/s
test bench_pico           ... bench:         342 ns/iter (+/- 5) = 2105 MB/s
test bench_pico_short     ... bench:          42 ns/iter (+/- 0) = 1619 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 16.41s

~/src/httparse on  master is 📦 v1.6.0 via 🦀 v1.59.0 took 23s
❯ cargo +nightly bench --bench parse
   Compiling httparse v1.6.0 (/home/acfoltzer/src/httparse)
    Finished bench [optimized] target(s) in 7.18s
     Running benches/parse.rs (target/release/deps/parse-80e05693670d5e82)

running 4 tests
test bench_httparse       ... bench:         239 ns/iter (+/- 10) = 3012 MB/s
test bench_httparse_short ... bench:          48 ns/iter (+/- 0) = 1416 MB/s
test bench_pico           ... bench:         326 ns/iter (+/- 3) = 2208 MB/s
test bench_pico_short     ... bench:          40 ns/iter (+/- 0) = 1700 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 13.29s

(tested with Ubuntu 20.04 on a 18C/36T Xeon)

acfoltzer commented 2 years ago

Criterion shows this to be within the noise after much more sampling, so I feel increasingly confident this is a performance-neutral fix

req/req                 time:   [331.62 ns 333.11 ns 334.56 ns]
                        thrpt:  [2.0043 GiB/s 2.0130 GiB/s 2.0221 GiB/s]
                 change:
                        time:   [-0.5625% +0.4717% +1.2605%] (p = 0.36 > 0.05)
                        thrpt:  [-1.2448% -0.4694% +0.5656%]
                        No change in performance detected.

req_short/req_short     time:   [68.329 ns 68.730 ns 69.152 ns]
                        thrpt:  [937.79 MiB/s 943.54 MiB/s 949.09 MiB/s]
                 change:
                        time:   [-1.6173% -1.1715% -0.7101%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7151% +1.1853% +1.6439%]
                        Change within noise threshold.

resp/resp               time:   [336.06 ns 337.24 ns 338.61 ns]
                        thrpt:  [1.9226 GiB/s 1.9304 GiB/s 1.9371 GiB/s]
                 change:
                        time:   [-0.7660% -0.1202% +0.5012%] (p = 0.72 > 0.05)
                        thrpt:  [-0.4987% +0.1204% +0.7719%]
                        No change in performance detected.

resp_short/resp_short   time:   [73.375 ns 73.788 ns 74.284 ns]
                        thrpt:  [1.1409 GiB/s 1.1486 GiB/s 1.1550 GiB/s]
                 change:
                        time:   [-1.7031% -1.0090% -0.4162%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4180% +1.0193% +1.7326%]
                        Change within noise threshold.