seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0

refactor: simd swar #134

Closed AaronO closed 1 year ago

AaronO commented 1 year ago

This refactor moves the block-wise validators to a "swar" (SIMD within a register) backend.

The core logic of validate => extract => chain is (IMO) now more evident, as is how we stack validators: use the largest SIMD validator available, then finish with the scalar validator if we reach the end of the buffer (uncommon).

Perf-wise, this is roughly on par with master and slightly faster in some regards, in part because it avoids duplicating validation work between the SIMD and scalar validators. It is not yet 100% fine-tuned or fully benched with neon or avx/sse.
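
To make the validate => extract => chain idea concrete, here is a minimal sketch, not the code from this PR. It assumes a simplified rule (a byte is valid when it is visible ASCII, 0x21..=0x7E), and the names `swar_valid_prefix`, `valid_prefix_len`, and `is_visible` are hypothetical. The block validator checks 8 bytes at once using plain integer arithmetic, extracts how many leading bytes were valid, and the driver chains into a scalar loop only for the tail of the buffer.

```rust
const ONES: u64 = 0x0101_0101_0101_0101;
const HIGH: u64 = 0x8080_8080_8080_8080;

/// Scalar rule assumed for this sketch: visible ASCII only.
fn is_visible(b: u8) -> bool {
    (0x21..=0x7E).contains(&b)
}

/// Validate 8 bytes at once and extract the number of leading valid bytes (0..=8).
fn swar_valid_prefix(block: [u8; 8]) -> usize {
    // Little-endian load: the first byte in memory lands in the lowest lane.
    let x = u64::from_le_bytes(block);
    // Clear each lane's top bit so the per-lane additions below cannot carry
    // into the neighbouring lane.
    let low7 = x & !HIGH;
    // Top bit of a lane is set iff that lane's low 7 bits are >= 0x21.
    let ge_21 = low7 + 0x5F * ONES;
    // Top bit of a lane is set iff that lane's low 7 bits are == 0x7F.
    let ge_7f = low7 + ONES;
    // A lane is invalid if it is >= 0x80, < 0x21, or == 0x7F.
    let invalid = (x | !ge_21 | ge_7f) & HIGH;
    // The lowest set marker bit identifies the first invalid byte;
    // trailing_zeros() is 64 when no lane is invalid, which yields 8.
    (invalid.trailing_zeros() / 8) as usize
}

/// Chain validators: 8-byte blocks first, then scalar for the remaining tail.
fn valid_prefix_len(buf: &[u8]) -> usize {
    let mut pos = 0;
    while buf.len() - pos >= 8 {
        let mut block = [0u8; 8];
        block.copy_from_slice(&buf[pos..pos + 8]);
        let n = swar_valid_prefix(block);
        pos += n;
        if n < 8 {
            // The block validator already found the stop byte, so the
            // scalar validator does not need to re-scan this block.
            return pos;
        }
    }
    // Fewer than 8 bytes left: finish with the scalar validator.
    while pos < buf.len() && is_visible(buf[pos]) {
        pos += 1;
    }
    pos
}
```

Because the block validator returns a byte count rather than a yes/no answer, the caller never re-validates bytes that a wider validator has already covered. That is one way to avoid the duplicated work mentioned above.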

AaronO commented 1 year ago

I'm not completely in love with the name swar and am open to suggestions. It's also unclear whether disabling SIMD should also disable swar (which doesn't use any special instructions) and leave only the scalar validators (very slow). There are no technical blockers; it's more a matter of aligning intent.

AaronO commented 1 year ago

Bench dump

x64

Decent improvements of roughly 7% to 17% on full request/response parsing:

master:
test req/req ... bench:                         257 ns/iter (+/- 6)
test req_short/req_short ... bench:             72 ns/iter (+/- 3)
test resp/resp ... bench:                       273 ns/iter (+/- 61)
test resp_short/resp_short ... bench:           68 ns/iter (+/- 5)

swar:
test req/req ... bench:                         225 ns/iter (+/- 2)
test req_short/req_short ... bench:             60 ns/iter (+/- 3)
test resp/resp ... bench:                       239 ns/iter (+/- 24)
test resp_short/resp_short ... bench:           63 ns/iter (+/- 1)

M1 Air

You'll notice header/count is greatly improved with swar:

master:
test req/req ... bench:         117 ns/iter (+/- 13)
test req_short/req_short ... bench:          27 ns/iter (+/- 14)
test resp/resp ... bench:         120 ns/iter (+/- 2)
test resp_short/resp_short ... bench:          28 ns/iter (+/- 5)
test uri/uri_1b ... bench:           1 ns/iter (+/- 0)
test uri/uri_2b ... bench:           1 ns/iter (+/- 0)
test uri/uri_4b ... bench:           3 ns/iter (+/- 1)
test uri/uri_8b ... bench:           2 ns/iter (+/- 0)
test uri/uri_16b ... bench:           3 ns/iter (+/- 0)
test uri/uri_32b ... bench:           4 ns/iter (+/- 0)
test uri/uri_64b ... bench:           6 ns/iter (+/- 0)
test uri/uri_128b ... bench:          12 ns/iter (+/- 1)
test uri/uri_256b ... bench:          24 ns/iter (+/- 6)
test uri/uri_512b ... bench:          53 ns/iter (+/- 3)
test uri/uri_1024b ... bench:          95 ns/iter (+/- 1)
test uri/uri_2048b ... bench:         185 ns/iter (+/- 8)
test uri/uri_4096b ... bench:         366 ns/iter (+/- 24)
test header/name_1b ... bench:           9 ns/iter (+/- 0)
test header/name_2b ... bench:           9 ns/iter (+/- 0)
test header/name_4b ... bench:           9 ns/iter (+/- 0)
test header/name_8b ... bench:           9 ns/iter (+/- 0)
test header/name_16b ... bench:          10 ns/iter (+/- 0)
test header/name_32b ... bench:          13 ns/iter (+/- 0)
test header/name_64b ... bench:          20 ns/iter (+/- 0)
test header/name_128b ... bench:          32 ns/iter (+/- 0)
test header/name_256b ... bench:          58 ns/iter (+/- 1)
test header/name_512b ... bench:         109 ns/iter (+/- 2)
test header/name_1024b ... bench:         220 ns/iter (+/- 5)
test header/name_2048b ... bench:         424 ns/iter (+/- 8)
test header/name_4096b ... bench:         832 ns/iter (+/- 13)
test header/value_1b ... bench:           9 ns/iter (+/- 0)
test header/value_2b ... bench:          11 ns/iter (+/- 0)
test header/value_4b ... bench:          13 ns/iter (+/- 0)
test header/value_8b ... bench:           9 ns/iter (+/- 0)
test header/value_16b ... bench:          10 ns/iter (+/- 0)
test header/value_32b ... bench:          11 ns/iter (+/- 0)
test header/value_64b ... bench:          13 ns/iter (+/- 0)
test header/value_128b ... bench:          17 ns/iter (+/- 0)
test header/value_256b ... bench:          26 ns/iter (+/- 0)
test header/value_512b ... bench:          44 ns/iter (+/- 1)
test header/value_1024b ... bench:          89 ns/iter (+/- 2)
test header/value_2048b ... bench:         158 ns/iter (+/- 2)
test header/value_4096b ... bench:         301 ns/iter (+/- 8)
test header/count_1 ... bench:           9 ns/iter (+/- 0)
test header/count_2 ... bench:          16 ns/iter (+/- 0)
test header/count_4 ... bench:          29 ns/iter (+/- 0)
test header/count_8 ... bench:          53 ns/iter (+/- 1)
test header/count_16 ... bench:         112 ns/iter (+/- 449)
test header/count_32 ... bench:         203 ns/iter (+/- 4)
test header/count_64 ... bench:         396 ns/iter (+/- 7)
test header/count_128 ... bench:         788 ns/iter (+/- 23)
test version/http10 ... bench:           0 ns/iter (+/- 0)
test version/http11 ... bench:           0 ns/iter (+/- 0)
test version/partial ... bench:           1 ns/iter (+/- 0)
test method/get ... bench:           0 ns/iter (+/- 0)
test method/head ... bench:           2 ns/iter (+/- 0)
test method/post ... bench:           0 ns/iter (+/- 0)
test method/put ... bench:           2 ns/iter (+/- 0)
test method/delete ... bench:           3 ns/iter (+/- 0)
test method/connect ... bench:           3 ns/iter (+/- 0)
test method/options ... bench:           3 ns/iter (+/- 0)
test method/trace ... bench:           2 ns/iter (+/- 0)
test method/patch ... bench:           2 ns/iter (+/- 0)
test method/custom ... bench:           3 ns/iter (+/- 0)

swar:
test req/req ... bench:         114 ns/iter (+/- 0)
test req_short/req_short ... bench:          25 ns/iter (+/- 0)
test resp/resp ... bench:         119 ns/iter (+/- 2)
test resp_short/resp_short ... bench:          27 ns/iter (+/- 0)
test uri/uri_1b ... bench:           2 ns/iter (+/- 0)
test uri/uri_2b ... bench:           2 ns/iter (+/- 0)
test uri/uri_4b ... bench:           4 ns/iter (+/- 0)
test uri/uri_8b ... bench:           1 ns/iter (+/- 0)
test uri/uri_16b ... bench:           2 ns/iter (+/- 0)
test uri/uri_32b ... bench:           5 ns/iter (+/- 0)
test uri/uri_64b ... bench:           7 ns/iter (+/- 0)
test uri/uri_128b ... bench:          13 ns/iter (+/- 0)
test uri/uri_256b ... bench:          23 ns/iter (+/- 0)
test uri/uri_512b ... bench:          47 ns/iter (+/- 0)
test uri/uri_1024b ... bench:          97 ns/iter (+/- 2)
test uri/uri_2048b ... bench:         188 ns/iter (+/- 3)
test uri/uri_4096b ... bench:         369 ns/iter (+/- 2)
test header/name_1b ... bench:           9 ns/iter (+/- 0)
test header/name_2b ... bench:           8 ns/iter (+/- 0)
test header/name_4b ... bench:           9 ns/iter (+/- 0)
test header/name_8b ... bench:           9 ns/iter (+/- 0)
test header/name_16b ... bench:          12 ns/iter (+/- 0)
test header/name_32b ... bench:          14 ns/iter (+/- 0)
test header/name_64b ... bench:          20 ns/iter (+/- 1)
test header/name_128b ... bench:          34 ns/iter (+/- 1)
test header/name_256b ... bench:          58 ns/iter (+/- 3)
test header/name_512b ... bench:         109 ns/iter (+/- 3)
test header/name_1024b ... bench:         220 ns/iter (+/- 1)
test header/name_2048b ... bench:         424 ns/iter (+/- 9)
test header/name_4096b ... bench:         831 ns/iter (+/- 19)
test header/value_1b ... bench:           8 ns/iter (+/- 0)
test header/value_2b ... bench:          10 ns/iter (+/- 0)
test header/value_4b ... bench:          10 ns/iter (+/- 2)
test header/value_8b ... bench:          10 ns/iter (+/- 0)
test header/value_16b ... bench:          10 ns/iter (+/- 0)
test header/value_32b ... bench:          11 ns/iter (+/- 0)
test header/value_64b ... bench:          13 ns/iter (+/- 0)
test header/value_128b ... bench:          18 ns/iter (+/- 0)
test header/value_256b ... bench:          26 ns/iter (+/- 0)
test header/value_512b ... bench:          44 ns/iter (+/- 5)
test header/value_1024b ... bench:          89 ns/iter (+/- 12)
test header/value_2048b ... bench:         170 ns/iter (+/- 6)
test header/value_4096b ... bench:         302 ns/iter (+/- 11)
test header/count_1 ... bench:           9 ns/iter (+/- 0)
test header/count_2 ... bench:          15 ns/iter (+/- 0)
test header/count_4 ... bench:          26 ns/iter (+/- 1)
test header/count_8 ... bench:          51 ns/iter (+/- 1)
test header/count_16 ... bench:          96 ns/iter (+/- 5)
test header/count_32 ... bench:         196 ns/iter (+/- 7)
test header/count_64 ... bench:         336 ns/iter (+/- 12)
test header/count_128 ... bench:         662 ns/iter (+/- 6)
test version/http10 ... bench:           0 ns/iter (+/- 0)
test version/http11 ... bench:           0 ns/iter (+/- 0)
test version/partial ... bench:           1 ns/iter (+/- 0)
test method/get ... bench:           0 ns/iter (+/- 0)
test method/head ... bench:           2 ns/iter (+/- 0)
test method/post ... bench:           0 ns/iter (+/- 0)
test method/put ... bench:           2 ns/iter (+/- 0)
test method/delete ... bench:           3 ns/iter (+/- 0)
test method/connect ... bench:           3 ns/iter (+/- 0)
test method/options ... bench:           3 ns/iter (+/- 0)
test method/trace ... bench:           2 ns/iter (+/- 0)
test method/patch ... bench:           2 ns/iter (+/- 0)
test method/custom ... bench:           3 ns/iter (+/- 0)

swar + neon:
test req/req ... bench:         125 ns/iter (+/- 4)
test req_short/req_short ... bench:          27 ns/iter (+/- 0)
test resp/resp ... bench:         135 ns/iter (+/- 2)
test resp_short/resp_short ... bench:          27 ns/iter (+/- 0)
test uri/uri_1b ... bench:           2 ns/iter (+/- 0)
test uri/uri_2b ... bench:           3 ns/iter (+/- 0)
test uri/uri_4b ... bench:           4 ns/iter (+/- 0)
test uri/uri_8b ... bench:           2 ns/iter (+/- 0)
test uri/uri_16b ... bench:           2 ns/iter (+/- 0)
test uri/uri_32b ... bench:           3 ns/iter (+/- 0)
test uri/uri_64b ... bench:           5 ns/iter (+/- 0)
test uri/uri_128b ... bench:           7 ns/iter (+/- 0)
test uri/uri_256b ... bench:          14 ns/iter (+/- 0)
test uri/uri_512b ... bench:          26 ns/iter (+/- 0)
test uri/uri_1024b ... bench:          52 ns/iter (+/- 0)
test uri/uri_2048b ... bench:         108 ns/iter (+/- 0)
test uri/uri_4096b ... bench:         210 ns/iter (+/- 2)
test header/name_1b ... bench:           9 ns/iter (+/- 0)
test header/name_2b ... bench:           9 ns/iter (+/- 0)
test header/name_4b ... bench:           9 ns/iter (+/- 0)
test header/name_8b ... bench:           9 ns/iter (+/- 0)
test header/name_16b ... bench:          11 ns/iter (+/- 0)
test header/name_32b ... bench:          12 ns/iter (+/- 0)
test header/name_64b ... bench:          14 ns/iter (+/- 0)
test header/name_128b ... bench:          17 ns/iter (+/- 0)
test header/name_256b ... bench:          24 ns/iter (+/- 0)
test header/name_512b ... bench:          37 ns/iter (+/- 0)
test header/name_1024b ... bench:          64 ns/iter (+/- 0)
test header/name_2048b ... bench:         129 ns/iter (+/- 1)
test header/name_4096b ... bench:         237 ns/iter (+/- 3)
test header/value_1b ... bench:           9 ns/iter (+/- 0)
test header/value_2b ... bench:           9 ns/iter (+/- 0)
test header/value_4b ... bench:          10 ns/iter (+/- 0)
test header/value_8b ... bench:          10 ns/iter (+/- 0)
test header/value_16b ... bench:          12 ns/iter (+/- 0)
test header/value_32b ... bench:          13 ns/iter (+/- 0)
test header/value_64b ... bench:          14 ns/iter (+/- 0)
test header/value_128b ... bench:          18 ns/iter (+/- 0)
test header/value_256b ... bench:          24 ns/iter (+/- 0)
test header/value_512b ... bench:          37 ns/iter (+/- 0)
test header/value_1024b ... bench:          63 ns/iter (+/- 2)
test header/value_2048b ... bench:         125 ns/iter (+/- 5)
test header/value_4096b ... bench:         226 ns/iter (+/- 1)
test header/count_1 ... bench:           9 ns/iter (+/- 0)
test header/count_2 ... bench:          14 ns/iter (+/- 0)
test header/count_4 ... bench:          30 ns/iter (+/- 1)
test header/count_8 ... bench:          57 ns/iter (+/- 3)
test header/count_16 ... bench:         115 ns/iter (+/- 2)
test header/count_32 ... bench:         219 ns/iter (+/- 1)
test header/count_64 ... bench:         427 ns/iter (+/- 3)
test header/count_128 ... bench:         888 ns/iter (+/- 27)
test version/http10 ... bench:           0 ns/iter (+/- 0)
test version/http11 ... bench:           0 ns/iter (+/- 0)
test version/partial ... bench:           1 ns/iter (+/- 0)
test method/get ... bench:           0 ns/iter (+/- 0)
test method/head ... bench:           2 ns/iter (+/- 0)
test method/post ... bench:           0 ns/iter (+/- 0)
test method/put ... bench:           2 ns/iter (+/- 0)
test method/delete ... bench:           3 ns/iter (+/- 0)
test method/connect ... bench:           3 ns/iter (+/- 0)
test method/options ... bench:           3 ns/iter (+/- 0)
test method/trace ... bench:           2 ns/iter (+/- 0)
test method/patch ... bench:           2 ns/iter (+/- 0)
test method/custom ... bench:           3 ns/iter (+/- 0)

seanmonstar commented 1 year ago

unclear if SIMD disabled should also disable swar (doesn't use any special instructions) and only use scalar validators

After looking at what this is, I don't think it needs to be disabled with SIMD disabled. The point of that config was mostly so that we could test the scalar code even if build detection wanted to enable SIMD. Otherwise it'd be too hard to test it.
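
As a rough illustration of that testing concern, the sketch below dispatches between a scalar and a swar implementation of the same search (find the first space byte). The cfg name `demo_force_scalar` is made up for this example and is not httparse's actual configuration flag; the function names are hypothetical as well. Keeping both implementations callable lets a test compare them directly, whichever one the build would select.

```rust
const ONES: u64 = 0x0101_0101_0101_0101;
const HIGH: u64 = 0x8080_8080_8080_8080;

/// Scalar reference: index of the first space, or buf.len() if none.
fn find_space_scalar(buf: &[u8]) -> usize {
    buf.iter().position(|&b| b == b' ').unwrap_or(buf.len())
}

/// Same search, 8 bytes at a time, using only ordinary integer instructions.
fn find_space_swar(buf: &[u8]) -> usize {
    let mut pos = 0;
    while buf.len() - pos >= 8 {
        let mut block = [0u8; 8];
        block.copy_from_slice(&buf[pos..pos + 8]);
        // XOR turns every space lane into 0x00.
        let v = u64::from_le_bytes(block) ^ (0x20 * ONES);
        // Classic zero-byte test: the lowest marked lane is always a true zero.
        let zero = v.wrapping_sub(ONES) & !v & HIGH;
        if zero != 0 {
            return pos + (zero.trailing_zeros() / 8) as usize;
        }
        pos += 8;
    }
    // Tail shorter than 8 bytes: fall back to the scalar search.
    pos + find_space_scalar(&buf[pos..])
}

/// Dispatch. `demo_force_scalar` is a hypothetical cfg used only in this sketch.
#[allow(unexpected_cfgs)]
fn find_space(buf: &[u8]) -> usize {
    if cfg!(demo_force_scalar) {
        find_space_scalar(buf)
    } else {
        find_space_swar(buf)
    }
}
```

Building with `--cfg demo_force_scalar` in `RUSTFLAGS` would route `find_space` through the scalar path, so the slow path can be exercised even on targets where a faster backend would normally be chosen.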