Closed AaronO closed 1 year ago
Not completely in love with the name swar, open to suggestions. It's also unclear whether disabling SIMD should also disable swar (which doesn't use any special instructions) and fall back to only the scalar validators (very slow). No technical blockers, more a matter of aligning intent.
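For readers unfamiliar with the term: SWAR stands for "SIMD within a register", meaning several bytes are checked at once using ordinary integer arithmetic on a single machine word. Below is a minimal sketch of the idea, not the code from this PR. The function name `valid_prefix_swar` and the validity rule (a byte is valid when it lies in 0x21..=0x7E) are assumptions made for illustration and may differ from the crate's real lookup tables.

```rust
// 0x01, 0x7F and 0x80 repeated in every byte of a 64-bit word.
const ONES: u64 = 0x0101_0101_0101_0101;
const LOW7: u64 = 0x7F * ONES;
const HIGH: u64 = 0x80 * ONES;

/// Returns how many leading bytes of `block` fall in 0x21..=0x7E
/// (a hypothetical rule, chosen only for this sketch).
fn valid_prefix_swar(block: [u8; 8]) -> usize {
    // Little-endian load: the first byte in memory becomes the lowest byte.
    let x = u64::from_le_bytes(block);
    // Work on the low 7 bits so additions never carry into the next byte.
    let low = x & LOW7;
    // High bit of each byte is set iff that byte is >= 0x21.
    let ge_min = (low + 0x5F * ONES) | x;
    // High bit of each byte is set iff that byte is >= 0x7F.
    let gt_max = (low + ONES) | x;
    // A byte is invalid if it is below the minimum or above the maximum.
    let invalid = (!ge_min | gt_max) & HIGH;
    // The lowest set flag marks the first invalid byte; 64 / 8 = 8 if none.
    (invalid.trailing_zeros() / 8) as usize
}
```

Calling `valid_prefix_swar(*b"ab cdefg")` stops at the space and reports 2 valid bytes. Only plain `u64` operations are involved, which is why this path needs no target-specific instructions.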
Decent improvements of roughly 12-17% on full req parsing:
master:
test req/req ... bench: 257 ns/iter (+/- 6)
test req_short/req_short ... bench: 72 ns/iter (+/- 3)
test resp/resp ... bench: 273 ns/iter (+/- 61)
test resp_short/resp_short ... bench: 68 ns/iter (+/- 5)
swar:
test req/req ... bench: 225 ns/iter (+/- 2)
test req_short/req_short ... bench: 60 ns/iter (+/- 3)
test resp/resp ... bench: 239 ns/iter (+/- 24)
test resp_short/resp_short ... bench: 63 ns/iter (+/- 1)
You'll notice header/count improves considerably with swar:
master:
test req/req ... bench: 117 ns/iter (+/- 13)
test req_short/req_short ... bench: 27 ns/iter (+/- 14)
test resp/resp ... bench: 120 ns/iter (+/- 2)
test resp_short/resp_short ... bench: 28 ns/iter (+/- 5)
test uri/uri_1b ... bench: 1 ns/iter (+/- 0)
test uri/uri_2b ... bench: 1 ns/iter (+/- 0)
test uri/uri_4b ... bench: 3 ns/iter (+/- 1)
test uri/uri_8b ... bench: 2 ns/iter (+/- 0)
test uri/uri_16b ... bench: 3 ns/iter (+/- 0)
test uri/uri_32b ... bench: 4 ns/iter (+/- 0)
test uri/uri_64b ... bench: 6 ns/iter (+/- 0)
test uri/uri_128b ... bench: 12 ns/iter (+/- 1)
test uri/uri_256b ... bench: 24 ns/iter (+/- 6)
test uri/uri_512b ... bench: 53 ns/iter (+/- 3)
test uri/uri_1024b ... bench: 95 ns/iter (+/- 1)
test uri/uri_2048b ... bench: 185 ns/iter (+/- 8)
test uri/uri_4096b ... bench: 366 ns/iter (+/- 24)
test header/name_1b ... bench: 9 ns/iter (+/- 0)
test header/name_2b ... bench: 9 ns/iter (+/- 0)
test header/name_4b ... bench: 9 ns/iter (+/- 0)
test header/name_8b ... bench: 9 ns/iter (+/- 0)
test header/name_16b ... bench: 10 ns/iter (+/- 0)
test header/name_32b ... bench: 13 ns/iter (+/- 0)
test header/name_64b ... bench: 20 ns/iter (+/- 0)
test header/name_128b ... bench: 32 ns/iter (+/- 0)
test header/name_256b ... bench: 58 ns/iter (+/- 1)
test header/name_512b ... bench: 109 ns/iter (+/- 2)
test header/name_1024b ... bench: 220 ns/iter (+/- 5)
test header/name_2048b ... bench: 424 ns/iter (+/- 8)
test header/name_4096b ... bench: 832 ns/iter (+/- 13)
test header/value_1b ... bench: 9 ns/iter (+/- 0)
test header/value_2b ... bench: 11 ns/iter (+/- 0)
test header/value_4b ... bench: 13 ns/iter (+/- 0)
test header/value_8b ... bench: 9 ns/iter (+/- 0)
test header/value_16b ... bench: 10 ns/iter (+/- 0)
test header/value_32b ... bench: 11 ns/iter (+/- 0)
test header/value_64b ... bench: 13 ns/iter (+/- 0)
test header/value_128b ... bench: 17 ns/iter (+/- 0)
test header/value_256b ... bench: 26 ns/iter (+/- 0)
test header/value_512b ... bench: 44 ns/iter (+/- 1)
test header/value_1024b ... bench: 89 ns/iter (+/- 2)
test header/value_2048b ... bench: 158 ns/iter (+/- 2)
test header/value_4096b ... bench: 301 ns/iter (+/- 8)
test header/count_1 ... bench: 9 ns/iter (+/- 0)
test header/count_2 ... bench: 16 ns/iter (+/- 0)
test header/count_4 ... bench: 29 ns/iter (+/- 0)
test header/count_8 ... bench: 53 ns/iter (+/- 1)
test header/count_16 ... bench: 112 ns/iter (+/- 449)
test header/count_32 ... bench: 203 ns/iter (+/- 4)
test header/count_64 ... bench: 396 ns/iter (+/- 7)
test header/count_128 ... bench: 788 ns/iter (+/- 23)
test version/http10 ... bench: 0 ns/iter (+/- 0)
test version/http11 ... bench: 0 ns/iter (+/- 0)
test version/partial ... bench: 1 ns/iter (+/- 0)
test method/get ... bench: 0 ns/iter (+/- 0)
test method/head ... bench: 2 ns/iter (+/- 0)
test method/post ... bench: 0 ns/iter (+/- 0)
test method/put ... bench: 2 ns/iter (+/- 0)
test method/delete ... bench: 3 ns/iter (+/- 0)
test method/connect ... bench: 3 ns/iter (+/- 0)
test method/options ... bench: 3 ns/iter (+/- 0)
test method/trace ... bench: 2 ns/iter (+/- 0)
test method/patch ... bench: 2 ns/iter (+/- 0)
test method/custom ... bench: 3 ns/iter (+/- 0)
swar:
test req/req ... bench: 114 ns/iter (+/- 0)
test req_short/req_short ... bench: 25 ns/iter (+/- 0)
test resp/resp ... bench: 119 ns/iter (+/- 2)
test resp_short/resp_short ... bench: 27 ns/iter (+/- 0)
test uri/uri_1b ... bench: 2 ns/iter (+/- 0)
test uri/uri_2b ... bench: 2 ns/iter (+/- 0)
test uri/uri_4b ... bench: 4 ns/iter (+/- 0)
test uri/uri_8b ... bench: 1 ns/iter (+/- 0)
test uri/uri_16b ... bench: 2 ns/iter (+/- 0)
test uri/uri_32b ... bench: 5 ns/iter (+/- 0)
test uri/uri_64b ... bench: 7 ns/iter (+/- 0)
test uri/uri_128b ... bench: 13 ns/iter (+/- 0)
test uri/uri_256b ... bench: 23 ns/iter (+/- 0)
test uri/uri_512b ... bench: 47 ns/iter (+/- 0)
test uri/uri_1024b ... bench: 97 ns/iter (+/- 2)
test uri/uri_2048b ... bench: 188 ns/iter (+/- 3)
test uri/uri_4096b ... bench: 369 ns/iter (+/- 2)
test header/name_1b ... bench: 9 ns/iter (+/- 0)
test header/name_2b ... bench: 8 ns/iter (+/- 0)
test header/name_4b ... bench: 9 ns/iter (+/- 0)
test header/name_8b ... bench: 9 ns/iter (+/- 0)
test header/name_16b ... bench: 12 ns/iter (+/- 0)
test header/name_32b ... bench: 14 ns/iter (+/- 0)
test header/name_64b ... bench: 20 ns/iter (+/- 1)
test header/name_128b ... bench: 34 ns/iter (+/- 1)
test header/name_256b ... bench: 58 ns/iter (+/- 3)
test header/name_512b ... bench: 109 ns/iter (+/- 3)
test header/name_1024b ... bench: 220 ns/iter (+/- 1)
test header/name_2048b ... bench: 424 ns/iter (+/- 9)
test header/name_4096b ... bench: 831 ns/iter (+/- 19)
test header/value_1b ... bench: 8 ns/iter (+/- 0)
test header/value_2b ... bench: 10 ns/iter (+/- 0)
test header/value_4b ... bench: 10 ns/iter (+/- 2)
test header/value_8b ... bench: 10 ns/iter (+/- 0)
test header/value_16b ... bench: 10 ns/iter (+/- 0)
test header/value_32b ... bench: 11 ns/iter (+/- 0)
test header/value_64b ... bench: 13 ns/iter (+/- 0)
test header/value_128b ... bench: 18 ns/iter (+/- 0)
test header/value_256b ... bench: 26 ns/iter (+/- 0)
test header/value_512b ... bench: 44 ns/iter (+/- 5)
test header/value_1024b ... bench: 89 ns/iter (+/- 12)
test header/value_2048b ... bench: 170 ns/iter (+/- 6)
test header/value_4096b ... bench: 302 ns/iter (+/- 11)
test header/count_1 ... bench: 9 ns/iter (+/- 0)
test header/count_2 ... bench: 15 ns/iter (+/- 0)
test header/count_4 ... bench: 26 ns/iter (+/- 1)
test header/count_8 ... bench: 51 ns/iter (+/- 1)
test header/count_16 ... bench: 96 ns/iter (+/- 5)
test header/count_32 ... bench: 196 ns/iter (+/- 7)
test header/count_64 ... bench: 336 ns/iter (+/- 12)
test header/count_128 ... bench: 662 ns/iter (+/- 6)
test version/http10 ... bench: 0 ns/iter (+/- 0)
test version/http11 ... bench: 0 ns/iter (+/- 0)
test version/partial ... bench: 1 ns/iter (+/- 0)
test method/get ... bench: 0 ns/iter (+/- 0)
test method/head ... bench: 2 ns/iter (+/- 0)
test method/post ... bench: 0 ns/iter (+/- 0)
test method/put ... bench: 2 ns/iter (+/- 0)
test method/delete ... bench: 3 ns/iter (+/- 0)
test method/connect ... bench: 3 ns/iter (+/- 0)
test method/options ... bench: 3 ns/iter (+/- 0)
test method/trace ... bench: 2 ns/iter (+/- 0)
test method/patch ... bench: 2 ns/iter (+/- 0)
test method/custom ... bench: 3 ns/iter (+/- 0)
swar + neon:
test req/req ... bench: 125 ns/iter (+/- 4)
test req_short/req_short ... bench: 27 ns/iter (+/- 0)
test resp/resp ... bench: 135 ns/iter (+/- 2)
test resp_short/resp_short ... bench: 27 ns/iter (+/- 0)
test uri/uri_1b ... bench: 2 ns/iter (+/- 0)
test uri/uri_2b ... bench: 3 ns/iter (+/- 0)
test uri/uri_4b ... bench: 4 ns/iter (+/- 0)
test uri/uri_8b ... bench: 2 ns/iter (+/- 0)
test uri/uri_16b ... bench: 2 ns/iter (+/- 0)
test uri/uri_32b ... bench: 3 ns/iter (+/- 0)
test uri/uri_64b ... bench: 5 ns/iter (+/- 0)
test uri/uri_128b ... bench: 7 ns/iter (+/- 0)
test uri/uri_256b ... bench: 14 ns/iter (+/- 0)
test uri/uri_512b ... bench: 26 ns/iter (+/- 0)
test uri/uri_1024b ... bench: 52 ns/iter (+/- 0)
test uri/uri_2048b ... bench: 108 ns/iter (+/- 0)
test uri/uri_4096b ... bench: 210 ns/iter (+/- 2)
test header/name_1b ... bench: 9 ns/iter (+/- 0)
test header/name_2b ... bench: 9 ns/iter (+/- 0)
test header/name_4b ... bench: 9 ns/iter (+/- 0)
test header/name_8b ... bench: 9 ns/iter (+/- 0)
test header/name_16b ... bench: 11 ns/iter (+/- 0)
test header/name_32b ... bench: 12 ns/iter (+/- 0)
test header/name_64b ... bench: 14 ns/iter (+/- 0)
test header/name_128b ... bench: 17 ns/iter (+/- 0)
test header/name_256b ... bench: 24 ns/iter (+/- 0)
test header/name_512b ... bench: 37 ns/iter (+/- 0)
test header/name_1024b ... bench: 64 ns/iter (+/- 0)
test header/name_2048b ... bench: 129 ns/iter (+/- 1)
test header/name_4096b ... bench: 237 ns/iter (+/- 3)
test header/value_1b ... bench: 9 ns/iter (+/- 0)
test header/value_2b ... bench: 9 ns/iter (+/- 0)
test header/value_4b ... bench: 10 ns/iter (+/- 0)
test header/value_8b ... bench: 10 ns/iter (+/- 0)
test header/value_16b ... bench: 12 ns/iter (+/- 0)
test header/value_32b ... bench: 13 ns/iter (+/- 0)
test header/value_64b ... bench: 14 ns/iter (+/- 0)
test header/value_128b ... bench: 18 ns/iter (+/- 0)
test header/value_256b ... bench: 24 ns/iter (+/- 0)
test header/value_512b ... bench: 37 ns/iter (+/- 0)
test header/value_1024b ... bench: 63 ns/iter (+/- 2)
test header/value_2048b ... bench: 125 ns/iter (+/- 5)
test header/value_4096b ... bench: 226 ns/iter (+/- 1)
test header/count_1 ... bench: 9 ns/iter (+/- 0)
test header/count_2 ... bench: 14 ns/iter (+/- 0)
test header/count_4 ... bench: 30 ns/iter (+/- 1)
test header/count_8 ... bench: 57 ns/iter (+/- 3)
test header/count_16 ... bench: 115 ns/iter (+/- 2)
test header/count_32 ... bench: 219 ns/iter (+/- 1)
test header/count_64 ... bench: 427 ns/iter (+/- 3)
test header/count_128 ... bench: 888 ns/iter (+/- 27)
test version/http10 ... bench: 0 ns/iter (+/- 0)
test version/http11 ... bench: 0 ns/iter (+/- 0)
test version/partial ... bench: 1 ns/iter (+/- 0)
test method/get ... bench: 0 ns/iter (+/- 0)
test method/head ... bench: 2 ns/iter (+/- 0)
test method/post ... bench: 0 ns/iter (+/- 0)
test method/put ... bench: 2 ns/iter (+/- 0)
test method/delete ... bench: 3 ns/iter (+/- 0)
test method/connect ... bench: 3 ns/iter (+/- 0)
test method/options ... bench: 3 ns/iter (+/- 0)
test method/trace ... bench: 2 ns/iter (+/- 0)
test method/patch ... bench: 2 ns/iter (+/- 0)
test method/custom ... bench: 3 ns/iter (+/- 0)
> unclear if SIMD disabled should also disable swar (doesn't use any special instructions) and only use scalar validators
After looking at what this is, I don't think swar needs to be disabled when SIMD is disabled. The point of that config was mostly to let us test the scalar code even when build detection wants to enable SIMD; otherwise it would be too hard to test.
This refactor moves the block-wise validators to a "swar" SIMD backend.
The core logic of validate => extract => chain is (IMO) now more evident, as is how we stack validators: use the largest SIMD validator available, then finish with scalar if at the end of the buffer (uncommon).
Perf-wise, this is roughly on par with master and slightly faster in some regards, in part because it avoids duplicating validation work between the SIMD and scalar validators. It is not yet 100% fine-tuned or fully benched with neon or avx/sse.
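The stacking of validators described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Step`, `validate_blocks`, `validate_scalar`, `valid_prefix_len` and the `is_valid` rule are all hypothetical names, and the block stages use a per-byte loop as a portable stand-in for real vector or SWAR checks.

```rust
/// Hypothetical validity rule, used only for this illustration.
fn is_valid(b: u8) -> bool {
    (0x21..=0x7E).contains(&b)
}

/// Result of one validator stage.
struct Step {
    /// Number of valid bytes consumed from the front of the input.
    consumed: usize,
    /// True if the stage found the end of the valid run (no later stage needed).
    done: bool,
}

/// Block-wise stage: only looks at whole N-byte blocks.
/// A real backend would test each block with vector or SWAR operations;
/// the per-byte loop here is a stand-in that keeps the sketch portable.
fn validate_blocks<const N: usize>(buf: &[u8]) -> Step {
    let mut consumed = 0;
    for block in buf.chunks_exact(N) {
        let valid = block.iter().take_while(|&&b| is_valid(b)).count();
        consumed += valid;
        if valid < N {
            // Report the exact offset so later stages never rescan this block.
            return Step { consumed, done: true };
        }
    }
    Step { consumed, done: false }
}

/// Scalar stage: handles the short tail left over by the block stages.
fn validate_scalar(buf: &[u8]) -> Step {
    let consumed = buf.iter().take_while(|&&b| is_valid(b)).count();
    Step { consumed, done: true }
}

/// Chains the stages from widest to narrowest and returns the length of
/// the valid prefix of `buf`.
fn valid_prefix_len(buf: &[u8]) -> usize {
    let stages: [fn(&[u8]) -> Step; 3] = [
        validate_blocks::<16>,
        validate_blocks::<8>,
        validate_scalar,
    ];
    let mut pos = 0;
    for stage in stages {
        let step = stage(&buf[pos..]);
        pos += step.consumed;
        if step.done {
            break;
        }
    }
    pos
}
```

For example, `valid_prefix_len(b"GET /index.html")` returns 3, stopping at the space. Because each stage reports the exact offset it reached, the next stage resumes from that position instead of re-validating bytes, which is the kind of duplicated work the refactor aims to avoid.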