Closed: indutny closed this issue 6 years ago
Should we do bit testing the way http-parser does?
This needs to be benchmarked: a jump table generated from a large `switch` vs a packed bit-test table.
Results:
$ /tmp/1 "/just/a/normal/url/path"
bitmap[1]: 8192.00 mb | 1424.19 mb/s | 5.75 s
switch[1]: 8192.00 mb | 407.44 mb/s | 20.11 s
(See 80763b4)
Disregard the results above; disabling inlining yields:
$ /tmp/1 "/just/a/normal/url/path"
bitmap: 8192.00 mb | 1139.52 mb/s | 7.19 s
switch: 8192.00 mb | 409.14 mb/s | 20.02 s
Larger input:
$ /tmp/1 "/rather/long/and/boring/url/path/with/lots/of/characters/in/it/to/remove/call/time/from/benchmark"
bitmap: 8192.00 mb | 1206.14 mb/s | 6.79 s
switch: 8192.00 mb | 409.92 mb/s | 19.98 s
Improved URL parsing speed in llhttp from 400 mb/s to 800 mb/s!
There are lots of cases where we match against a range of values, e.g. `0x20`-`0xff`. LLVM does a great job when nothing falls outside of this range, but when there are several ranges, or just a few characters outside of the range, the code is optimized into a jump table, which leads to sub-optimal performance. Thus we should either split the `switch` into several sub-switches, contribute a patch to LLVM for more sophisticated `switch` optimization, or generate more elaborate code ourselves.