This takes over where #38 left off (huge thanks @kamyuentse), and gets it working on 1.27.0 while keeping our minimum compiler version (1.10.0!). It might seem a little weird, so I'll explain how it does both runtime and compile-time detection for maximum performance.
Runtime Detection
The SIMD support stabilized in Rust 1.27 includes is_x86_feature_detected!, which allows checking if a certain target feature is enabled. Internally, it uses the unstable cfg(target_feature), but can also query the CPU at runtime. As of 1.27, the runtime check isn't inlined, which means that adding SIMD support was actually slower than leaving it disabled.
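As a rough illustration of the runtime check described above, here is a minimal sketch using the standard library's is_x86_feature_detected! macro. The function names has_sse42 and pick_strategy are made up for this example and are not httparse's actual internals.

```rust
/// Returns true when the running CPU reports SSE4.2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn has_sse42() -> bool {
    // The macro only exists on x86/x86_64 targets.
    is_x86_feature_detected!("sse4.2")
}

/// On every other architecture there is nothing to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn has_sse42() -> bool {
    false
}

/// Hypothetical dispatcher: names the parsing strategy that would be used.
pub fn pick_strategy() -> &'static str {
    if has_sse42() {
        "sse4.2"
    } else {
        "scalar"
    }
}
```

A caller would branch on has_sse42() once per parse, which is exactly the branch the rest of this description tries to make cheap or remove.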
A patch to the stdsimd crate has already landed to inline the checks, but in the meantime, httparse uses its own inlined cache. After querying the macro once, the feature set is stored in a local atomic, and checking that results in an overall speed improvement!
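The caching idea can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: the bit layout, the FEATURES static, and the function names are all invented for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical bit layout for the cached feature set.
pub const UNINIT: usize = 0;
pub const SSE42: usize = 1;
pub const AVX2: usize = 1 << 1;
pub const INIT: usize = 1 << 2; // set once the CPU has been queried

static FEATURES: AtomicUsize = AtomicUsize::new(UNINIT);

/// Slow path: ask the CPU which features it has.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn query_cpu() -> usize {
    let mut bits = INIT;
    if is_x86_feature_detected!("sse4.2") {
        bits |= SSE42;
    }
    if is_x86_feature_detected!("avx2") {
        bits |= AVX2;
    }
    bits
}

/// Non-x86 targets have no SIMD features to report here.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn query_cpu() -> usize {
    INIT
}

/// Fast path: a single relaxed atomic load that the compiler can inline.
#[inline]
pub fn features() -> usize {
    let cached = FEATURES.load(Ordering::Relaxed);
    if cached != UNINIT {
        return cached;
    }
    // Racing threads compute the same value, so a plain store is enough.
    let bits = query_cpu();
    FEATURES.store(bits, Ordering::Relaxed);
    bits
}

#[inline]
pub fn cached_has_sse42() -> bool {
    (features() & SSE42) != 0
}
```

Relaxed ordering is sufficient in this sketch because the stored value does not guard any other memory; every thread that misses the cache recomputes an identical result.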
However, using this cache interferes slightly with optimizations the compiler could do if compiled with -C target-cpu=native. That's because the macro internally uses cfg(target_feature), and when that is set, the entire branch can be eliminated.
Compile-time detection
So, we already have a win with runtime detection. This also includes support for compile-time detection, even though it isn't stable in Rust 1.27! It takes advantage of the fact that Cargo exposes a CARGO_CFG_TARGET_FEATURE environment variable to build scripts.
The new build script looks for that environment variable, and if it detects that someone is compiling with certain features we can use (either sse4.2 or avx2), that information is emitted as custom httparse cfg options.
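A minimal sketch of what such a build script step might look like is below. The cfg names are illustrative assumptions, and the helper functions are hypothetical; only the CARGO_CFG_TARGET_FEATURE variable (a comma-separated list) and the cargo:rustc-cfg output line are standard Cargo behavior.

```rust
use std::env;

/// Maps a comma-separated target feature list to cfg names we would emit.
/// The cfg names here are placeholders chosen for this example.
pub fn simd_cfgs(target_features: &str) -> Vec<&'static str> {
    let mut cfgs = Vec::new();
    for feature in target_features.split(',') {
        match feature.trim() {
            "sse4.2" => cfgs.push("httparse_simd_target_feature_sse42"),
            "avx2" => cfgs.push("httparse_simd_target_feature_avx2"),
            _ => {} // features we have no specialized code for
        }
    }
    cfgs
}

/// Intended to be called from `fn main()` in build.rs.
pub fn emit_simd_cfgs() {
    // The variable is only set by Cargo while running a build script.
    let features = env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
    for cfg in simd_cfgs(&features) {
        // Cargo turns this line into `--cfg <name>` for the crate build.
        println!("cargo:rustc-cfg={}", cfg);
    }
}
```

Keeping the string matching in its own function makes it possible to test the mapping without running Cargo.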
Then, the compilation of httparse will use a version that doesn't use our cached feature detection, and just uses is_x86_feature_detected! directly. Since the build script already saw that the feature is enabled, this will in most cases mean the branch is eliminated entirely.
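Selecting between the two code paths might look roughly like this. The cfg name httparse_simd_target_feature_sse42 and the detect module are assumptions for illustration, and the fallback body is only a stand-in for the cached lookup described earlier.

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod detect {
    /// Chosen when the build script saw sse4.2 in the target features:
    /// call the macro directly so the compiler can fold the check away.
    #[cfg(httparse_simd_target_feature_sse42)]
    #[inline(always)]
    pub fn sse42() -> bool {
        is_x86_feature_detected!("sse4.2")
    }

    /// Chosen otherwise: the runtime path. A real implementation would
    /// consult the atomic cache; the direct call is a placeholder.
    #[cfg(not(httparse_simd_target_feature_sse42))]
    #[inline]
    pub fn sse42() -> bool {
        is_x86_feature_detected!("sse4.2")
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
mod detect {
    /// No x86 SIMD on other architectures.
    pub fn sse42() -> bool {
        false
    }
}

/// Hypothetical entry point that branches on the selected detection path.
pub fn parse_strategy() -> &'static str {
    if detect::sse42() {
        "sse4.2"
    } else {
        "scalar"
    }
}
```

Compiled standalone, the custom cfg is not set, so the runtime variant is the one that gets built; recent compilers may warn about the unknown cfg name.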
Both runtime and compile-time detection in httparse can be disabled, though this is currently meant only for testing (to be able to run the tests with all the various parsing methods in CI).
Benchmark improvements
Pre-1.27 (or when SIMD is specifically configured off)
Without SIMD, httparse is oh-so-slightly faster than Pico when the requests are tiny, but a bit slower on a more realistic request from a browser.
With runtime detection (and SSE4.2 on the CPU), httparse loses a couple of nanoseconds on small requests (it adds a branch that wasn't there before), but sees a ~15% improvement on the bigger, more common requests.
With -C target-cpu=native (and SSE4.2 on the CPU), httparse no longer loses time on smaller requests, since the branch is eliminated at compile time, and is another ~11% faster on normal requests than with runtime detection (or a total of ~24% improvement)!