seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0
567 stars 111 forks source link

simd: split cursor advancing from value matching #156

Closed lucab closed 4 months ago

lucab commented 4 months ago

This refactors all SIMD modules in order to make the value-matching logic self-contained. Thus, all bytes-cursor manipulations are now grouped and performed once at the end, outside of SIMD logic.

Performance impact on my Intel AVX2-capable workstation seems positive (arbitrary benchmark-noise filtering at >20%):

critcmp

lucab commented 4 months ago

@seanmonstar this is ready for a review pass, whenever you have time.

There is a minor cleanup bundled in this PR (marking several functions as pub(crate)) which I did in order to make sure I wasn't changing public APIs. I can split that to a dedicated PR if you prefer.

I'll be honest, I started doing this rework as part of https://github.com/hyperium/hyper/issues/3574 before actually going for https://github.com/hyperium/hyper/pull/3575, focused on memory usage/allocation patterns. My goal was to (eventually) use the bytes crate in this library, but then I realized that this required SIMD-related groundwork was actually providing performance improvements on its own. As such, I think it makes sense to land this already.

lucab commented 4 months ago

Thanks for merging this. Even if I recorded those perf numbers myself, I'm still somehow puzzled and a bit skeptical about them. I tried on a different machine (an AMD Ryzen 7) and this time I'm not seeing any measurable change (i.e. all differences are below 20%). Both cases are workstation laptops and I did disable all kind of turbo-boosting and CPU dynamic scaling I could think of. So either the code change is reliably hitting something very specific on the Intel machine (better cache locality?), or this workstation is just a-not-very good environment for benchmark comparisons.

Overall, I think the new code is a useful refactor but I personally won't guarantee the pictured performance changes to be valid in all environments.