tokio-rs / prost

PROST! a Protocol Buffers implementation for the Rust Language
Apache License 2.0
3.94k stars, 508 forks

Investigate potential x86 varint optimization #279

Open danburkert opened 4 years ago

danburkert commented 4 years ago

https://www.reddit.com/r/rust/comments/f36j05/comment/fhhwqp9

danburkert commented 4 years ago

See also https://github.com/gnzlbg/bitintr for safe, cross-platform wrappers over the intrinsics.

danburkert commented 3 years ago

https://news.ycombinator.com/item?id=25183811

danburkert commented 3 years ago

https://www.reddit.com/r/rust/comments/klck6a/i_published_my_first_crate_varintsimd/

as-com commented 3 years ago

So I did some quick and dirty prototyping with varint-simd v0.3.0, and here's what I found:

This is probably because the only encode/decode function is for a single u64, which is currently a weak point for varint-simd: it's not much faster than other implementations when encoding or decoding small u64 values.

I suspect some larger-scale refactoring will be needed to take full advantage of varint-simd. For example, protobuf tags are at most 32 bits long, so decoding and encoding them through a 32-bit path instead of the general u64 path could save a lot of cycles.
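To make the tag-width point concrete, here is a minimal scalar sketch in plain Rust. It is not varint-simd's or prost's API: `decode_varint` and `decode_tag_u32` are hypothetical names used only for illustration. The idea is that a 32-bit varint occupies at most 5 bytes while a 64-bit one can take up to 10, so a tag-specific decoder can stop earlier.

```rust
/// Decode an LEB128 varint, reading at most `MAX_BYTES` bytes.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends (or the byte limit is hit) before a terminating byte is seen.
/// Hypothetical helper for illustration only.
fn decode_varint<const MAX_BYTES: usize>(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().take(MAX_BYTES).enumerate() {
        // The low 7 bits carry payload, least-significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the final byte of the varint.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Decode a protobuf tag: at most 5 bytes, and the value must fit in a u32.
fn decode_tag_u32(buf: &[u8]) -> Option<(u32, usize)> {
    let (value, len) = decode_varint::<5>(buf)?;
    Some((u32::try_from(value).ok()?, len))
}
```

With this shape, `decode_tag_u32` never looks past the fifth byte, whereas the general path would be `decode_varint::<10>`. A SIMD implementation would exploit the same bound differently, but the saving comes from the same place.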

My library also just added support for quickly decoding two, four, or eight adjacent varints in parallel (subject to size limitations), with some really good throughput figures. Most of the time, a protobuf field is a 32-bit tag followed by a 32-bit number or length, and decode requests can be narrowed based on how large the data field is in the .proto file. So there are likely a lot more gains to be had.
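For reference, the "tag followed by a number or length" pattern looks like this as a scalar baseline. This is only a sketch under stated assumptions: `read_varint_u32`, `FieldHeader`, and `read_field_header` are hypothetical names, not part of prost or varint-simd, and the second varint is assumed to fit in 32 bits. It decodes the two adjacent varints one after the other, which is the pair a parallel decoder would handle in a single step.

```rust
/// Decode a varint that must fit in 32 bits (at most 5 bytes).
/// Returns the value and the number of bytes consumed.
/// Hypothetical helper for illustration only.
fn read_varint_u32(buf: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in buf.iter().take(5).enumerate() {
        let bits = u32::from(byte & 0x7f);
        // The fifth byte may only contribute the top 4 bits of a u32.
        if i == 4 && bits > 0x0f {
            return None;
        }
        value |= bits << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// A decoded tag plus the varint that follows it (hypothetical type).
struct FieldHeader {
    field_number: u32,
    wire_type: u8,
    /// The varint value (wire type 0) or the payload length (wire type 2).
    value: u32,
    /// Total bytes consumed by the tag and the following varint.
    len: usize,
}

/// Decode a tag and the varint right after it, sequentially.
/// Only wire types 0 (varint) and 2 (length-delimited) are followed by a
/// varint, so every other wire type is rejected in this sketch.
fn read_field_header(buf: &[u8]) -> Option<FieldHeader> {
    let (tag, tag_len) = read_varint_u32(buf)?;
    let wire_type = (tag & 0x7) as u8;
    if wire_type != 0 && wire_type != 2 {
        return None;
    }
    let (value, value_len) = read_varint_u32(&buf[tag_len..])?;
    Some(FieldHeader {
        field_number: tag >> 3,
        wire_type,
        value,
        len: tag_len + value_len,
    })
}
```

For the bytes `[0x08, 0x96, 0x01]` this yields field number 1, wire type 0, and value 150, consuming 3 bytes.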