ogxd / gxhash

The fastest hashing algorithm 📈

https://docs.rs/gxhash

MIT License

696 stars 23 forks source link

Primitives specialization for Hasher #40

Closed ogxd closed 7 months ago

ogxd commented 7 months ago

Context

The Hasher trait exposes methods to hash primitives. Currently, we hash primitives by considering them all as slices of bytes. Hashing can be much performance if the type is known in advance (eg load primitive directly in SIMD vector).

Triggered by the following discussion: https://github.com/rust-lang/hashbrown/issues/487

Goals

Hashing a primitive type is faster.
Hashing a primitive still follows the algorithm principles (and thus remains stable and passes SMHasher)

Todo

[x] Add benchmark to test hashing on primitive, like u32
[x] Implement all methods (write_u32, write_u64, ...)
- [x] Make it work for ARM
- [x] Make it work for x86
[x] Publish benchmark results before/after on ARM/X86]

ogxd commented 7 months ago

It seems the default write (non-specialized) wasn't that bad even on the smallest primitive types. On MacBook M1 pro, using write_u32 yields a +13% performance (which is still substantial).

Another interesting thing is that the hashset benchmark was biased in some cases. black_boxing the keys prevents compiler optimizations that made this bench biased.

ogxd commented 7 months ago

Current progress involves hashes that are stable in the context of the Hasher, however hashes for an u32 hashed via Hasher::write_u32 are not stable with hashes using the gxhash(&[u8], ...) method. I think this is acceptable because those are two very different contexts. SMHasher should still pass for both contexes.

ogxd commented 7 months ago

Fixed a SIGSEGV when passed [u8] is a null slice (not just an empty slice)

ogxd commented 7 months ago

Merging and releasing 2.3.0

On both my ARM and X86 platforms, I get about -13% of hashing time for small inputs (u8, u16, u32, u64, u128 and signed counterparts). On my ARM PC, gxhash Hasher is now faster than ahashfor such inputs. My on X86 PC, gxhashremain a bit slower for these inputs (about 10% slower). I have a doubt in ahash Hasher passing SMHasher quality test for such inputs.