ogxd / gxhash

The fastest hashing algorithm 📈
https://docs.rs/gxhash
MIT License
767 stars 26 forks source link

Hybrid state gxhash #34

Closed ogxd closed 9 months ago

ogxd commented 10 months ago

Context

Currently, gxhash with a 128-bit state is much faster for small input sizes, while gxhash with a 256-bit state is faster for large input sizes. The idea is to study the possibilities of making an hybrid state version of gxhash, leveraring the advantages of 128-bit state with the advantages of the 256-bit state processing, for maximum throughput for all input sizes.

See the break-even point:

image

Challenges

There are two ways to achieve this:

Some challenges are:

Todo

ogxd commented 10 months ago

Results for first try at an hybrid approach x86_64

This is very encouraging. This is all done without using generics, which ended up making the code too complex for no good reason IMHO.
Note that this all remains stable with the full 128 bit version, which is AWESOME.

There are a few remaing things to tackle:

ogxd commented 10 months ago
Some bench results on my Ryzen 5 PC: no avx2 with avx2
main branch image image
hybrid-2 branch image image

In 4th picture we use both 128-bit SIMD for small inputs and 256-bit SIMD for larger inputs. It has the benefit of being stable with the 128-bit only version |

xxaier commented 10 months ago

s128 s256 hybird and the other hash maybe can draw it in the same picture so it looks clearer

ogxd commented 10 months ago

I can't because they're all generated from different compilation targets but I have used a max y of 130000 for all so scalewise it is comparable.
As you can see with the hybrid+stable approach there is a small loss in throughput starting with 32-bytes long inputs compared to the full AVX2 algorithm. Maybe a small compromise is acceptable if stability is guaranteed but still I'd like to continue fiddling with the implementation to see if a little more throughput can be squeezed out of it

3tieto commented 10 months ago

Share an article for reference only

https://deepmind.google/discover/blog/alphadev-discovers-faster-sorting-algorithms/

We applied AlphaDev to one of the most commonly used algorithms for hashing in data structures to try and discover a faster algorithm. And when we applied it to the 9-16 bytes range of the hashing function, the algorithm that AlphaDev discovered was 30% faster.

This year, AlphaDev’s new hashing algorithm was released into the open-source Abseil library, available to millions of developers around the world, and we estimate that it’s now being used trillions of times a day.

ogxd commented 10 months ago

Interesting article but I doubt AI (at least AI as we know it currently) would help making gxhash more performant

ogxd commented 10 months ago

Here is what it looks like after a few more optimizations (still passing SMHasher)

image

gxhash-2 is the current version on main, gxhash-3 is what is proposed in this PR

Takeaways:

3tieto commented 10 months ago

Are all these hashes stable ?

ogxd commented 10 months ago

Yes all hashes generated in v3 will be the same, independently from actual intrinsics used.
I did one more batch of optimizations yesterday but I am reaching the limits (or at least my limit 😅). I am going to proceed to a cleanup and review this as a whole

ogxd commented 10 months ago

I did (again 😅) one more batch of optimizations

Before

image

After

image

Merging this by the end of the week.

3tieto commented 10 months ago

Currently, the rust standard library uses hashbrown, and hashbrown uses ahash. I think you can try submitting a pull request to hashbrown first and replace ahash with gxhash.

https://github.com/rust-lang/hashbrown

Since Rust 1.36, this is now the HashMap implementation for the Rust standard library. However you may still want to use this crate instead since it works in environments without std, such as embedded systems and kernels.

Uses AHash as the default hasher, which is much faster than SipHash. However, AHash does not provide the same level of HashDoS resistance as SipHash, so if that is important to you, you might want to consider using a different hasher.

image

ogxd commented 10 months ago

Yes I know I've already done that, and it was earlier today to be precise ;)

This has led to this first issue https://github.com/ogxd/gxhash/issues/40