How to use AVX2 only for hashmap/set, and use stable hash for hash128/64/32

xxaier commented 7 months ago

how to use AVX2 only for hashmap/set ，and use stable hash for hash128/64/32

ogxd commented 7 months ago

Hello @xxaier,

Can you share a little bit about your usecase? Why would you need both?

This is currently not possible, as using the 'avx2' makes the whole binary use AVX2. Allowing both would imply to add specific APIs like GxHasherAVX2, GxHashBuilderAVX2, GxHashMapAVX2 and gxhash32/64/128_avx2 on top of the existing APIs, making the package and documentation a lot more complex for something I feel really is an edge case.

If others are interested in this please raise you hand and share a little bit about your usecase.

ogxd commented 7 months ago

Maybe this would work, but I haven't tested it:

[dependencies]
gxhash = { package = "gxhash", version = "*" }
gxhash_avx2 = { package = "gxhash", version = "*", features = ["avx2"] }

xxaier commented 7 months ago

I call Google Translate's API to translate some text. I need to cache the translated text by sentence to avoid repeated calls. At this time, I need a stable hash.

I also want to use gxhash's HashMap and HashSet in daily development. For these two data structures, it is faster and better to use them, and they do not need to be stable.

I think since the output results are different, they should be two different sets of functions. The same function name can be used, but one is under the default module and the other is under the avx2 module. This way they can be enabled simultaneously on demand.

for example : gxhash::hash64 , gxhash::avx2::hash64 , Also, if avx2 is enabled, the default HashMap alias wiil uses the gxhash::avx2 module function.

ogxd commented 7 months ago


[dependencies]
gxhash = { package = "gxhash", version = "*" }
gxhash_avx2 = { package = "gxhash", version = "*", features = ["avx2"] }```

It seems this doesn't work [...] depends on crategxhash v2.2.4multiple times with different names

for example : gxhash::hash64 , gxhash::avx2::hash64

Indeed, it sounds more practical this way, but there are still a few issues:

Currently, AVX2 gxhash is not as performant as non-AVX2 version for small input sizes. That's because there is one additional operation when using AVX2 (for combining the two 128-bit lanes). If it were faster for all input sizes, I'd go for it as the default Hasher for HashMaps/HashSets, but currently, it's more of a compromise.
Explicitly calling gxhash::avx2::... makes your code less portable as you directly reference platform-specific code.
In order not to duplicate code we'd probably need to use include!() macro and some module hacks that would make the project more difficult to maintain

xxaier commented 7 months ago

I think there are solutions to these problems.

First, define 2 modules,

gxhash::avx2,

gxhash::stable.

Then, gxhash:: still chooses whether to use stable or avx2 based on whether avx2 features are enabled (this does not affect existing code logic and solves problem 2).

Then, define another module called fast. Fast hash will first check the data length - for small data it calls stable and for large data it calls avx2 (if avx2 is not enabled it just uses stable without switching). Use fast as the default Hasher for HashMap/HashSet (this solves problem 1).

For problem 3, it can be solved by defining an internal Trait. Put any reusable code into interfaces of this Trait, and call this Trait interface wherever needed.

ogxd commented 7 months ago

Then, define another module called fast. Fast hash will first check the data length - for small data it calls stable and for large data it calls avx2 (if avx2 is not enabled it just uses stable without switching). Use fast as the default Hasher for HashMap/HashSet (this solves problem 1).

That would complicate a lot the code. It's not just about adding a check on the data length. The check itself and having both paths inlined would reduce performance. It would have to be done maybe at this stage using avx2 in compress_many when possible. The issue is that when using avx2 the state is 256-bit instead of 128-bit, so it will read the remaining data by blocks of 256-bit. The problem is that you are not sure there is a whole number of 256-bit blocks to process because the remaining data is mod 128, not mod 256. So 50% of the time you need an additional 128-bit read to make the remaining data mod 256. It can be done, but again, it will complicate things. We can try what it would look like on a branch or on a fork still 🤷

xxaier commented 7 months ago

Adding validation checks will certainly degrade performance, but it is uncertain by how much. It may be worthwhile.

Therefore, I feel could prioritize completing enabling two modules at the same time.

If we can concurrently enable the stable and avx2 versions, it would be straightforward to write a few lines of code to benchmark them.

ogxd commented 7 months ago

The check must be done at compile time. It's not much a question of performance overhead on the check itself, but more on the cost of including more code into the bytecode. The code is highly optimized and every line counts. For instance, the way the code is arranged currently results in a small bytecode (everything is inlined) resulting in a lot of inlining opportunities of the whole algorithm itself in the places it's going to be used. Having the instructions for both 128-bit and 256-bit ways will significantly reduce inlining opportunities. If done at compile time, that's a different story.

Despite the undeniable increase in complexity, I have made a branch where I'll be experimenting for a hybrid implementation to maximum performances for all input sizes.

https://github.com/ogxd/gxhash/pull/33

For now, using a trait and generics for choosing the set of platform intrinsics seems to have no impact on performance (rust is awesome 😍)

xxaier commented 7 months ago

I am anticipating the bench report.

I attempted to run the cargo bench myself, but there was an error: https://github.com/ogxd/gxhash/pull/33#issuecomment-1831122670

ogxd commented 7 months ago

Yes it is still very WIP, I had to change the architecture of the project first to allow both 128-bit state and 256-bit state versions of the algorithm to coexist in a single binary. I haven't touch the algorithm itself for now, so even if the bench does not build the result would have been the same as before.

Note that this is completely experimental, I have created this branch only for experimenting ideas. It may lead to nothing, there are no guarantees. There is another completely orthogonal path which consists in making any state width stable altogether, and if it works then it will remove the need of having explicit ways to choose the state width. This is however still at the research stage I'd say. I'll keep you updated here if I have made notable advancements.

ogxd commented 7 months ago

Here I've made a dedicated issue https://github.com/ogxd/gxhash/issues/34.
Main goal is performance.

ogxd commented 6 months ago

Addressed with v3.0.0

ogxd / gxhash

How to use AVX2 only for hashmap/set, and use stable hash for hash128/64/32 #29