ogxd / gxhash

The fastest hashing algorithm 📈
https://docs.rs/gxhash
MIT License
696 stars 23 forks source link

Add software AES fallback #54

Open ogxd opened 6 months ago

notsatvrn commented 6 months ago

I've been working on this and I got it mostly working (fails is_stable), however the performance is, as I feared, abysmal. I ran this test on an old Chromebook which doesn't have AES-NI and this is the fastest I could get it to run.

image

It would break stability, but maybe AES emulation isn't the right solution.

ogxd commented 6 months ago

Wow nice!

Performance is indeed much worse, but I was expecting even worse results tbh (like at least 2 orders of magnitude slower).
That said, it is still much slower than using another algorithm as a fallback. We are currently constrained by the hash "stability" feature advertised in this crate. We can decide to remove that constraint, but it would imply that using the algorithm shouldn't be used outside of the process lifetime scope. This is what they went for in ahash.

AHash does not have a fixed standard for its output. This allows it to improve over time. For example, if any faster algorithm is found, aHash will be updated to incorporate the technique. Similarly, should any flaw in aHash's DOS resistance be found, aHash will be changed to correct the flaw.

Because it does not have a fixed standard, different computers or computers on different versions of the code will observe different hash values. As such, aHash is not recommended for use other than in-memory maps. Specifically, aHash is not intended for network use or in applications which persist hashed values. (In these cases HighwayHash would be a better choice)

Additionally, aHash is not intended to be cryptographically secure and should not be used as a MAC, or anywhere which requires a cryptographically secure hash. (In these cases SHA-3 would be a better choice)

That leaves us with a choice on the overall objective of this crate.

Here are the possible paths I see (let me know if you have other ideas):

  1. Have crate only build on platforms where hardware instrinsics are available (this is what we have since 3.1). The crate is expected to always perform as intended, however it limits its scope of application. Eg it can't be considered as the default hasher for Hashbrown.
  2. Have an AES software fallback. The crate will work everywhere and generated hashes will be the same for a given input (stable). Drawback is performance. We can show a warning from the build.rs with println!("cargo:warning=Software fallback!!!"); but it's the best we can do to prevent non-intentional underperforming usages of gxhash.
  3. Have another algorithm as fallback. The crate will work everywhere, but we loose hash stability (do we need it?).
  4. A mix of 2 and 3: Have a software fallback + and optional feature to use another algorithm as the fallback. This way the user explicitely knows he might lose hashes stability.
notsatvrn commented 6 months ago

4 sounds like the best option. I'll try to implement AES w/o AES-NI using SIMD so the performance impact isn't as bad.

ogxd commented 6 months ago

I think it is acceptable to split the work into two parts (software fallback first, then alternative hash fallback as an optional feature). Both imply a substantial amount of work. The alternative hash fallback will require some refactoring as we currently reference AES explicitly in several places (needs to be abstracted away).

Can you share your current work on the software AES fallback? Even if not all tests are passing and performance is not optimal.

notsatvrn commented 6 months ago

Sorry for the late reply. I'll share my progress on the software AES soon, and on runtime feature detection as well as it's a bit hard to work on both separately and merge them later. All of this required some restructuring of the project, but it will make cross-platform development much easier. Also, the software AES implementation should now be stable and considerably faster, but I need to do more testing.

Veritius commented 2 months ago
  1. Have another algorithm as fallback. The crate will work everywhere, but we loose hash stability (do we need it?).

In regards to this, the reason I use gxhash is its stability. It's the only 'stable' hash I could find - most hashing crates don't document whether they guarantee stability or not. I'd appreciate some way to ensure stability, even at the cost of performance.