Open andrewrk opened 1 year ago
A good candidate to consider is clhash. Here's table 5 from the 2015 paper by Daniel Lemire & Owen Kaser:
Note that clhash was made for x86_64. I am not sure how well it translates to Arm or other architectures, but according to the paper, these are the operations and their respective performance characteristics that led to the clhash results above:
It seems to do exceptionally well on large inputs in particular!
I am not sure what the performance landscape looks like today, but pclmulqdq is faster on some of the newer hardware (see: https://www.agner.org/optimize/instruction_tables.pdf).
Anything depending on hardware-accelerated carryless multiplication will be super slow on baseline or any platform without acceleration.
For something more recent, komihash is ridiculously fast, while not requiring specific CPU features.
> Anything depending on hardware-accelerated carryless multiplication will be super slow on baseline or any platform without acceleration.
Exactly, clhash should only be considered for platforms that have a fast carryless multiply builtin.
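To make the cost difference concrete, here is a minimal sketch in C (not Zig, and not taken from clhash) of a 64x64 -> 128-bit carryless multiply. The names `clmul128`, `clmul64_portable`, `clmul64` and `HAVE_HW_CLMUL` are made up for this illustration. The portable path needs up to 64 shift-and-XOR steps per multiply, while the guarded path is a single `pclmulqdq` via the `_mm_clmulepi64_si128` intrinsic, used only when the compiler targets x86_64 with PCLMUL enabled (e.g. `-mpclmul`).

```c
#include <stdint.h>

#if defined(__PCLMUL__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_HW_CLMUL 1   /* hypothetical macro, for this sketch only */
#else
#define HAVE_HW_CLMUL 0
#endif

/* 128-bit carryless product, split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } clmul128;

/* Portable fallback: one shift-and-XOR step per set bit of b.
 * This is the kind of loop a target without acceleration has to run. */
static clmul128 clmul64_portable(uint64_t a, uint64_t b) {
    clmul128 r = {0, 0};
    for (unsigned i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0) r.hi ^= a >> (64 - i); /* bits shifted out of lo */
        }
    }
    return r;
}

/* Uses the hardware instruction when the build target has it,
 * otherwise falls back to the bit loop above. */
static clmul128 clmul64(uint64_t a, uint64_t b) {
#if HAVE_HW_CLMUL
    __m128i x = _mm_cvtsi64_si128((long long)a);
    __m128i y = _mm_cvtsi64_si128((long long)b);
    __m128i p = _mm_clmulepi64_si128(x, y, 0x00); /* low qword * low qword */
    clmul128 r;
    r.lo = (uint64_t)_mm_cvtsi128_si64(p);
    r.hi = (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(p, p));
    return r;
#else
    return clmul64_portable(a, b);
#endif
}
```

For example, `clmul64(3, 3)` yields `lo == 5`, because (x + 1)^2 = x^2 + 1 when addition is XOR. A hash built on this primitive runs the fallback for every multiply on a target without the instruction, which is the slowdown described above.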
UMash might be an improvement on the CLHash front.
> For something more recent, komihash is ridiculously fast, while not requiring specific CPU features.
In case there is any further interest in it: I've ported komihash to Zig.
Zig hash functions have no excuse to not be the best in the industry. What are we doing with sub-par hash function implementations? Let's get serious.
For starters, the "small key" APIs fail to branch on a small length check, such as this one:
https://github.com/ziglang/zig/blob/76aa1fffb7a06f0be0d803cb3379f3102c0b2590/lib/std/hash/xxhash.zig#L129-L133
A good implementation will branch on small lengths, like these examples:
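As a rough illustration of the idea only (this is not one of the implementations referenced above, and not xxhash or komihash), here is a toy hash in C that branches once on the length. Keys of 16 bytes or fewer take a straight-line path built from a few possibly overlapping loads; longer keys go through the bulk loop. `toy_hash64`, `hash_small` and `hash_bulk` are hypothetical names, the 16-byte threshold is an arbitrary assumption, and the mixer is MurmurHash3's 64-bit finalizer, borrowed purely for convenience.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned loads via memcpy; compilers turn these into plain loads. */
static uint64_t load64(const uint8_t *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static uint64_t load32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }

/* MurmurHash3's 64-bit finalizer, used here only as a bit mixer. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* Small-key path (len <= 16): no loop, at most two loads. */
static uint64_t hash_small(const uint8_t *p, size_t len, uint64_t seed) {
    uint64_t a = 0, b = 0;
    if (len >= 8) {            /* 8..16 bytes: first and last 8 bytes */
        a = load64(p);
        b = load64(p + len - 8);
    } else if (len >= 4) {     /* 4..7 bytes: first and last 4 bytes */
        a = load32(p);
        b = load32(p + len - 4);
    } else if (len > 0) {      /* 1..3 bytes: first, middle and last byte */
        a = ((uint64_t)p[0] << 16) | ((uint64_t)p[len >> 1] << 8) | p[len - 1];
    }
    return mix64(mix64(a ^ seed) ^ b ^ (uint64_t)len);
}

/* Bulk path (len > 16): 8 bytes per iteration, then one overlapping tail load. */
static uint64_t hash_bulk(const uint8_t *p, size_t len, uint64_t seed) {
    uint64_t h = seed ^ (uint64_t)len;
    size_t n = len;
    while (n >= 8) { h = mix64(h ^ load64(p)); p += 8; n -= 8; }
    /* p + n is the end of the input; len > 16 keeps p + n - 8 in bounds. */
    if (n > 0) h = mix64(h ^ load64(p + n - 8));
    return h;
}

/* Entry point: a single length check picks the path. */
static uint64_t toy_hash64(const void *data, size_t len, uint64_t seed) {
    const uint8_t *p = (const uint8_t *)data;
    if (len <= 16) return hash_small(p, len, seed);
    return hash_bulk(p, len, seed);
}
```

The point is the shape rather than the quality of the hash: short keys such as integers or short strings never pay for loop setup, accumulator initialization, or tail handling.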
In order to close this issue, the following things must be done:
This task may be time-consuming, but it is not difficult. If you have an example program that outperforms the Zig code, you can look at the machine code, work out why it is faster, and change the Zig code to match it. Better yet, take those lessons, iterate on them, and beat it.
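Before reading machine code, it helps to confirm that one implementation really is faster. Below is a minimal throughput harness, sketched in C for illustration; `bench_hash`, `bench_result` and `hash_fn` are hypothetical names, and FNV-1a is only a stand-in for whatever function is under test. It relies on `clock()`, which is coarse, so the iteration count has to be large enough for the elapsed time to be measurable.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*hash_fn)(const uint8_t *data, size_t len);

typedef struct {
    uint64_t bytes_hashed; /* total input bytes fed to the hash */
    double seconds;        /* CPU time as reported by clock() */
    uint64_t sink;         /* sum of results, keeps the calls from being dead code */
} bench_result;

/* FNV-1a (64-bit), a placeholder for the function being measured. */
static uint64_t fnv1a64(const uint8_t *data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;           /* FNV prime */
    }
    return h;
}

/* Hashes the same buffer `iters` times and records the time taken. */
static bench_result bench_hash(hash_fn f, uint8_t *buf, size_t len, unsigned iters) {
    bench_result r = {0, 0.0, 0};
    clock_t start = clock();
    for (unsigned i = 0; i < iters; i++) {
        if (len > 0) buf[0] = (uint8_t)i; /* vary the input so the call is not hoisted */
        r.sink += f(buf, len);
        r.bytes_hashed += len;
    }
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}
```

Throughput is `bytes_hashed / seconds`. Running the harness over several key sizes (for example 4, 16, 64 and 4096 bytes) shows whether a gap comes from the small-key path or from the bulk loop, which narrows down where to look in the disassembly.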
Note that one strategy to make progress on this issue is to delete less useful hash functions from the standard library.
Related: Perhaps the hash API should change so that optimizations such as this can be done directly with the standard library: https://github.com/tigerbeetledb/tigerbeetle/pull/796