tommyettinger / waterhash

Variant on Wang Yi's wyhash with 32-bit output, using at most 64-bit math
The Unlicense
25 stars, 1 fork

64-bit variant? #3

Open VioletGiraffe opened 1 year ago

VioletGiraffe commented 1 year ago

Since waterhash is internally 64 bits wide and is only truncated to 32 bits at the very end, has anyone tried running SMHasher on a 64-bit version, i.e. the same waterhash but without the final shift-and-subtract and the truncation to uint32_t? Would it pass the tests the same way the 32-bit version does?

Is woothash what I'm looking for? It has quite a bit more code, so I'm confused about how similar they are in essence. woothash is faster but has slightly worse bias (not a meaningful difference, but still; in fact, waterhash's bias is the lowest I've seen among all the hash functions I've looked at so far).
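To make the question concrete, here is a minimal sketch of the two ways to finish such a hash. It is not waterhash's actual code: `toy_state`, `finish32`, `finish64`, and the constants are hypothetical stand-ins, and the loop is an FNV-1a-style step chosen only so the example is complete. The only part taken from the discussion is the shape of the 32-bit finish (shift, subtract, truncate).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit internal state a hash holds just
   before its final step. This is NOT waterhash's mixing code. */
static uint64_t toy_state(const unsigned char *data, size_t len, uint64_t seed) {
    uint64_t h = seed ^ 0x9E3779B97F4A7C15ULL;   /* arbitrary constant (assumption) */
    for (size_t i = 0; i < len; i++) {
        h = (h ^ data[i]) * 0x100000001B3ULL;    /* FNV-1a-style step (assumption) */
    }
    return h;
}

/* 32-bit finish: fold the high half into the low half with a shift and
   subtract, then truncate to uint32_t. */
static uint32_t finish32(uint64_t h) {
    return (uint32_t)(h - (h >> 32));
}

/* Hypothetical 64-bit finish: skip the fold and the truncation and return
   the whole state. */
static uint64_t finish64(uint64_t h) {
    return h;
}
```

Calling `finish64(toy_state(buf, n, seed))` instead of `finish32(...)` is the change being asked about. Whether the unfolded 64-bit state holds up under SMHasher is exactly the open question, and this sketch makes no claim either way.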

tommyettinger commented 1 year ago

You probably want wheathash, which is almost exactly what you described (waterhash modified to output 64 bits), other than a slightly more involved final step. Waterhash and wheathash both process inputs in 32-bit blocks, whereas woothash processes 64-bit blocks and needs to work harder to mix them without access to 128-bit math. The speed of all of these depends a lot on your environment; I have ported them all to Java, and the water/wheat/woot family is typically my go-to these days for non-cryptographic hashing in Java.
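The block-size point can be illustrated with plain multiplication. The sketch below is not the mum() function from any of these hashes, and the helper names `read32`, `mul32_full`, and `mul64_wrapping` are hypothetical. It only shows why 32-bit blocks are convenient when 64-bit math is the widest available: the product of two 32-bit values always fits in 64 bits, while the product of two 64-bit values does not.

```c
#include <stdint.h>
#include <string.h>

/* Read one 32-bit block. memcpy sidesteps alignment and aliasing problems;
   the value depends on the platform's byte order. */
static uint32_t read32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Full product of two 32-bit blocks: the exact result always fits in
   64 bits, so no 128-bit math is needed. */
static uint64_t mul32_full(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Product of two 64-bit blocks using only 64-bit math: the result wraps
   modulo 2^64, so the high 64 bits of the true product are lost. */
static uint64_t mul64_wrapping(uint64_t a, uint64_t b) {
    return a * b;
}
```

For example, `mul32_full(0xFFFFFFFFu, 0xFFFFFFFFu)` keeps every bit of the product, while `mul64_wrapping(1ULL << 32, 1ULL << 32)` wraps to 0. A hash that consumes 64-bit blocks without 128-bit math has to compensate for that loss with extra mixing steps, which fits the description of woothash working harder.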

VioletGiraffe commented 1 year ago

Thanks a lot for the explanation, and for your work! How do you choose between water / wheat / woot, and especially between the last two?

tommyettinger commented 1 year ago

Wheat is meant for inputs in 32-bit chunks, which doesn't matter much here, but does in languages like Java that hash arrays by item rather than by byte. Woot is meant for inputs in 64-bit chunks, like the original wyhash. In my Java code (which is here), I use waterhash when hashing 32-bit-or-less items to get a 32-bit result, wheathash when hashing 32-bit-or-less items to get a 64-bit result, and woothash when hashing 64-bit items to get any result. In C or C++, where I admit I have never used a hash table (I only write hashes in C or C++ to test them with SMHasher), hashing just operates on bytes, so woothash would probably be best for speed. As for collision resistance, I don't really know if any of these qualify as resistant, but woothash has a different variety of mum()-like function, and that may make it better or worse.
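The selection rule described for the Java code can be written down as a small lookup. This is only a sketch of that rule: the enum `hash_pick` and the function `pick_hash` are hypothetical names, not part of any of the three libraries, and the sketch assumes item widths of 8, 16, 32, or 64 bits.

```c
/* Hypothetical labels for the three hashes discussed in this thread. */
typedef enum { PICK_WATER, PICK_WHEAT, PICK_WOOT } hash_pick;

/* item_bits:   width of each array element being hashed (8, 16, 32, 64).
   result_bits: width of the hash value wanted (32 or 64). */
static hash_pick pick_hash(unsigned item_bits, unsigned result_bits) {
    if (item_bits > 32) {
        return PICK_WOOT;   /* 64-bit items: woothash, for any result width */
    }
    /* 32-bit-or-less items: the result width decides. */
    return (result_bits <= 32) ? PICK_WATER : PICK_WHEAT;
}
```

For byte-oriented hashing in C or C++, the comment above suggests woothash would probably be fastest, so this table mainly reflects the item-by-item hashing done in Java.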