martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.53k stars 147 forks source link

FYI: We build a library around your hash #110

Closed bigerl closed 3 years ago

bigerl commented 3 years ago

Hi @martinus, this seemed to me to be the easiest way to reach out to you.

We at the DICE research group found that the hash functions for strings and int you provide in this repo provide among the best performance with all hash maps/sets we tested. So we built a hashing library around it that provides predefined hashes for primitive types, many stl containers and an easy way to define hashes for user defined types.

Please check it out at https://github.com/dice-group/dice-hash

Best Alex

martinus commented 3 years ago

Hi! I had a quick look at your code:

for dice_hash_invertible_combine you might want to have a look at https://www.preprints.org/manuscript/201710.0192/v1 The polynomial 3860031 + (h+y)*2779 + (h*y*2) looks good and fast enough

Note that the hashes I am using in robin_hood map are not really high quality, but they seem to be good enough (and fast!) for hashmaps. I wouldn't use them for anything else.

bigerl commented 3 years ago

Thx for the hint regarding the invertible combine. We will definitely look into that.

To get you right on the hash quality: Are you sure that hashes are not of good quality, e.g. with respect to quality benchmarks like SMHasher or theoretical properties? Or was that simply never investigated (probably because it is not necessary for a hash map)?

martinus commented 3 years ago

I've spent quite some time especially on hash_int. E.g. one good test is to hash consecutive numbers (0, 1, 2, 3, ...), and put the result through PractRand. For a good high quality mixer, this should not fail quickly. For my hash_int version I am pretty sure this fails tests for randomness quite quickly. But as far as I can say this shouldn't be a big concern for hashmaps, it does not necessarily have to be of such a high quality.

I've got a separate project to find a good mixer here: https://github.com/martinus/better-faster-stronger-mixer

I'd say one of the highest quality mixer is NASAM: http://mostlymangling.blogspot.com/2020/01/nasam-not-another-strange-acronym-mixer.html Pelle Evensen's blog is a really good read for that.