Open rurban opened 4 years ago
these results are a bit old @data-man . https://github.com/ktprime/emhash is a good place for hashmap benchmarks. I stick with ska::bytell_hash_map for speed and space.
@wangyi-fudan
these results are a bit old
You can always generate new results. :)
@wangyi-fudan just FYI Martin Ankerl in the link above compared emhash and commented it like this:
If you are dealing with small maps and integral data types as keys,
emilib1::HashMap
might be fastest. But be aware that I cannot vouch for its stability. It is very new, and while integrating it the author still had to fix a few kinks to get it working in my benchmark. If that does not bother you, its search performance is very fast. If you want a more stable map with very good performance, I’d go for
absl::flat_hash_map
as the default map. It is very stable, has lots of features, and definitely well tested.
I'm leaning more towards https://github.com/greg7mdp/parallel-hashmap, an improved abseil/swisstable variant: an open-addressing, header-only C++ hashtable with quadratic probing. But it does its own mixing for bad hash functions. Maybe disable that.
I'm not so sure our tested hash functions are actually inlined with the current HashMapTest. They definitely need to be templated with the new pfhash classes, not called indirectly via std::function.
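A minimal sketch of the templated approach (the `Fnv1aHash` functor is a stand-in, not one of SMHasher's pfhash classes): because the hasher is a template parameter with a visible `operator()`, the compiler can inline every hash call, whereas a `std::function` forces an indirect call per lookup.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stateless hash functor; a real pfhash class would wrap one
// of the tested hash functions. operator() is visible at the call site,
// so the compiler is free to inline it.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const {
        std::uint64_t h = 1469598103934665603ull;  // FNV-1a offset basis
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ull;                 // FNV-1a prime
        }
        return static_cast<std::size_t>(h);
    }
};

// The benchmark is templated on the hasher type, not on std::function,
// so no per-lookup indirect call stands between the map and the hash.
template <typename Hasher>
std::size_t count_hits(const std::unordered_map<std::string, int, Hasher>& m,
                       const std::string& key) {
    return m.count(key);
}
```

Usage: instantiate the map with the functor, e.g. `std::unordered_map<std::string, int, Fnv1aHash> m;`.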
I think it would be nice to disable quadratic probing as well. Handling pure linear probing is a classical hash function test, which probably requires quite a strong (5-independent) hash function. Multiply-shift should certainly not be able to handle linear probing, but it seems to do alright in the current parallel hashtable tests.
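For reference, multiply-shift is just a few instructions, which is why it's so tempting despite its weak independence guarantees (it is only roughly universal, nowhere near the 5-independence that the linear-probing theory asks for). A sketch with an arbitrarily chosen odd multiplier:

```cpp
#include <cstdint>

// Classic multiply-shift (Dietzfelbinger et al.): maps a 64-bit key to
// l output bits. 'a' must be a random odd 64-bit constant. This family
// is only ~universal, far from 5-independent, which is why in theory it
// can behave badly under pure linear probing.
inline std::uint64_t multiply_shift(std::uint64_t x, std::uint64_t a, unsigned l) {
    return (a * x) >> (64 - l);
}
```

Note it always maps the key 0 to bucket 0, one symptom of its weak independence.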
Yes. The idea was to test it with the common default hashtable, and a good modern one. Adding a trivial linear probing or even a chained one is a good idea, because so many people are still using them.
Some would even say that if you have a good enough hash function, quadratic probing and tree-fallbacks only slow you down 😉
Others like me would argue that only the poor hash functions lead to acceptable hash table speed. Good hashes are for the weak. But it depends entirely on the use case and the hash table implementation.
Originally I also checked various other, much faster linear-probing hashmaps, e.g. from rigtorp or the two swisstable variants (Google and Facebook), but I settled on the slow standard (separate chaining) for fair comparison. Maybe add the others, and document them.
@dumblob Speaking of the hashmaps currently available, definitely skim Martin Ankerl's comparison https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-05-conclusion/ to notice one very important thing: some hashmaps are barely sensitive to collisions or to the throughput and latency of the hash function, while others rely completely on the quality, throughput, and/or latency of the hash function. To show real use-case scenarios fairly, we would actually need to test one representative from the sensitive corner and one from the insensitive corner.
@rurban True. Besides the usual insensitive std chaining there should also be a fast, sensitive linear-probing table. About robin_hood I'm not sure: it's much more stable than plain linear probing, and from the outside it behaves like a fast chaining variant. We need to add the stddev to the mean cycles. I'll also add the initial insert/delete timings, but without trials and outlier removal. For poor hash functions we should add better security measures, like collision counting or tree conversion. None of the usual hashmaps are secure, so maybe add fixed versions and test the overhead.
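A sketch of the collision-counting safeguard mentioned above (an assumed design, not the SMHasher code): a separate-chaining table that tracks its longest chain and flags when it exceeds a bound, at which point a real implementation could reseed or convert the bucket to a balanced tree, as java.util.HashMap does.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Minimal separate-chaining map that counts per-bucket collisions.
// If any chain grows past max_chain, degraded() reports that the hash
// is bad (or under attack); a real impl might reseed or tree-convert.
class CountingChainedMap {
public:
    CountingChainedMap(std::size_t buckets, std::size_t max_chain)
        : table_(buckets), max_chain_(max_chain) {}

    void insert(const std::string& key, int value) {
        auto& chain = table_[std::hash<std::string>{}(key) % table_.size()];
        for (auto& kv : chain)
            if (kv.first == key) { kv.second = value; return; }  // update
        chain.emplace_back(key, value);
        longest_ = std::max(longest_, chain.size());
    }

    // True once some chain is suspiciously long.
    bool degraded() const { return longest_ > max_chain_; }
    std::size_t longest_chain() const { return longest_; }

private:
    std::vector<std::list<std::pair<std::string, int>>> table_;
    std::size_t max_chain_;
    std::size_t longest_ = 0;
};
```

The check costs one comparison per insert, so the overhead of the safeguard itself is small; the expensive part is whatever mitigation it triggers.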
The goal should be to choose a good hash function for slow and fast hash tables based on the timings: small (inlinable!) and without too many collisions. With string keys, not numbers.
Parts taken from the discussions at https://github.com/rurban/smhasher/pull/80 and #61