martinus / unordered_dense

A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
MIT License

New Benchmark + Docs #24

Closed · patlecat closed this issue 2 years ago

patlecat commented 2 years ago

I'm eagerly following your new lib. I hope you will explain more soon how it improves on all your previous hash_map libs; I'm not clear about this yet.

Also, I hope you will do a new benchmark round like the one in 2019; that was awesome and gave so much insight into all the libs out there.

Thanks :)

martinus commented 2 years ago

It mostly improves on my old implementation by being much more likely to be bug-free 🙂 robin_hood still has a chance of overflow with a bad hash, even with some mitigations. This new map practically doesn't have that issue. By splitting up its internal structure into an index and a data vector, the code became much simpler, and I'm now able to easily support custom allocators.
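
For context, a minimal sketch of the kind of "index plus data vector" layout being described; the names, field widths, and bucket layout are illustrative assumptions, not the actual unordered_dense internals.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Illustrative sketch of a "flat index + dense element storage" map layout.
// Names and field widths are assumptions, not the library's real internals.
template <class Key, class Value,
          class Allocator = std::allocator<std::pair<Key, Value>>>
struct dense_map_sketch {
    struct Bucket {
        std::uint32_t dist_and_fingerprint; // probe distance plus a few hash bits
        std::uint32_t value_idx;            // index into `values`
    };

    // Small flat index that is probed on lookup.
    std::vector<Bucket> buckets;

    // The elements themselves, stored densely. Because this is just a
    // std::vector, plugging in a custom allocator is straightforward.
    std::vector<std::pair<Key, Value>, Allocator> values;
};

int main() {
    dense_map_sketch<int, int> m;
    m.values.reserve(8); // element storage grows like any vector
}
```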

greenrobot commented 2 years ago

> robin_hood still has the chance of overflow with bad hash

Are there details available on this? E.g., when does it hit, and what exactly happens then (e.g., is an exception thrown)?

martinus commented 2 years ago

@greenrobot It can happen when more than 127 entries end up in the same bucket. If that happens, robin_hood tries to mitigate the problem by increasing the map's size or changing the hash. If that doesn't help, an std::overflow_error is thrown. It never happened for me, but I had user reports that it did.

With unordered_dense there can be 16.7 million entries per bucket.
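
To make that concrete, here is a hedged sketch of how the robin_hood overflow can be provoked deliberately. It assumes robin_hood.h from the robin-hood-hashing repo is on the include path; the exact insert count at which the mitigations give up is an implementation detail.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>

#include "robin_hood.h"

// Pathologically bad hash: every key lands in the same bucket.
struct ConstantHash {
    std::size_t operator()(int /*key*/) const noexcept { return 0; }
};

int main() {
    robin_hood::unordered_map<int, int, ConstantHash> map;
    try {
        for (int i = 0; i < 1000; ++i) {
            map[i] = i; // eventually exceeds the 127-entries-per-bucket limit
        }
    } catch (const std::overflow_error& e) {
        std::printf("overflow after %zu entries: %s\n", map.size(), e.what());
    }
}
```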

greenrobot commented 2 years ago

Thanks. So this also highly depends on the quality (or flaws) of the hash function, assuming there's one bucket per hash value(?).

martinus commented 2 years ago

Yes, it depends on the hash.

patlecat commented 2 years ago

@martinus Ok, thanks. Is that the only difference? Has the performance also improved compared to the old lib? And what about a rerun of your benchmark? I would love to see the current state of things.

martinus commented 2 years ago

@patlecat I'll release a completely redone benchmark in a few days.

patlecat commented 2 years ago

> @patlecat I'll release a completely redone benchmark in a few days.

omg omg omg Awesome man, thanks so much!

greenrobot commented 2 years ago

> Yes, it depends on the hash.

Sorry, one more question: any prediction on how uint64_t keys that are incremented monotonically behave?

martinus commented 2 years ago

@greenrobot It depends a lot. E.g., if the key always increases by 1, just use a vector with the key as the index. If it doesn't, use a proper hash.
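
A hedged sketch of those two options; the value type and the counts are made up for illustration, and it assumes the `<ankerl/unordered_dense.h>` header from this repo and C++17.

```cpp
#include <cstdint>
#include <vector>

#include <ankerl/unordered_dense.h>

int main() {
    // Keys that increase strictly by 1: the key *is* the index, so a plain
    // vector is the fastest "map".
    std::vector<int> dense(10'000, 0);
    for (std::uint64_t key = 0; key < 10'000; ++key) {
        dense[key] = static_cast<int>(key * 2);
    }

    // Keys that are monotonic but sparse (gaps, unknown range): use the map
    // with its default hash, which mixes the integer's bits.
    ankerl::unordered_dense::map<std::uint64_t, int> sparse;
    for (std::uint64_t key = 0; key < 10'000; ++key) {
        sparse[key * 1000] = static_cast<int>(key);
    }
}
```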

martinus commented 2 years ago

@patlecat @greenrobot I've finally published my big benchmark update: https://www.reddit.com/r/cpp/comments/x82kac/comprehensive_c_hashmap_benchmarks_2022/?

greenrobot commented 2 years ago

@martinus Wooooow! Amazing work! I'm eager to play with it now... :grin: