martinus / unordered_dense

A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
MIT License
935 stars 75 forks source link

Question about how `insert` works #90

Closed nicklegr closed 1 year ago

nicklegr commented 1 year ago

I am reading your source code. It is very cleverly designed and interesting. Great work!

I have one question. In your implementation of insert, the expression dist_and_fingerprint <= rhs.m_dist_and_fingerprint appears. This works the same as the general Robin Hood hashing when dist is different. However, when dist is equal, it may be inserted before the intended position, depending on the value of fingerprint. Is there any benefit to this behavior other than simplifying the implementation?

I understand that this works correctly. For insert it is dist_and_fingerprint <= rhs.m_dist_and_fingerprint, for find it is dist_and_fingerprint > rhs.m_dist_and_fingerprint, which is symmetrical.

martinus commented 1 year ago

Hi! The benefit is that I don't have to check each element when distance is equal, I only have to check until the fingerprint is too large. So when you have e.g. 10 elements that hash to the same bucket and I want to find an element that hashes to the same bucket but is not there, on average I only have to check 5 of them (until dist_and_fingerprint is too large). Without that sorting, I would have to compare all 10 fingerprints.

As far as I know no robin hood hashing except my implementations use this behavior.

nicklegr commented 1 year ago

I see! I understand it well.

I think there is a slight performance penalty to insert by inserting the element at a slightly more forward position than intended. But if find is faster, I think it's a good trade-off. At least in my application, the performance of find is more important.

Thank you for your clear answer!