martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.52k stars 146 forks source link

it = map.erase(it) potentially passes object twice #42

Closed martinus closed 4 years ago

martinus commented 5 years ago

It seems @jcageman has stumbled uppon an issue that probably all robin-hood map with backward shift deletion have (see #18). Here is a simple reproducer

Insert two elements, where both hash to the last bucket of the map. One entry will be the last one of the map, the other one will wrap around and use the first bucket.

robin_hood::unordered_node_map<int, int, DummyHash> map;
map[1024] = 1;
map[2048] = 2;

it = map.begin(); // it points to 2048, the first element which has wrapped around
++it;  // it points to the last element, 1024 which is in its original bucket
it = map.erase(it); // backward shift deletion removes 1024 and wraps back 2048, so it now (again!) points to 2048 and NOT end
it = map.erase(it); // finally we are at the end.

Also see https://github.com/Tessil/robin-map/issues/20

ktprime commented 5 years ago

The iterator pos must be valid and dereferenceable. Thus the end() iterator (which is valid, but is not dereferenceable) cannot be used as a value for pos.

martinus commented 5 years ago

@ktprime, this is not about the end() iterator, this is about the backward-shift-deletion which shuffles elements around when erasing an element. See https://github.com/Tessil/robin-map/issues/20

ed-lam commented 4 years ago

Any progress on this or any workaround? If not, is there any alternative unordered_map that you recommend? Thanks.

martinus commented 4 years ago

Unfortunately, not yet. This is an edge case that's very difficult to solve in an efficient way. If you need the standard behavior, switch to std::unordered_map.

martinus commented 4 years ago

I think of all the possible solutions to this problem, the least bad is to ditch the wrap around and add a buffer to the end of the data array.

Since mInfo is limited, The size of that buffer can also be very limited: std::min(mMaxNumElementsAllowed, 128).

martinus commented 4 years ago

I've finally fixed the issue. As explained above, I've added a buffer for the overflow, so there's no wrap around any more. The added benefit is that a lot less & mMask are necessary. The disadvantage is that for the worst case of bucket size 256, I need an additional 256 buckets. That's not too bad though. For larger maps I still only need 256 extra buckets, and for small maps the buffer is also smaller.