martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.5k stars 143 forks source link

Please add C++ 17 node API to node based map #53

Closed ned14 closed 3 years ago

ned14 commented 4 years ago

For node-based maps, these functions can enable writing much more efficient code:

https://en.cppreference.com/w/cpp/container/unordered_map/extract

Indeed, I am not using your map in a number of places precisely due to me using this node ptr API.

martinus commented 4 years ago

As you might know there is unordered_flat_map and unordered_node_map. It is not possible to extract a node from unordered_flat_map because it doesn't have nodes.

unordered_node_map uses a bulk pool allocator, so even when extracting a node is possible, its memory will be destroyed when the map is destructed. So it is not possible to extract a node and put it into another map.

So unfortunately it's not possible to support the node API, it would make a lot of other optimizations impossible.

ned14 commented 4 years ago

unordered_node_map uses a bulk pool allocator, so even when extracting a node is possible, its memory will be destroyed when the map is destructed. So it is not possible to extract a node and put it into another map.

Sure, using this functionality is only worthwhile for the node map.

I don't see from the implementation where a node's lifetime is tied to its original map. As far as I can tell, the map's destructor destroys the nodes it knows about. If a node has been extracted, it's no longer in the map. So it's fine.

A perfectly valid implementation of node_type as returned by .extract(), incidentally, is another unordered_node_map. So when you extract a node, you return a new map with a single value. When you insert a node, you simply move the node in. Implementing .merge() is exactly the same, except it takes a map with multiple items.

Unless I'm missing something in your implementation, implementing this looks straightforward? I've got a fair bit of code where there are two unordered maps, one for allocated items, the other for free items. We save on deallocation/reallocation by moving nodes from allocated to free and vice versa, it's almost the same cost as a linked list remove and insert.

In any case, thanks for making available the high quality open source implementation.

martinus commented 4 years ago

the node base map uses this bulk pool allocator: https://github.com/martinus/robin-hood-hashing/blob/master/src/include/robin_hood.h#L337

For the node based map, the map inherits from this allocator and uses it's allocate() to get memory for the node.

BulkPoolAllocator allocates big chunks of memory at once and each time a new node is created, it takes a part of that chunk. It can only deallocate all data at once, so the node's lifetime is bound to the original map.

It is ok to move the whole data around (e.g. with std::move of the whole map), but it's not possible to extract parts of it

ned14 commented 4 years ago

Would there be scope to create a map which borrows the bulk pool allocator of another map?

It would be UB to delete the second map after the first map.

Most use cases of extract and insert node are between two maps both with equal lifetime, or modifying the key of an unchanged mapped value, so this would be a fine solution.

If you like this idea, calling the function .extract_borrow() or .extract_unsafe() or something descriptive might be wise.

ArtBlnd commented 4 years ago

merging bulk memory pool to another memory pool seems not impossible. but extracting K-V from original map object can cause UB. so implement as unsafe seems reasonable.

martinus commented 3 years ago

won't do, as this would be too big of a change.