martinus / unordered_dense

A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
MIT License
898 stars 72 forks source link

Frozen hashes/sets #19

Closed vstakhov closed 2 years ago

vstakhov commented 2 years ago

Is your feature request related to a problem? Please describe. It is more a discussion, than a real issue or feature request. I'm looking for quite an unusual thing - an ability to freeze a hash or a set in some shared memory piece, so other processes can access this data structure from shared memory concurrently but with no ability to change data. So I'm looking for any hints in this direction about the potential design. I'd be also fine if that will not come to the upstream as a rarely needed feature but in my case it can significantly reduce memory footprint especially in highly loaded systems. And since no modifications are possible, there is no need in locking or other consistency support - processes can just read data from the shared memory.

Describe the solution you'd like I would like to have an interface that stores data in a given shared memory with an ability to restore data afterwards. The most simple approach is to copy the underlying vector into shared memory, and create a structure directly in shared memory with placement new and set the storage pointer to the vector allocated/copied previously. This is quite a low level approach, so I'm wondering if there is something better.

Describe alternatives you've considered In my case, the alternative is to have a separate hash table for each process which is just a waste of memory.

P.S. please feel free to close if this is not relevant to the project.

martinus commented 2 years ago

If I understand you right, you want to allocate shared memory data somehow and then have the map inside this shared block of data? This sounds like a job for a custom allocator. If your compiler supports it, I think the easiest way to do this would be to implement an std::pmr::memory_resource that actually allocates a block of data in shared memory and then gives out pointers to this shared memory as needed.

Have a look at test/unit/pmr.cpp. Here is a short example that will use logging_memory_resource and log exactly two allocations, one for the vector, and one for the indexing struct:

    auto logging_resource = logging_memory_resource();
    auto map = ankerl::unordered_dense::pmr::map<uint64_t, uint64_t>(&logging_resource);
    map.reserve(100); // <-- 2 allocations done here
    for (uint64_t i=0; i<100; ++i) {
        map.try_emplace(i, i);
    }

You could modify logging_memory_resource to instead provide pointers to shared memory.

bigerl commented 2 years ago

You could also have a look at

Both solutions have a downside though: You would need to replace the std::vector used by ankerl::unordered_dense internally with boost::container::vector. The container types from the standard library (at least libstdc++) do not comply to custom (fancy) pointer types defined by allocators and use raw pointers instead. Both, the metall allocator and the boost interprocess allocator use offset pointers not raw pointers. So it would need some tinkering :)

martinus commented 2 years ago

Both solutions have a downside though: You would need to replace the std::vector used by ankerl::unordered_dense internally with boost::container::vector.

I'm working on a branch where it is possible to customize the container with a simple template argument. It's not yet finished though.

Philippe91 commented 2 years ago

I'm working on a branch where it is possible to customize the container with a simple template argument. It's not yet finished though.

Great... I would not have dared to ask for this :)

martinus commented 2 years ago

In #20 I just merged the option to use custom containers. The Allocator template argument is now AllocatorOrContainer and this accepts a container type now too. E.g. this works:

using Map = ankerl::unordered_dense::map<
    int,
    std::string,
    ankerl::unordered_dense::hash<int>,
    std::equal_to<int>,
    std::deque<std::pair<int, std::string>>>;

I've got a test in custom_container_boost.cpp that makes use of shared memory. This little example works on my machine, but I didn't thoroughly test it.

martinus commented 2 years ago

Released 1.3.0 with plenty of fixes, new API, and also the AllocatorOrContainer support.