skarupke / flat_hash_map

A very fast hashtable
1.69k stars 186 forks source link

Fix flat_hash_map when used across library boundaries #26

Open smessmer opened 5 years ago

smessmer commented 5 years ago

Singletons don't work well when used across dynamic library boundaries, each library might get its own copy of the singleton. This means if library A creates a hash table and library B adds elements to it, causing a rehash and therefore deallocation, the empty_default_table() won't be recognized correctly because the two libraries have different default tables.

This PR fixes this by treating the default table like any other table, allocating and deallocating it. This causes additional allocations when the hashmap is created or set to empty, so it might not be the optimal solution, but at least it works. A potential better solution might involve nullptr and checking against nullptr. I don't have time to implement that though.

ezyang commented 5 years ago

This feels like a header-only library self-inflicted wound...

ezyang commented 5 years ago

I'm a little confused how this ever worked. Why does a hash map library need a singleton? Aren't you going to immediately mutate the "empty" table at some later point in time?

smessmer commented 5 years ago

@ezyang There are a few control flows where now, after this PR, additional allocations happen. For example, if you create a new hash map and immediately call reserve(some_large_number) on it. Before this PR, that would only have been one allocation for the large number. After this PR, it'll at first allocate the empty state (which has space for 4 elements) and then re-allocate for the large number.

cebtenzzre commented 5 years ago

Can this be adjusted to use this new behavior only on Windows, or if a certain compile-time flag is defined? As far as I understand, the existing code works just fine on Unix-like systems. Also, this needs to be done for the other two hash tables as well, as they use similar constructs.

antialize commented 5 years ago

@Cebtenzzre The existing code does not work on unix-like systems in all cases. Specificly when compiling a so file with "-fvisibility=hidden" parsing a hash map over the so boundary will result in different empty_blocks() in the different units.

We have worked around this issue by changing the line: static sherwood_v8_block * empty_block()

to: attribute((visibility("default"))) static sherwood_v8_block * empty_block()

Another less intrusive way to deal with this is to capture a pointer to the value of some empty_block() when constructing the table as a member of the table, and then alwayes use this captured pointer.

wesleyw72 commented 2 years ago

I believe this change also helps with using the flat hash map in shared memory. Previously, the pointer returned by empty_default_table is in a particular process's address space (not in shared memory). This causes issues as other processes can't access that address, and therefore can cause a segmentation fault. The change proposed in this PR fixes that, as the provided allocator is used, so as long as that allocator is a shared memory allocator - it seems to work quite well.