martinus / unordered_dense

A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
MIT License

API extensions supporting insert/find/erase/etc. when a hash is already available #71

Closed jeremyong closed 1 year ago

jeremyong commented 1 year ago

First, thanks for the library! A key feature I've grown accustomed to (working with my own containers, as well as many specialized containers from game engine development) is the ability to invoke various API methods when a hash is already available. This sort of scenario is quite common: imagine you have a string identifier of some sort, and a number of structures keyed to the same identifier. Ideally, you could compute the hash once and then reuse the result (or even cache it where appropriate) to avoid re-hashing the string on every operation.

When I implemented this in the past, it usually involved some light refactoring to split each operation into a version accepting a hash and a version that didn't (the latter would compute the hash and forward the call to the hash-accepting version). The savings are, of course, dependent on your hash function and the size of the input.
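The split described above might be sketched like this (a toy illustration, not unordered_dense's API; the names `StringTable`, `contains`, and `hash` are made up for the example):

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Toy container demonstrating the "split" pattern: the convenience
// overload computes the hash, then forwards to an overload that
// accepts a precomputed hash. All names here are illustrative.
class StringTable {
public:
    using hash_t = std::uint64_t;

    // Helper so callers use the exact same hash function as the table.
    hash_t hash(std::string const& key) const { return hasher_(key); }

    // Convenience overload: hashes, then forwards.
    bool contains(std::string const& key) const {
        return contains(key, hash(key));
    }

    // Hash-accepting overload: reuses a hash computed elsewhere.
    bool contains(std::string const& key, hash_t h) const {
        auto it = entries_.find(h);
        return it != entries_.end() && it->second == key;
    }

    void insert(std::string key) {
        auto h = hash(key);
        entries_.emplace(h, std::move(key));
    }

private:
    std::hash<std::string> hasher_;
    // Toy storage keyed by hash; ignores collisions for brevity.
    std::unordered_map<hash_t, std::string> entries_;
};
```

A caller keyed to the same identifier across several such tables could call `hash()` once and pass the result to each `contains()`.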

This API has a footgun that users need to be aware of: the supplied hash must have been computed with the same hash function the container uses, for obvious reasons. For this reason, I recommend providing a helper function on the map/set structures that computes the hash for a given input. This takes any guesswork out of the usage and reduces the risk of API misuse.

Cheers! Jeremy

martinus commented 1 year ago

Hi, I believe what you want is already possible, since the map already has full support for heterogeneous lookups. This proposal describes how to use it: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1661r1.html

See also this: https://github.com/martinus/unordered_dense#314-heterogeneous-overloads-using-is_transparent

I have not tried that yet though.
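One way the heterogeneous-lookup approach could look in practice is a lookup key that carries its own cached hash, plus a transparent hasher/equality pair. This is a sketch under assumptions, not code from the library; `PrehashedKey`, `TransparentHash`, and `TransparentEq` are invented names, and the std-based hash stands in for whatever hash the map is actually configured with:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

// A lookup key that computes its hash once and caches it.
struct PrehashedKey {
    std::string_view str;
    std::uint64_t    hash;
    explicit PrehashedKey(std::string_view s)
        : str(s), hash(std::hash<std::string_view>{}(s)) {}
};

struct TransparentHash {
    using is_transparent = void; // opt in to heterogeneous overloads
    std::uint64_t operator()(std::string const& s) const {
        return std::hash<std::string_view>{}(s);
    }
    std::uint64_t operator()(PrehashedKey const& k) const {
        return k.hash; // reuse the cached hash; no re-hashing
    }
};

struct TransparentEq {
    using is_transparent = void;
    bool operator()(std::string const& a, std::string const& b) const {
        return a == b;
    }
    bool operator()(std::string const& a, PrehashedKey const& b) const {
        return a == b.str;
    }
    bool operator()(PrehashedKey const& a, std::string const& b) const {
        return a.str == b;
    }
};

// Hypothetical usage with the library (not compiled here):
//   ankerl::unordered_dense::map<std::string, int,
//                                TransparentHash, TransparentEq> m;
//   PrehashedKey k{"config/render"}; // hashed once
//   m.find(k);                       // reusable across same-keyed maps
```

The essential invariant is that both `operator()` overloads of the hasher must agree for equal keys, otherwise lookups with a `PrehashedKey` would probe the wrong slot.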

jeremyong commented 1 year ago

Thanks for the references, that's certainly an interesting approach I hadn't considered. I suppose it has the drawback of needing to wrap every key type in a separate hashing type that permits a cached result, and of baking this into the type information (preventing copies if one map has the optimization and another doesn't, even when the underlying key and value types are the same). That said, it appears to be possible, so I will give it a try. Thanks!