martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.5k stars 142 forks source link

Can I pre-hash and cache hashes to speed up lookup? #100

Closed bhaller closed 2 years ago

bhaller commented 3 years ago

Hi! This repo looks great and I'm planning to use robin_hood::unordered_node_map in my project (https://messerlab.org/slim/). I have a question (which might be a feature request); I hope this is a good place to ask it.

My usage case is that I want to use robin_hood::unordered_node_map to implement symbol tables in an interpreter. Variables in the symbol table are often defined once and then used many times (in a loop, say), so fast lookup is crucial. Symbols are defined by their std::string names, so the key type for the hash is std::string. I'd like to avoid having to recompute the hash every time the interpreted code uses a variable. Since I parse the interpreted code and turn it into an AST, I have a convenient place, in the AST node, to keep a pre-computed hash value for identifiers such as variable names. I'd therefore like to be able to pre-compute the hashes for all identifiers used in the whole AST, and then use those pre-computed hash values for both insert and find operations in the symbol table. (Of course the std::string key would still need to be passed in as well, but would be used only in case of a collision in find, and would never need to be hashed since the pre-computed hash would be supplied.)

Also, my interpreter chains multiple hash tables together, to represent nested scopes: a hash table for the local scope (such as inside a function) gets constructed and torn down very frequently, while a hash table for the global scope lives more or less forever. When a variable name is used in the interpreted code, I want to first try to look it up in the local scope, and if that fails, then check the global scope for the same key. This is another case where being able to supply a pre-computed hash value for the key would be a win, instead of computing the hash once for the local-scope lookup and then again for the global-scope lookup.

So I'm wondering (1) whether your code already supports pre-computed hashes (sorry, I'm just getting into your code, and I'm not very good with complex C++ template code, so I'm having a bit of trouble finding my way), and (2) whether, if not, this would be straightforward to add, or whether there are fundamental design obstacles in the way of doing this with your code. If it isn't there, but would be straightforward to add, I'd be happy to take a crack at it and submit a PR, if you're too busy to deal with it. :->

Thanks for this great open-source code!

bhaller commented 3 years ago

Hi again. If you find this suggestion useful, please of course feel free to run with it; for my purposes, however, I have realized that there is a better/faster design for my symbol tables, so I will not be using robin_hood::unordered_node_map after all – or not for this specific purpose, anyway. I will probably still use it to replace some other uses of std::unordered_map in my code, but the comments here won't be relevant for those uses. Feel free to close this issue.

martinus commented 3 years ago

Currently it's not possible to use a precalculated hash value. There is a proposal "Precalculated hash values in lookup" which describes such an API, but currently I don't have this: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1696r0.html

Note: I previously closed #40 which was about that because of #72, but #72 is something completely different.

martinus commented 3 years ago

It seems that proposal was removed: "Remove dedicated precalculated hash lookup interface": http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1661r1.html

bhaller commented 3 years ago

Very interesting, thanks for the link. Looks like this should probably be closed, but I'll leave that to your judgement. Thanks very much; I'm now using robin_hood for some other purposes in my code base and its performance is stellar!