martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.5k stars 142 forks source link

question about 3.9.0 #101

Closed jmbnyc closed 3 years ago

jmbnyc commented 3 years ago

I had a question about the latest changes to the hash functions (3.8.0 vs 3.9.0). Can you elaborate on the changes and their potential performance impact (or direct to me where this is discussed). Your work here is wonderful but your commit comments are not very helpful wrt understanding the changes that you are making (it would be nice if you added a description of what you have changed and why it was changed otherwise it is hard to know the motivation unless it is linked to an issue).

It is easy to see that you removed CRC stuff probably because of the maintenance hassle but I'd like to understand the impact on performance (cost of hashing and distribution impacts).

Thanks.

martinus commented 3 years ago

Hi jmbnyc, actually the hash has not changed between 3.8.0 and 3.9.0. hash_bytes and hash_int are still the same. In between I experimented with CRC but got rid of the changes because of maintenance hassle, and also because I can't benchmark my changes properly on the different machines. On my intel machine CRC implementation was faster than what's in it now, but I had reports that it was slower on their end.

What actually has changed is that I now use the hash a bit differently, I use the lower few bits for the info byte. There shouldn't be much effect on runtime though, and this works better when the higher hash bits are not random enough.

jmbnyc commented 3 years ago

Martin, Thanks for the quick response. You are correct, I forgot that I had grabbed the HEAD 3 weeks ago and then I called it 3.8.1 because I saw that 3.8.0 was the last tag.

Thanks for the explanation of how you use the bits - that works great for all aligned pointers where there is no existence of entropy in the lows bits, e.g. they are all zero.

BTW, I've added a few API's to your core map/set because I find them more useful than the std defined API. I added find per blow and a similar one for erase where I want to get a copy of the value and have erase indicate to me if it was erased (that one can actually use move to obtain the value).

e.g. // JMB Change 6 Oct 2020 // Added conditional complilation to allow "set" and "map" to compile template bool find(const keytype& key, MappedType& value) const { if (is_map) { // NOLINT(modernize-use-nodiscard) ROBIN_HOOD_TRACE(this); auto kv = mKeyVals + findIdx(key); if (kv == reinterpret_cast_no_cast_alignwarning<Node*>(mInfo)) { return false; } value = kv->getSecond(); return true; } return false; }

Again, thanks for the excellent work you have produced.

martinus commented 3 years ago

Hm, I personally find the standard API pretty sufficient. E.g. I like this idom, which is pretty similar to your sample code:

    if (auto it = map.find(key); it != map.end()) {
        std::cout << "found " << it->second << std::endl;
    } else {
        std::cout << "not found" << std::endl;
    }