skarupke / flat_hash_map

A very fast hashtable

Corner case issue when hash collision with bad hashing #45

Closed Philippe91 closed 2 years ago

Philippe91 commented 2 years ago

Each time there is a hash collision, the set doubles its size (the map has the same problem). See the simple sample code below. Note: ska::unordered_set does not have this problem.

#include <iostream>
#include "flat_hash_map.hpp"

int main()
{
    // Pathological hash: every key lands in the same bucket
    struct hash_t
    {
        size_t operator()(int) const { return 1; }
    };
    ska::flat_hash_set<int, hash_t> set;

    for (int i = 0; i < 30; ++i)
    {
        std::cout << set.bucket_count() << std::endl;
        set.insert(i);
    }
}

Printout: 4 4 8 8 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864 134217728

Philippe91 commented 2 years ago

After a new look at this issue, I realize that my previous test code hits a corner case that does not occur in real life, provided the hash function is not terrible. That said, we can considerably reduce the growth rate in such corner cases by changing the method 'compute_max_lookups': just multiply 'desired' by 2, like this:

int8_t desired = detailv3::log2(num_buckets) * 2;