greg7mdp / parallel-hashmap

A family of header-only, very fast and memory-friendly hashmap and btree containers.
https://greg7mdp.github.io/parallel-hashmap/
Apache License 2.0
2.47k stars 234 forks source link

Question: How does `find_impl` work if the table is pretty full? #166

Closed bashimao closed 2 years ago

bashimao commented 2 years ago

Hi @greg7mdp ,

bool find_impl(const key_arg<K>& key, size_t hashval, size_t& offset) {
        auto seq = probe(hashval);
        while (true) {
            Group g{ ctrl_ + seq.offset() };
            for (uint32_t i : g.Match((h2_t)H2(hashval))) {
                offset = seq.offset((size_t)i);
                if (PHMAP_PREDICT_TRUE(PolicyTraits::apply(
                    EqualElement<K>{key, eq_ref()},
                    PolicyTraits::element(slots_ + offset))))
                    return true;
            }
            if (PHMAP_PREDICT_TRUE(g.MatchEmpty()))
                return false;
            seq.next();
        }
    }

may I ask a couple of questions:

a) Is my understanding correct that (assuming SSE impl: kWidth = 16):

  1. You match the next 16 slots (via control byte) starting from the first offset of the probe sequence.
  2. If there was a hit, you do a full comparison.
  3. If not you check, if all 16 slots were empty. If so, we can quit right away.
  4. Otherwise, use probe to change offset according to quadratic probing algorithm specified in probe_seq.next().

b) However, isn't (3) highly unlikely if the table pretty full (>80%) and has lots of tombstones?

Wouldn't this algorithm result in an infinite loop? How is that prevented from happening?

greg7mdp commented 2 years ago
  1. If not you check, if all 16 slots were empty. If so, we can quit right away.
  1. If not you check, if any of the 16 slots was empty. If so, we can quit right away.
bashimao commented 2 years ago

Oh, my bad. Thanks. I must have misread the code.