munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/
Other
8.75k stars 1.03k forks source link

Why to go until to an empty bucket in the hash table instead of just stop at a tombstone? #1005

Closed samolisov closed 2 years ago

samolisov commented 2 years ago

Hello Robert! Thank you for the amazing book!

I'm reading the virtual machine chapter, the section about hash tables. I see your implementation of the delete operation when you suggest to replace the being deleted entry with a tombstone. Then you re-implement the findEntry() operation where linear probing goes up to an empty bucket but remembers it has seen a tombstone somewhere and returns the tombstone if it was visited during the finding operation. So, your conclusion is the following: we require an empty bucket to finish the findEntry() operation otherwise we will enter in an infinite loop.

I have a question (sorry if issues is not the right place for questions): why cannot we finish just when the first tombstone is detected? We can just return the tombstone and not going further to the first empty bucket. There are not many words in Cormen et al, Section 11.4 Open Addressing about the delete operation: "We can solve this problem by marking the slot, storing in it the special value DELETED instead of NIL. We would then modify the procedure HASH-INSERT to treat such a slot as if it were empty so that we can insert a new key there". The phrase "to treat such a slot as if it were empty" can have many meanings but I understood it as "just return the DELETED slot once you have find it" but now I'm not sure.

Thank you for the book again and for the answer.

munificent commented 2 years ago

why cannot we finish just when the first tombstone is detected? We can just return the tombstone and not going further to the first empty bucket.

There are two reasons we might be calling findEntry():

  1. We are looking up an item in the hash table. In this case, we have to keep looking past the tombstone because the entry may actually be there after the tombstone. The tombstoned element could be part of a different probe sequence that is interleaved with the one that we are looking for, so the presence of a tombstone does not necessarily mean our element isn't there.

  2. We are inserting an item in the hash table. Here, you might think we could just stop as soon as we find a tombstone because we don't need to find the existing item anyway. But we actually do. A hash table guarantees that there is only one element with a given key. If we are inserting an item, we need to find an existing item with the same key so that we can replace it. So even in this case, we need to keep looking past the tombstones.