jonhoo / griddle

A HashMap variant that spreads resize load across inserts
Apache License 2.0
190 stars 7 forks source link

Sub-optimal performance on remove during resize #1

Closed jonhoo closed 4 years ago

jonhoo commented 4 years ago

When a remove happens during the resize period, we need to restart the RawIter we have over the old table to track which elements have yet to be removed. This restart happens here: https://github.com/jonhoo/griddle/blob/b2a063c9efb82907447d92dad88015314379402c/src/lib.rs#L865-L871

Unfortunately, when the iterator is restarted this way, it has to walk all the empty buckets of elements we have already moved the next time we try to move a key. When the old map is large, this can take a significant amount of time.

It would be better if we could "refresh" the iterator instead. This should be possible, since we know the old table has neither shrunk nor grown. I probably requires some changes internally in hashbrown. I pinged the hashbrown author about it in https://github.com/rust-lang/hashbrown/pull/166#issuecomment-651446241.

Amanieu commented 4 years ago

Erasing an element doesn't actually invalidate a RawIter as long as the element you removes is "behind" the iterator. This should probably be documented better, but if you look at retain you can see that is how it works.

jonhoo commented 4 years ago

Ah, I think maybe some context is needed here.

The iterator griddle keeps is an iterator over the "old" table of elements that have not yet been moved. We keep that iterator around even while other operations are performed (like removals) so that the next insert can immediately get the next item to move without having to scan for non-empty buckets. There are no elements left "before" the iterator, as they have all been moved.

The issue here is that the user calls remove on the top-level map, which then ends up removing an upcoming element from the cached iterator (potentially even in its current group). At the very least, the items counter in the RawIter needs to be decremented, but I also experienced that the cached iterator would yield empty buckets for the items that have been removed since when it was created. Basically, this code: https://github.com/jonhoo/griddle/blob/91b0f5b1b08c87834d7e36b3aa8230a46fbeaddc/src/lib.rs#L783-L787

Triggered this assertion: https://github.com/rust-lang/hashbrown/blob/efbd03661fc9df0e594ecc89784107353cc71060/src/raw/mod.rs#L491

jonhoo commented 4 years ago

I think a "general" refresh is probably too tricky to add (e.g., you wouldn't know how to change items without re-scanning), but maybe a notify_remove_ahead method that decrements by 1 and refreshes the current iterator group?