jonhoo / griddle

A HashMap variant that spreads resize load across inserts
Apache License 2.0
190 stars 7 forks source link

Small linear factor remaining #3

Open jonhoo opened 4 years ago

jonhoo commented 4 years ago

Even though griddle spreads out most of the cost of the resize, there is still a non-trivial additional cost at the time of the resize that appears to be proportional to the size of the map. That's unfortunate, and we should try to fix it.

I believe this is due to the code here: https://github.com/jonhoo/griddle/blob/b2a063c9efb82907447d92dad88015314379402c/src/lib.rs#L775-L782

Specifically, the first time we try to carry elements from the old map to the new, we need to find the first non-empty bucket, which may actually take a while as the map grows. I wonder if hashmap could somehow keep track of the index of the first non-empty bucket?

Amanieu commented 4 years ago

Hashbrown iterators work in groups of 16 elements, so finding the first full bucket takes constant time if it is within those first 16 elements. Since the load factor is 87.5%, you would need a very large table for for this to become an issue.

jonhoo commented 4 years ago

Yeah, it only really becomes visible around ~500k elements: https://github.com/jonhoo/griddle/blob/master/misc/vroom.png

It also increases as the table grows, although even at ~1.7M element the time it takes is pretty small. I more raised it as it seems to be the only remaining operation that takes time proportional to the size of the map.