Closed rmccorm4 closed 1 year ago
To make sure I'm understanding correctly: the problem was that when the number of keys was large, `std::find` took a long time, since it had to iterate over all of the keys when updating the LRU order. The `std::find` was required to update the position in the LRU list of the key currently being modified.
The new logic stores the location of the `CacheEntry` in the LRU list as part of the struct, making LRU updates faster: if the key already exists in the LRU list, we know where it is stored and can go directly to that location.
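The approach described above can be sketched as follows. This is a minimal illustration with hypothetical names, not Triton's actual cache types: a map from key to its `std::list` iterator lets an update move the node to the front in O(1) via `std::list::splice`, replacing the O(N) `std::find`.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch of an LRU index where each entry remembers its
// position in the recency list, so updates are constant time.
class LruIndex {
 public:
  // Insert key, or mark an existing key most-recently-used.
  void Touch(const std::string& key) {
    auto it = entries_.find(key);
    if (it != entries_.end()) {
      // Known position: move the node to the front in O(1).
      lru_.splice(lru_.begin(), lru_, it->second);
    } else {
      lru_.push_front(key);
      entries_[key] = lru_.begin();
    }
  }

  // Evict the least-recently-used key; returns false if empty.
  bool EvictOne(std::string* evicted) {
    if (lru_.empty()) return false;
    *evicted = lru_.back();
    entries_.erase(lru_.back());
    lru_.pop_back();
    return true;
  }

  size_t Size() const { return lru_.size(); }

 private:
  std::list<std::string> lru_;  // front = most recent, back = least recent
  std::unordered_map<std::string, std::list<std::string>::iterator> entries_;
};
```

For example, after `Touch("a"); Touch("b"); Touch("a");` the least-recently-used key is `"b"`, so `EvictOne` removes it without any linear search.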
@Tabrizian exactly.
Question: when we evict, we pop directly from the LRU list. If we did the same on insert, could we guarantee the iterator is always valid and avoid the need for a separate flag? If I understand correctly, the flag is only needed once, on insert.
Thanks @rmccorm4 for putting in the fix so quickly. Will this be available in the next release, 23.06?
Hi @guptap11, happy to help! Yes, this fix should be available in the 23.06 release.
Track validity of the LRU iterator, and use it directly when updating an existing key, for constant-time updates.
Previously, the LRU logic was mistakenly doing an O(N) search (`std::find(list.begin(), list.end(), key)`) through the LRU keys on every update, which significantly slowed down cache operations as the number of keys in the cache grew. The LRU iterator logic was already there, but wasn't being used. We need a way to ensure the iterator is valid before using it, to avoid undefined behavior. Rather than using a `std::pair`, I made a small struct so the field has the name `valid`, for better readability over `pair.first`/`pair.second`.

Refs on `std::list` iterator validity after modifying the list:

I will add some kind of perf test at increasingly larger cache sizes to verify update time stays relatively flat when I add the other e2e Python unit tests, but for this release the performance has been verified locally and all existing functional tests pass.
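To illustrate the readability point and the iterator-validity concern, here is a small sketch with hypothetical names (`LruPosition`, `DemonstrateIteratorValidity` are not from the PR): a struct with a named `valid` flag instead of a `std::pair`, plus a demonstration that a `std::list` iterator survives insertions and erasures of *other* elements and is only invalidated when its own element is erased.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in for the PR's struct: a named flag reads better
// than pair.first, and the iterator better than pair.second.
struct LruPosition {
  bool valid = false;
  std::list<int>::iterator it{};
};

// Returns true if the tracked iterator stayed valid across unrelated
// modifications to the list.
bool DemonstrateIteratorValidity() {
  std::list<int> lru = {1, 2, 3};

  LruPosition pos;
  pos.it = std::next(lru.begin());  // points at element 2
  pos.valid = true;

  // Modifying OTHER elements does not invalidate pos.it.
  lru.push_front(0);  // list: 0 1 2 3
  lru.pop_back();     // list: 0 1 2
  bool survived = pos.valid && *pos.it == 2;

  // Erasing the pointed-to element DOES invalidate the iterator, so the
  // flag must be cleared to prevent any later use of the dangling iterator.
  lru.erase(pos.it);
  pos.valid = false;
  return survived && lru.size() == 2;
}
```

This is why the flag is needed: node-based `std::list` iterators are stable under most operations, but nothing in the iterator itself records that its element was erased, so that state has to be tracked alongside it.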