wvwwvwwv / scalable-concurrent-containers

High performance containers and utilities for concurrent and asynchronous programming
Apache License 2.0

Idea: Potential LRU-like Cache using 2-random sampling #94

Closed: novacrazy closed this issue 1 year ago

novacrazy commented 1 year ago

While a strict LRU cache is not really feasible as a concurrent data structure, an alternative exists that could be built on top of hashmaps/indexmaps. The gist of the 2-random sampling method is to choose two elements uniformly at random from the entire table and simply evict the older of the two (preferably using a trait for user-defined comparisons). In practice it ends up roughly on par with strict LRU.
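A minimal single-threaded sketch of the idea, with illustrative names (`Cache2Random`, `evict_one`) and the random index generator passed in as a closure so the sampling step is explicit:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical sketch of 2-random eviction: every entry carries a
// logical timestamp, and eviction samples two keys at random and
// removes the one that was stamped longer ago.
struct Cache2Random<K, V> {
    map: HashMap<K, (V, u64)>, // value plus last-access tick
    tick: u64,
}

impl<K: Hash + Eq + Clone, V> Cache2Random<K, V> {
    fn new() -> Self {
        Self { map: HashMap::new(), tick: 0 }
    }

    // Insert or update an entry, stamping it with the current tick.
    fn insert(&mut self, key: K, value: V) {
        self.tick += 1;
        self.map.insert(key, (value, self.tick));
    }

    // Sample two keys via the supplied index generator and evict the
    // one whose last-access tick is older.
    fn evict_one(&mut self, mut rand_index: impl FnMut(usize) -> usize) {
        if self.map.len() < 2 {
            return;
        }
        let keys: Vec<K> = self.map.keys().cloned().collect();
        let a = keys[rand_index(keys.len())].clone();
        let b = keys[rand_index(keys.len())].clone();
        let older = if self.map[&a].1 <= self.map[&b].1 { a } else { b };
        self.map.remove(&older);
    }
}
```

A real concurrent version would of course avoid collecting all keys and would sample buckets or shards directly; this only demonstrates the "pick two, drop the older" policy itself.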

For a sharded hashmap with a good hash function, it's safe to assume most shards are of similar size, so you could sample two shards at random, then sample uniformly across their combined entries as if they were one contiguous array. Sometimes both choices will land in the same shard, but that's fine. While scc::HashMap is not sharded, the same approach could be applied at the bucket level instead.
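To make the "treat two shards as one contiguous array" step concrete, here is a hypothetical helper (not part of scc) that maps a single uniform draw over the combined length back to a (shard, slot) pair:

```rust
// Given the lengths of two randomly chosen shards and an index `j`
// drawn uniformly from 0..(len0 + len1), map `j` back to a
// (shard, slot-within-shard) pair. Illustrative helper, not scc API.
fn pick_across_two(len0: usize, len1: usize, j: usize) -> (usize, usize) {
    debug_assert!(j < len0 + len1);
    if j < len0 {
        (0, j) // the index falls inside the first shard
    } else {
        (1, j - len0) // otherwise it falls inside the second shard
    }
}
```

Because `j` is uniform over the combined range, every entry is equally likely to be picked regardless of which shard it lives in, and two independent draws may well land in the same shard.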

If it helps come up with ideas: I once implemented this for a simple sharded hashmap, where I could evict many elements at once by performing a random walk over the shards, which avoided having to hold more than one new lock per step. Still somewhat expensive, but fair (I think). Of note, I also implemented an unfair but faster approximate version. My shards actually needed to behave like an IndexMap so entries could be accessed by index.

I think it would be awesome if scc could eventually provide some kind of HashCache built around this, as I'm not aware of any high-performance thread-safe LRU caches that don't rely on heavy locking.

Not a high priority, though. I personally have no need for it right now.

wvwwvwwv commented 1 year ago

Great idea! I'll have to implement an LRU cache for a project at work anyway, and the suggested algorithm seems good enough for most cases. Maybe next month? I'll try to implement something.

wvwwvwwv commented 1 year ago

A minimum viable implementation will be out in a couple of weeks.

- Super simple algorithm: the least recently used entry in a bucket is evicted when the bucket is full; instead of allocating linked lists, the LRU entry in the bucket is replaced with the new key.
- Public `evict*` methods will not be included in the first version; TODO.
- `HashCache` will be resized normally, just like `HashMap`, as long as the capacity stays under a user-specified limit.
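A rough sketch of that bucket-level policy, assuming a fixed bucket size and illustrative names (this is not scc's actual implementation, which works on lock-protected buckets):

```rust
const BUCKET_LEN: usize = 4; // assumed bucket size, for illustration

// Each slot stores (key, value, last-used tick); a full bucket
// overwrites its least recently used slot instead of chaining.
struct Bucket<K, V> {
    slots: Vec<(K, V, u64)>,
}

impl<K: PartialEq, V> Bucket<K, V> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    // Insert a key, returning the evicted key if the bucket was full.
    fn put(&mut self, key: K, value: V, tick: u64) -> Option<K> {
        if let Some(s) = self.slots.iter_mut().find(|s| s.0 == key) {
            s.1 = value; // existing key: update in place
            s.2 = tick;
            return None;
        }
        if self.slots.len() < BUCKET_LEN {
            self.slots.push((key, value, tick));
            return None; // room left: behave like a plain hash map
        }
        // Bucket full: replace the LRU slot, no allocation needed.
        let (i, _) = self
            .slots
            .iter()
            .enumerate()
            .min_by_key(|(_, s)| s.2)
            .unwrap();
        let evicted = std::mem::replace(&mut self.slots[i], (key, value, tick));
        Some(evicted.0)
    }
}
```

Returning the evicted key lets the caller observe evictions, which is also roughly what the future `evict*`-style methods would expose.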

wvwwvwwv commented 1 year ago

Hi, the initial version of HashCache has been released in v1.7.0.

It only contains a minimal set of API methods, and no performance tests have been conducted, so I'm not sure how usable it is in practice. If you find any problems, feel free to open new issues!