jeromefroe / lru-rs

An implementation of an LRU cache
https://docs.rs/lru/
MIT License

Optimizing for concurrency #21

Open rohitjoshi opened 5 years ago

rohitjoshi commented 5 years ago

lru-rs is quite fast compared to many other LRU cache implementations. Is there any way to optimize it for multi-threaded access? Maybe a read-write lock, or reducing the locking scope.

Maybe something like CHashMap: https://docs.rs/chashmap/2.2.0/chashmap/

`get` and `put` take a mutable reference to `self`, so the compiler forces us to use a mutex lock in a multi-threaded environment, even though `Send` and `Sync` are implemented.
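
To illustrate, here is a minimal sketch of the locking this forces (assuming a recent version of the crate, where `LruCache::new` takes a `NonZeroUsize`). Note that even a lookup needs the exclusive lock, because `get` updates the recency order internally:

```rust
use std::num::NonZeroUsize;
use std::sync::{Arc, Mutex};
use std::thread;

use lru::LruCache;

fn main() {
    // `get` and `put` both take `&mut self`, so the whole cache must sit
    // behind one exclusive lock to be shared across threads.
    let cache = Arc::new(Mutex::new(LruCache::new(NonZeroUsize::new(1024).unwrap())));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                let mut guard = cache.lock().unwrap();
                guard.put(i, i * 10);
                // Even a read needs the exclusive lock: `get` mutates the
                // internal recency list.
                let _ = guard.get(&i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```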

jeromefroe commented 5 years ago

This is a great question! Admittedly, when I first started working on this project, I intended it to be a learning experience so I didn't focus on performance very much. It would definitely be interesting to go back and revisit different parts of the cache now though. Of course, as the saying goes, "If you can't measure it, you can't improve it". So I'll probably start by writing some benchmarks which can provide a baseline against which any changes can be compared.
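
For instance, a simple baseline using the Criterion crate (a hypothetical benchmark sketch, not something in the repository) could measure single-threaded put/get throughput:

```rust
use std::num::NonZeroUsize;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use lru::LruCache;

// Measure mixed put/get throughput on a single thread as a baseline.
fn bench_put_get(c: &mut Criterion) {
    c.bench_function("put_get", |b| {
        let mut cache = LruCache::new(NonZeroUsize::new(1024).unwrap());
        let mut i: u64 = 0;
        b.iter(|| {
            cache.put(black_box(i % 2048), black_box(i));
            black_box(cache.get(&(i % 2048)));
            i += 1;
        });
    });
}

criterion_group!(benches, bench_put_get);
criterion_main!(benches);
```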

rohitjoshi commented 5 years ago

👍. I used xxHash, which gives slightly better performance.
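
For reference, swapping in xxHash might look like the following sketch, assuming a recent version of the crate (which exposes a `with_hasher` constructor) and the `twox-hash` crate:

```rust
use std::hash::BuildHasherDefault;
use std::num::NonZeroUsize;

use lru::LruCache;
use twox_hash::XxHash64;

fn main() {
    // Swap the default SipHash hasher for xxHash. SipHash is DoS-resistant
    // but slower; xxHash trades that resistance for raw hashing speed.
    let hasher = BuildHasherDefault::<XxHash64>::default();
    let mut cache: LruCache<u64, u64, _> =
        LruCache::with_hasher(NonZeroUsize::new(1024).unwrap(), hasher);

    cache.put(1, 100);
    assert_eq!(cache.get(&1), Some(&100));
}
```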

rohitjoshi commented 5 years ago

RocksDB's shared cache, which is sharded into multiple LRU caches, is faster. I implemented sharding across 128 LRU caches, but RocksDB's performance is still better. It might be worth using as a reference: https://github.com/facebook/rocksdb/blob/master/cache/lru_cache.cc and https://github.com/facebook/rocksdb/blob/master/cache/sharded_cache.cc
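
The idea behind sharding is to split one big lock into many small ones, so threads only contend when they hash to the same shard. A minimal sketch of the approach (the `ShardedLru` type and its shard-selection scheme here are hypothetical, not taken from RocksDB or lru-rs):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroUsize;
use std::sync::Mutex;

use lru::LruCache;

/// A hypothetical sharded cache: each shard is an independent
/// `Mutex<LruCache>`, so threads contend only on the same shard.
struct ShardedLru<K, V> {
    shards: Vec<Mutex<LruCache<K, V>>>,
}

impl<K: Hash + Eq, V> ShardedLru<K, V> {
    fn new(num_shards: usize, per_shard_cap: usize) -> Self {
        let cap = NonZeroUsize::new(per_shard_cap).unwrap();
        let shards = (0..num_shards)
            .map(|_| Mutex::new(LruCache::new(cap)))
            .collect();
        ShardedLru { shards }
    }

    // Pick a shard by hashing the key.
    fn shard(&self, key: &K) -> &Mutex<LruCache<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() as usize) % self.shards.len();
        &self.shards[idx]
    }

    fn put(&self, key: K, value: V) {
        self.shard(&key).lock().unwrap().put(key, value);
    }

    fn get_cloned(&self, key: &K) -> Option<V>
    where
        V: Clone,
    {
        self.shard(key).lock().unwrap().get(key).cloned()
    }
}
```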

jeromefroe commented 5 years ago

Great, thanks for the references! Hopefully benchmarking and profiling will reveal some quick wins; anything more extensive might be better left for a new crate which implements the same interface.

Firstyear commented 3 years ago

@rohitjoshi You may also find https://crates.io/crates/concread useful; it has a concurrently readable, transactional cache implementation.

rohitjoshi commented 3 years ago

@Firstyear Thanks for sharing. I saw evmap earlier as well, but neither of these supports the LRU algorithm. In my use case, existing keys are never updated and the read-to-write ratio is 9:1, so I'm trying to figure out an optimized lookup path. For now, I am splitting a 200M LRU capacity into 2048 shards (LRU instances) to reduce lock contention.

Firstyear commented 3 years ago

@rohitjoshi If your keys are never updated and your workload is mostly reads, then you have even more reason to look at the arcache. This design has "no" locking, allows fully parallel lookups across all readers, and on a "cache miss" any reader can add content to the cache without blocking existing readers. As a bonus, it also supports SIMD for parallel key lookups via a feature flag plus nightly Rust. Additionally, ARC as a cache replacement strategy is far more effective than LRU :)

https://github.com/kanidm/concread/blob/master/CACHE.md https://docs.rs/concread/0.2.6/concread/arcache/index.html
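
A rough sketch of what this might look like in code. The constructor and transaction method names below are assumptions based on the concread docs linked above and may not match the exact API of any given version; check the docs before using:

```rust
use concread::arcache::ARCache;

fn main() {
    // NOTE: constructor and method names here are assumptions based on
    // the concread docs; consult the linked docs for the exact API.
    let cache: ARCache<u64, u64> = ARCache::new_size(1024, 8);

    // Writers work in a transaction and publish their changes on commit.
    let mut wr = cache.write();
    wr.insert(1, 100);
    wr.commit();

    // Readers proceed in parallel; on a miss, a reader can include the
    // value itself without blocking other readers.
    let mut rd = cache.read();
    if rd.get(&1).is_none() {
        rd.insert(1, 100);
    }
}
```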

Feel free to email me directly (rather than us annoying @jeromefroe); my address can be found on my github profile.