rails / solid_cache

A database-backed ActiveSupport::Cache::Store
MIT License

SIEVE eviction algorithm support #129

Closed bessey closed 9 months ago

bessey commented 10 months ago

This crossed my mind off the back of reading through https://github.com/rails/rails/issues/50443, and in particular @byroot's concerns about SolidCache relying on FIFO for evictions.

Doing the rounds on HN recently is a new cache eviction algorithm called SIEVE. Compared to FIFO it significantly improves cache hit ratios. Compared to LRU it significantly reduces cache write frequency. Most importantly, it's a very simple algorithm that in the paper is laid out in just 16 lines of pseudo-code.
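For reference, the paper's pseudo-code boils down to something like the following in-memory Ruby sketch: a FIFO list with one "visited" bit per entry and a moving "hand" pointer. This is purely an illustration of the algorithm, not a proposed SolidCache implementation (the class and its API are made up here).

```ruby
class SieveCache
  Node = Struct.new(:key, :value, :visited, :prev, :next)

  def initialize(capacity)
    @capacity = capacity
    @index = {}   # key => Node
    @head = nil   # newest entry
    @tail = nil   # oldest entry
    @hand = nil   # eviction pointer
  end

  def read(key)
    node = @index[key]
    return nil unless node
    node.visited = true   # a hit only flips a bit; no list reordering (unlike LRU)
    node.value
  end

  def write(key, value)
    if (node = @index[key])
      node.value = value
      node.visited = true
      return value
    end
    evict if @index.size >= @capacity
    node = Node.new(key, value, false, nil, @head)
    @head.prev = node if @head
    @head = node
    @tail ||= node
    @index[key] = node
    value
  end

  private

  # Move the hand from the tail towards the head, clearing visited bits,
  # and evict the first unvisited entry it finds.
  def evict
    @hand ||= @tail
    while @hand.visited
      @hand.visited = false
      @hand = @hand.prev || @tail
    end
    victim = @hand
    @hand = victim.prev   # hand resumes just past the evicted slot next time
    remove(victim)
  end

  def remove(node)
    @index.delete(node.key)
    node.prev ? node.prev.next = node.next : @head = node.next
    node.next ? node.next.prev = node.prev : @tail = node.prev
  end
end
```

Note that a cache hit only sets a bit, which is what makes SIEVE so much cheaper on writes than LRU: reads never mutate the list structure itself.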

I thought those qualities sounded like rather a good fit for solving the concerns laid out in https://github.com/rails/rails/issues/50443. This really isn't my area of expertise, but I've spent some time taking a stab at sketching out what this algorithm might look like in SolidCache with minimal changes to the existing structure.

I have not yet got as far as actually implementing it and measuring performance. Before I go that far, is there appetite for such a change / addition to the gem?

byroot commented 10 months ago

This algorithm assumes fast access to the entry metadata.

What you implemented in AR/SQL would lock lots of records for a prolonged amount of time in a long-running transaction. That would be terrible for performance.

bessey commented 10 months ago

Yep, fair. The only purpose of the outer transaction, though, is to ensure no two eviction processes are running (and trying to move the pointer) at once.

With that in mind, an advisory lock would probably be a better option.
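A hypothetical Postgres-flavoured sketch of that idea follows. `pg_try_advisory_lock`/`pg_advisory_unlock` are the real Postgres functions, but `EVICTION_LOCK_KEY`, the method name, and the injected `connection` object are assumptions for illustration, not SolidCache internals.

```ruby
# Arbitrary application-chosen lock id (assumption, not a SolidCache constant).
EVICTION_LOCK_KEY = 901_234

# Run the eviction pass only if we win the advisory lock; otherwise skip,
# since another process is already moving the pointer. `connection` is
# anything responding to #select_value and #execute (e.g. an ActiveRecord
# connection), injected here so the sketch stays self-contained.
def with_eviction_lock(connection)
  locked = connection.select_value("SELECT pg_try_advisory_lock(#{EVICTION_LOCK_KEY})")
  return :skipped unless locked

  begin
    yield
  ensure
    connection.execute("SELECT pg_advisory_unlock(#{EVICTION_LOCK_KEY})")
  end
end
```

Unlike a long transaction, this holds no row locks at all: losers bail out immediately and only the winning process walks the entries.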

djmb commented 10 months ago

Hi @bessey, thanks for the suggestion!

The SIEVE algorithm looks like it's designed mainly for in-memory caches with a single writer and fixed-size items. As such, I'm not sure it would translate well to Solid Cache.

It would require:

Maybe there's an adaptation that would allow multiple processes to run at once, but I suspect it would no longer look so simple.

The bet with Solid Cache and FIFO is that the inferior algorithm doesn't matter so much when your cache is large and items are long-lived. I'm not sure we'd see hit rate improvements similar to what they found in that paper. It would certainly need investigation first.

Finally, by using FIFO and deleting records sequentially, we reduce fragmentation both in record IDs and on disk. Reducing disk fragmentation means we can store more records, and keeping record IDs contiguous lets us quickly get a rough estimate of the number of cache entries by subtracting the minimum ID from the maximum.
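As a toy illustration of that estimate (the `ids` array stands in for the surviving entry IDs; this is not actual SolidCache code):

```ruby
# Rough entry-count estimate: with FIFO eviction deleting the oldest rows
# first, the live IDs stay mostly contiguous, so (max - min + 1) approximates
# the row count without a full COUNT(*) table scan. Gaps from holes in the
# ID range make it an overestimate, which is why it is only "rough".
def estimated_entry_count(ids)
  return 0 if ids.empty?

  ids.max - ids.min + 1
end
```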