We need a database-agnostic way of doing this, and there's no portable way to get the table size like that. Also, at least for MySQL, the table size won't shrink unless you rewrite the table, so even if you could read it, it's not a useful measure of the cache size.
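For illustration, this is roughly the kind of engine-specific query we'd be stuck with (MySQL's information_schema here; Postgres and SQLite would each need something completely different), and the numbers it reports don't drop after deletes until the table is rewritten:

```ruby
# MySQL-specific sketch: read the on-disk size of the entries table.
# Not portable, and the reported size only shrinks after the table is
# rewritten (e.g. OPTIMIZE TABLE), so it doesn't track the live cache size.
table_bytes = SolidCache::Entry.connection.select_value(<<~SQL)
  SELECT data_length + index_length
  FROM information_schema.tables
  WHERE table_schema = DATABASE() AND table_name = 'solid_cache_entries'
SQL
```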
We could store the total size of the cache values in a separate table, but that has its own issues: every write and delete would need to update the counter, making it a hotspot for contention, and the total could drift out of sync with no easy way to correct it. So I don't think that's a valid option.
A second approach is to store the size as a separate column on each row of the solid_cache_entries table. We could get the total by summing the size column. However, that query would be much too slow once the cache gets beyond a certain size (e.g. on Basecamp we have about 900 million rows).
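For reference, the naive version is a single aggregate over the whole table (assuming the per-row size lives in a byte_size column), which means a full scan:

```ruby
# Naive approach: sum a per-row byte_size column (assumed name) across
# every row. Accurate, but scanning hundreds of millions of rows is far
# too slow to do regularly.
total_cache_bytes = SolidCache::Entry.sum(:byte_size)
```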
We could try to sample the size column to generate an estimate. If we assigned each row to a random bucket and created an index on (bucket, size), we could quickly read all the sizes from one or more buckets and generate an estimate from that.
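A minimal sketch of that, assuming a bucket column holding a random integer in 0...BUCKET_COUNT and an index on (bucket, byte_size) so the sampled buckets can be read from the index alone (BUCKET_COUNT and byte_size are assumed names, not existing schema):

```ruby
BUCKET_COUNT = 1_000 # assumed number of random buckets

# Estimate the total cache size by reading a few random buckets and
# scaling the sampled total up to the full bucket range.
def estimated_cache_bytes(sample_buckets: 10)
  buckets = Array.new(sample_buckets) { rand(BUCKET_COUNT) }.uniq
  sampled = SolidCache::Entry.where(bucket: buckets).sum(:byte_size)
  sampled * BUCKET_COUNT / buckets.size
end
```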
I think that will quite often give a fairly accurate estimate. However, large outliers could be a problem. If we also indexed the size column, we could read the N largest records and use them to adjust our estimate. This will need some thought as to how we do it exactly.
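One rough shape that adjustment could take, reusing BUCKET_COUNT from the sketch above and assuming an index on byte_size: count the N largest rows exactly and only sample rows below that cutoff, so a few huge entries can't get multiplied up by the scaling factor. The exact weighting is still an open question, as noted.

```ruby
# Read the N largest rows exactly, then sample only rows below that cutoff
# so outliers aren't scaled up along with the rest of the estimate.
def estimated_cache_bytes_with_outliers(n: 1_000, sample_buckets: 10)
  outliers = SolidCache::Entry.order(byte_size: :desc).limit(n).pluck(:byte_size)
  cutoff   = outliers.min || 0

  buckets = Array.new(sample_buckets) { rand(BUCKET_COUNT) }.uniq
  sampled = SolidCache::Entry
              .where(bucket: buckets)
              .where("byte_size < ?", cutoff)
              .sum(:byte_size)

  outliers.sum + (sampled * BUCKET_COUNT / buckets.size)
end
```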
This probably needs some testing to see how accurate the estimate is across different sets of sizes.
There are a couple of other things to note here:
- We can already estimate the number of rows in the cache from max(id) - min(id), so we can guess how many buckets to query in one go to get N records back.
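For example (again a sketch, using the assumed BUCKET_COUNT and the max(id) - min(id) row estimate to decide how many buckets to read):

```ruby
# Guess how many buckets we need to sample to see roughly target_rows rows.
def buckets_to_sample(target_rows: 10_000)
  min_id = SolidCache::Entry.minimum(:id)
  max_id = SolidCache::Entry.maximum(:id)
  estimated_rows = min_id && max_id ? max_id - min_id : 0

  rows_per_bucket = [estimated_rows / BUCKET_COUNT, 1].max
  [(target_rows.to_f / rows_per_bucket).ceil, BUCKET_COUNT].min
end
```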
To better support constrained environments with low database storage (e.g. Heroku), it would be nice to constrain Solid Cache by the size of the cache table, to ensure that the cache will not consume all available space in the database.
DHH mentioned this here: https://github.com/rails/rails/issues/50443#issuecomment-1871279699
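For example, a size cap might eventually be exposed as a cache store option along these lines (the max_size option name here is purely a placeholder for this request, not an existing setting):

```ruby
# config/environments/production.rb
# Hypothetical: cap the cache at roughly 512 MB of stored values.
config.cache_store = :solid_cache_store, { max_size: 512.megabytes }
```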