yeshm / memcached

Automatically exported from code.google.com/p/memcached
0 stars 0 forks source link

no way to get without bumping LRU, aka "peek" #372

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I have a very (very) large application which uses memcached to cache complex 
objects that are slow to assemble from multiple databases, some of which are 
edited by external processes without notification.

I also have a lazy scan process which slowly loops through every possible 
object, recomputes the object, and compares it to what is in memcached. If 
there is a mismatch (likely due to an external edit), it is reported and then 
the memcached entry is replaced.

So far, so good! However, this lazy scan totally defeats memcached's LRU 
eviction strategy, because my lazy scan has to "get" every possible key in a 
fixed order.

What is missing is a way to perform a "peek" that retrieves the data from 
memcached but does not actually count as a "get", so that I can make sure the 
data is still valid without re-ordering the LRU.

Original issue reported on code.google.com by hgof...@gmail.com on 21 Jul 2014 at 6:50

GoogleCodeExporter commented 8 years ago
IMO, you are fixing the wrong problem. In our stack, there is only one object 
that reads/writes a cache item. This controls the state of the cache. Therefore 
we have no need to fix the cache. Shouldn't you fix that problem? You are want 
memcached to allow your application logic to be messy.

Original comment by bri...@dealnews.com on 21 Jul 2014 at 7:24

GoogleCodeExporter commented 8 years ago
The data comes from external sources. Computing a new version or namespace for 
every possible data change is impossible. We don't care if the data is hours 
out of date, but we need it to be correct within a day or two, and we need to 
cache the expensive computations which are performed on the data (which may or 
may not change at any given moment).

We cannot afford to do validation work on every cache hit, nor can we afford to 
expire cache items pre-emptively, as we have over 500 GB of data in memcached 
and at most only 1% of this data really needs to be recomputed. A lazy scan is 
the best possible architecture for our needs.

Note that memcached itself has already implemented this internally, as part of 
the new LRU scanning process that finds expired items. It is clearly a useful 
and important ability to query without changing LRU order.

It is just a missing feature in the protocol: "peek", "inspect without changing 
state", "get without LRU touch", whatever you want to call it. This is an 
important feature for any cache.

Original comment by hgof...@gmail.com on 21 Jul 2014 at 8:08

GoogleCodeExporter commented 8 years ago
peek is probably useful, but I'm not sure when we'd get to it.

"If I were you", I'd embed the creation time in the stored object. the older 
the object is, the more often it tries to set some flag or log some line to get 
itself rechecked.

Or at least some similar process wherein decently active data has a higher 
chance of being recalculated early, and longer-tail less-hit data gets booted 
on its own.

You can also leave "hints" in extra cache keys. For example, if an object is 
created from three different sources, this makes it difficult to give it a 
single namespace to invalidate it by.

So when an edit comes in for resource A, you add/set an extra key that 
signifies "Source A has been updated" - when fetching the objects, you do a 
multiget to grab the original object and any of these flag-keys if they exist. 
If they do exist you can make a decision to recompute on the fly, 
asynchronously schedule an update, etc.  Also potentially factoring in the age 
of the object.

You can also extend the LRU crawler to help with this, potentially.

Usually folks just try to not structure objects so far away from the edits that 
you can't authoritatively say what affects which data. It's a known limitation 
of distributed cache stores like this (without grossly slowing things down). If 
you're stuck with it you have some options at least (LRU crawler/etc).

Original comment by dorma...@rydia.net on 15 Aug 2014 at 3:02