IMO, you are fixing the wrong problem. In our stack, there is only one object
that reads/writes a cache item. This controls the state of the cache. Therefore
we have no need to fix the cache. Shouldn't you fix that problem? You want
memcached to allow your application logic to be messy.
Original comment by bri...@dealnews.com
on 21 Jul 2014 at 7:24
The data comes from external sources. Computing a new version or namespace for
every possible data change is impossible. We don't care if the data is hours
out of date, but we need it to be correct within a day or two, and we need to
cache the expensive computations which are performed on the data (which may or
may not change at any given moment).
We cannot afford to do validation work on every cache hit, nor can we afford to
expire cache items pre-emptively, as we have over 500 GB of data in memcached
and at most only 1% of this data really needs to be recomputed. A lazy scan is
the best possible architecture for our needs.
Note that memcached itself has already implemented this internally, as part of
the new LRU scanning process that finds expired items. The ability to query an
item without changing its LRU order is clearly useful and important.
It is just a missing feature in the protocol: "peek", "inspect without changing
state", "get without LRU touch", whatever you want to call it. This is an
important feature for any cache.
Original comment by hgof...@gmail.com
on 21 Jul 2014 at 8:08
peek is probably useful, but I'm not sure when we'd get to it.
"If I were you", I'd embed the creation time in the stored object. the older
the object is, the more often it tries to set some flag or log some line to get
itself rechecked.
Or at least some similar process wherein decently active data has a higher
chance of being recalculated early, and longer-tail less-hit data gets booted
on its own.
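Roughly what that could look like, as a minimal sketch assuming the
python-memcached client; the recompute_value() helper and the soft/hard age
thresholds are made up for illustration, not part of the suggestion above:

```python
import random
import time

import memcache  # python-memcached, assumed client library

# Illustrative thresholds -- not from the original comment.
SOFT_AGE = 6 * 3600        # below this age, always serve from cache
HARD_AGE = 48 * 3600       # above this age, always recompute
CACHE_EXPIRE = 72 * 3600   # memcached expiry, kept longer than HARD_AGE

mc = memcache.Client(['127.0.0.1:11211'])

def recompute_value(key):
    """Placeholder for the expensive computation being cached."""
    ...

def get_with_age_check(key):
    cached = mc.get(key)
    if cached is not None:
        created_at, value = cached
        age = time.time() - created_at
        if age < SOFT_AGE:
            return value
        if age < HARD_AGE:
            # The older the item, the more likely it gets rechecked early.
            refresh_chance = (age - SOFT_AGE) / (HARD_AGE - SOFT_AGE)
            if random.random() > refresh_chance:
                return value
    # Miss, too old, or selected for an early refresh: rebuild and re-store.
    value = recompute_value(key)
    mc.set(key, (time.time(), value), time=CACHE_EXPIRE)
    return value
```

The probabilistic recheck spreads recomputation out over time instead of
having everything stampede the backend the moment a fixed soft TTL expires.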
You can also leave "hints" in extra cache keys. For example, if an object is
created from three different sources, it is difficult to give it a single
namespace to invalidate it by.
So when an edit comes in for resource A, you add/set an extra key that
signifies "Source A has been updated" - when fetching the objects, you do a
multiget to grab the original object and any of these flag-keys if they exist.
If they do exist, you can decide to recompute on the fly, asynchronously
schedule an update, etc., potentially factoring in the age of the object as
well.
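A rough sketch of that flag-key pattern, again assuming python-memcached; the
key naming scheme and the recompute/schedule helpers are hypothetical:

```python
import time

import memcache  # python-memcached, assumed client library

mc = memcache.Client(['127.0.0.1:11211'])

def recompute_object(obj_key):
    """Placeholder for rebuilding the object from its sources."""
    ...

def schedule_async_recompute(obj_key):
    """Placeholder for queuing a background refresh."""
    ...

def invalidate_source(source_id, ttl=86400):
    # When an edit arrives for a source, drop a short-lived hint key.
    mc.set('updated:%s' % source_id, time.time(), time=ttl)

def fetch_object(obj_key, source_ids):
    # One multiget grabs the object plus any hint keys that exist.
    flag_keys = ['updated:%s' % s for s in source_ids]
    found = mc.get_multi([obj_key] + flag_keys)
    obj = found.get(obj_key)
    if obj is None:
        return recompute_object(obj_key)
    if any(k in found for k in flag_keys):
        # A source changed; recompute inline or queue it, possibly after
        # comparing the object's embedded creation time to the hint's.
        schedule_async_recompute(obj_key)
    return obj
```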
You can also extend the LRU crawler to help with this, potentially.
Usually folks just try not to structure objects so far away from the edits that
you can't authoritatively say what affects which data. It's a known limitation
of distributed cache stores like this (without grossly slowing things down). If
you're stuck with it you have some options at least (LRU crawler/etc).
Original comment by dorma...@rydia.net
on 15 Aug 2014 at 3:02
Original issue reported on code.google.com by
hgof...@gmail.com
on 21 Jul 2014 at 6:50