hapijs / catbox

Multi-strategy object caching service

Ability to flush/invalidate the cache (by segment or in full) #142

Closed PizzaBrandon closed 4 years ago

PizzaBrandon commented 9 years ago

In an upcoming project launch, we're planning on using a relatively long cache TTL to improve average response times on our API, but I need to invalidate (all or part) the cache on demand.

For example:

If product's price changes, any page (especially on search) that had that product displayed needs to be invalidated, but I don't necessarily know every key where that product may have appeared. Therefore, I need to delete all search pages from cache, but there are other classes of pages I can leave alone.

Ideally, it'd be great if I could delete all keys in a partition, a segment, or any key starting with a given substring, so I could limit the "damage" to as small a keyspace as necessary.

hueniverse commented 9 years ago

This would require every catbox strategy to implement this feature, which would be somewhat hard to achieve. It would need to be implementation-specific, which makes it harder (e.g. the catbox layer doesn't have any idea which keys are present). I am not sure this is a practical goal at this point, but if you want to try and get some of the most common catbox strategies to add such a feature (e.g. some kind of criteria-based drop()), give it a try.

arb commented 9 years ago

I also needed something like this, and the only thing I could come up with at present was to keep a list of all the keys in memory, iterate over them, and call drop.
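A minimal sketch of that workaround, assuming a catbox-style client with set()/get()/drop() methods; the wrapper class and method names are hypothetical, not part of catbox:

```javascript
// Hypothetical wrapper: remember every key that was set, so a whole
// segment can be flushed by iterating the remembered keys and dropping each.
class TrackingClient {
  constructor(client) {
    this.client = client;
    this.keys = new Set(); // remembered keys, serialized as "segment\0id"
  }

  async set(key, value, ttl) {
    await this.client.set(key, value, ttl);
    this.keys.add(`${key.segment}\u0000${key.id}`);
  }

  async get(key) {
    return this.client.get(key);
  }

  // Drop every remembered key in the given segment.
  async flushSegment(segment) {
    for (const entry of [...this.keys]) {
      const [seg, id] = entry.split('\u0000');
      if (seg === segment) {
        await this.client.drop({ segment: seg, id });
        this.keys.delete(entry);
      }
    }
  }
}
```

The obvious caveat: the key list lives in one process's memory, so it misses keys written by other processes and is lost on restart.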

PizzaBrandon commented 9 years ago

We're using Redis for our cache, so I'm going to end up directly interfacing with its flush through other means for now because of time limitations. I will look into how this could be done through catbox and the related plugins.

I found some possible solutions for range deletes in Redis using its EVAL command, but I haven't looked into the capabilities of other tools that catbox can interact with.
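One way such a range delete could look, as a sketch only: it assumes a node_redis-style client whose eval() takes (script, numKeys, ...args), and keys laid out as partition:segment:id (the catbox-redis convention); the helper name is illustrative.

```javascript
// Lua script: SCAN the keyspace cursor-by-cursor, deleting keys that match
// the given pattern. Unlike KEYS + DEL, this avoids one long blocking call.
// Note: SCAN inside a script assumes effect-based replication (Redis >= 5,
// or redis.replicate_commands() on older versions).
const DROP_BY_PATTERN = `
local cursor = "0"
local deleted = 0
repeat
  local result = redis.call("SCAN", cursor, "MATCH", ARGV[1], "COUNT", 100)
  cursor = result[1]
  for _, key in ipairs(result[2]) do
    redis.call("DEL", key)
    deleted = deleted + 1
  end
until cursor == "0"
return deleted
`;

// Hypothetical helper: drop every cached key in one segment.
function dropSegment(client, partition, segment) {
  const pattern = `${partition}:${segment}:*`;
  return client.eval(DROP_BY_PATTERN, 0, pattern);
}
```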

hueniverse commented 9 years ago

There is a potentially simpler middle ground here. I can add a generation count to the object we store in the cache. If you want to flush something, you just increment the generation for a given partition or segment. Then when you do a get() it checks it, and if the generation is wrong, it returns not-found (and drops the value from the cache). This doesn't flush the actual cache storage but will get the desired result.

arb commented 8 years ago

Is this something that is on the radar or has there not been enough interest in this feature?

devinivy commented 8 years ago

I would be interested in this feature using Eran's suggested strategy.

hueniverse commented 8 years ago

I'll take a pull request as long as it's non-breaking.

devinivy commented 8 years ago

Recalling that this requires two get()s on the connection: one for the object, and an additional one for the current generation of the segment. I don't think it's worth baking into catbox if every get() is doubled, for a feature that many won't use. Is there a reasonable way to make the feature opt-in, or for the feature to "turn on" once a segment is flushed?

hueniverse commented 8 years ago

You just need to add something to the key, like a prefix.

devinivy commented 8 years ago

Sure! But for the client to tell whether it's looking at an old generation or an up-to-date generation of an object, the client needs to know the current generation of the segment.

If the cache isn't shared across apps then this isn't an issue (the generation can be tracked in-app), but if the cache is shared then (I believe) each get() requires a lookup of the generation of the segment.

hueniverse commented 8 years ago

No. The whole point here is to simply manipulate the key so that once you change the prefix, everything prior is no longer found.

devinivy commented 8 years ago

How do you build the key to ask for without knowing the current generation of the segment? The current generation of the segment is part of the key, right?

hueniverse commented 8 years ago

You need to add a config to set that. The developer names the special set. When they want to flush it, they change the name. That's pretty much it.
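A sketch of that prefix idea, with hypothetical names (catbox has no such class): the developer names the set, the name is baked into every key, and "flushing" is just renaming the set so the old keys are never asked for again.

```javascript
// Hypothetical key builder: keys carry the set name as a prefix, so a
// rename makes every previously written key unreachable without touching
// the underlying cache storage.
class NamedSetKeys {
  constructor(setName) {
    this.setName = setName; // e.g. 'search-v1', chosen by the developer
  }

  key(id) {
    return `${this.setName}:${id}`;
  }

  flush(newName) {
    this.setName = newName; // old-prefix entries simply expire via their TTLs
  }
}
```

Note this only works if every app instance agrees on the current set name, e.g. via shared config.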

chriswiggins commented 8 years ago

Stumbling across this, isn't a simple solution just to set the item as stale?

I.e. update the current item in the cache by setting the last-generated time to one older than the TTL, so that the next time the cache is hit the TTL has expired and the item is regenerated?

hueniverse commented 8 years ago

@chriswiggins problem is you don't have a list of all the items.

siacomuzzi commented 7 years ago

Is there any update about this issue? Thanks!

hueniverse commented 7 years ago

There is not going to be an update until someone does the work and submits a PR for review.

szimek commented 7 years ago

Would it be ok to add a new flush({ partition, segment }, callback) method that would be supported only by some strategies?

benedikt-roth commented 7 years ago

@hueniverse It will be quite cumbersome to extend the catbox-redis implementation, since this requires adding it to the generic catbox API, which in turn requires adding it to all other implementations (memcached, mongodb, etc.).

How do you propose to do this?

P.S.: I like the variant of issuing a single EVAL command containing a Lua script (see here).

hueniverse commented 7 years ago

@rossgardt the idea I have is pretty simple. Today we store each item with an envelope which includes TTLs, etc. We can add a new value to the envelope called generation. It defaults to undefined. If you set a new cache generation value, we set that in all new cache items. When we read from the cache, we check its generation value in the envelope, and if it doesn't match the cache's global current value, we treat it as missing. This way, if you want to flush the entire cache, this will make it behave as if the cache is empty, without having to actually deal with the cache content.
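That envelope check can be sketched in a few lines; this is an illustrative in-memory model, not catbox's actual internals, and the class and field names are assumptions:

```javascript
// Hypothetical model of generation-based flushing: each envelope records the
// generation it was written under; get() treats a mismatch as a cache miss.
class GenerationCache {
  constructor() {
    this.store = new Map();
    this.generation = 1; // current cache generation
  }

  set(id, item, ttl) {
    this.store.set(id, { item, stored: Date.now(), ttl, generation: this.generation });
  }

  get(id) {
    const envelope = this.store.get(id);
    if (!envelope || envelope.generation !== this.generation) {
      this.store.delete(id); // lazily drop stale-generation entries
      return null;           // behaves as if the cache were empty
    }
    return envelope.item;
  }

  flush() {
    this.generation += 1; // no need to touch the stored content
  }
}
```

As noted below, the open question is where the current generation value lives when the cache is shared across instances.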

kanongil commented 7 years ago

@hueniverse This sounds simple enough, indeed. However, it does require the user to implement some mechanism to sync the current generation value between instances.

Otherwise, we would need to store it in the db, and read it along with the cached value.

hueniverse commented 7 years ago

Yep. It is a limited solution.

lennerd commented 7 years ago

> I also needed something like this and the only thing I could come up with at present was keep a list of all the keys in memory and iterate over them and call drop.

@arb Can you elaborate or share an example on how you implemented this key storage?

jgallen23 commented 7 years ago

I added a PR for dropSegment here: #187

hueniverse commented 4 years ago

This is clearly not going to happen...