Closed PizzaBrandon closed 4 years ago
This would require every catbox strategy to implement the feature, which would be somewhat hard to achieve. It would need to be implementation-specific, which makes it harder (e.g. the catbox layer doesn't have any idea which keys are present). I am not sure this is a practical goal at this point, but if you want to try to get some of the most common catbox strategies to add such a feature (e.g. some kind of criteria-based `drop()`), give it a try.
I also needed something like this, and the only thing I could come up with at present was to keep a list of all the keys in memory, iterate over them, and call `drop()`.
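A rough sketch of that workaround, assuming a catbox-style client with async `set`/`drop` methods (the `TrackedCache` wrapper itself is hypothetical, not part of catbox):

```javascript
// Hypothetical wrapper that remembers every key it writes, so the whole
// cache can be flushed by dropping the tracked keys one at a time.
// `set`/`drop` mirror catbox's client shape; the tracker is illustrative.
class TrackedCache {
  constructor(client) {
    this.client = client;
    this.keys = new Set();            // in-memory list of every key written
  }

  async set(key, value, ttl) {
    await this.client.set(key, value, ttl);
    this.keys.add(key);
  }

  async flushAll() {
    // Iterate the tracked keys and drop each one individually.
    for (const key of this.keys) {
      await this.client.drop(key);
    }
    this.keys.clear();
  }
}
```

The obvious caveats: the key list is per-process (it misses keys written by other instances), and it grows without bound unless pruned alongside TTL expiry.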
We're using Redis for our cache, so I'm going to end up directly interfacing with its flush through other means for now because of time limitations. I will look into how this could be done through catbox and the related plugins.
I found some possible solutions for range deletes in Redis using its EVAL command, but I haven't looked into the capabilities of other tools that catbox can interact with.
There is a potentially simpler middle ground here. I can add a generation count to the object we store in the cache. If you want to flush something, you just increment the generation for a given partition or segment. Then when you do a `get()`, it checks the generation, and if it is wrong, it returns not-found (and drops the value from the cache). This doesn't flush the actual cache storage but will get the desired result.
Is this something that is on the radar or has there not been enough interest in this feature?
I would be interested in this feature using Eran's suggested strategy.
I'll take a pull request as long as it's non-breaking.
Recall that this requires two `get()`s on the connection: one for the object, and an additional one for the current generation of the segment. I don't think it's worth baking into catbox if every `get()` is doubled, for a feature that many won't use. Is there a reasonable way to make the feature opt-in, or for the feature to "turn on" once a segment is flushed?
You just need to add something to the key, like a prefix.
Sure! But for the client to tell whether it's looking at an old generation or an up-to-date generation of an object, the client needs to know the current generation of the segment.
If the cache isn't shared across apps then this isn't an issue (the generation can be tracked in-app), but if the cache is shared then (I believe) each `get()` requires a lookup of the generation of the segment.
No. The whole point here is to simply manipulate the key so that once you change the prefix, everything prior is no longer found.
How do you build the key to ask for without knowing the current generation of the segment? The current generation of the segment is part of the key, right?
You need to add a config to set that. The developer names the special set. When they want to flush it, they change the name. That's pretty much it.
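A minimal sketch of that prefix idea, assuming a generic client with async `get`/`set` (the `PrefixedCache` wrapper and its names are illustrative, not an actual catbox API):

```javascript
// Sketch of the key-prefix approach: the developer names a "set", and the
// current name is prepended to every key. Renaming the set makes all
// previously written keys unreachable without touching storage.
class PrefixedCache {
  constructor(client, setName) {
    this.client = client;
    this.setName = setName;           // configured by the developer
  }

  key(id) {
    return `${this.setName}:${id}`;   // the prefix becomes part of the key
  }

  async get(id) {
    return this.client.get(this.key(id));
  }

  async set(id, value, ttl) {
    return this.client.set(this.key(id), value, ttl);
  }

  flush(newSetName) {
    // Old entries still exist in storage but are never looked up again;
    // they expire naturally via their TTLs.
    this.setName = newSetName;
  }
}
```

Because no generation lookup is needed, reads stay at a single `get()`; the trade-off is that "flushed" entries linger in storage until their TTLs expire.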
Stumbling across this, isn't a simple solution just to set the item as stale?
I.e. update the current item in the cache by setting the last-generated time to one older than the TTL, so that the next time the cache is hit the TTL is expired and the item is regenerated?
@chriswiggins the problem is that you don't have a list of all the items.
Is there any update about this issue? Thanks!
There is not going to be an update until someone does the work and submits a PR for review.
Would it be OK to add a new `flush({ partition, segment }, callback)` method that would be supported only by some strategies?
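For illustration, here is what such a strategy-specific `flush` might look like for a memory-style strategy. The internal layout (partition → segment → key) is an assumption for the sketch, not catbox-memory's actual structure:

```javascript
// Hypothetical memory strategy with a flush({ partition, segment }, callback)
// method. The nested-Map layout is assumed for illustration only.
class MemoryStrategy {
  constructor() {
    this.partitions = new Map();      // partition -> Map(segment -> Map(key -> value))
  }

  set(partition, segment, key, value) {
    if (!this.partitions.has(partition)) {
      this.partitions.set(partition, new Map());
    }
    const part = this.partitions.get(partition);
    if (!part.has(segment)) {
      part.set(segment, new Map());
    }
    part.get(segment).set(key, value);
  }

  flush({ partition, segment }, callback) {
    const part = this.partitions.get(partition);
    if (part) {
      if (segment) {
        part.delete(segment);              // drop a single segment
      }
      else {
        this.partitions.delete(partition); // or the whole partition
      }
    }
    callback(null);
  }
}
```

Strategies without cheap range deletes (e.g. memcached) could simply report the method as unsupported.
<test>
const assert = require('assert');
const s = new MemoryStrategy();
s.set('app', 'users', 'u1', 'alice');
s.set('app', 'posts', 'p1', 'hello');
s.flush({ partition: 'app', segment: 'users' }, (err) => { assert.strictEqual(err, null); });
assert.strictEqual(s.partitions.get('app').has('users'), false);
assert.strictEqual(s.partitions.get('app').get('posts').get('p1'), 'hello');
s.flush({ partition: 'app' }, (err) => { assert.strictEqual(err, null); });
assert.strictEqual(s.partitions.has('app'), false);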
@hueniverse It will be quite cumbersome to extend the catbox-redis implementation, since this requires adding it to the generic catbox API, which in turn requires adding it to all other implementations (memcached, mongodb, etc.).
How do you propose to do this?
P.S.: I like the variant of issuing a single EVAL command containing a Lua script (see here).
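As a sketch of that variant: the Lua body below deletes every key matching a pattern via `KEYS` + `DEL`, and the helper builds the `EVAL` arguments. The `partition:segment:id` key layout is an assumption for illustration, and `KEYS` blocks Redis on large keyspaces, so treat this as a sketch rather than a production-ready flush:

```javascript
// Lua script for a single Redis EVAL that deletes all keys matching a
// pattern. KEYS is O(n) over the whole keyspace and blocks the server,
// so this is only suitable for small or maintenance-window flushes.
const FLUSH_SCRIPT = `
local keys = redis.call('KEYS', ARGV[1])
for _, k in ipairs(keys) do
  redis.call('DEL', k)
end
return #keys
`;

// Builds [script, numkeys, pattern] for client.eval(...). The
// "partition:segment:id" key layout is assumed for illustration.
function flushArgs(partition, segment) {
  const pattern = segment ? `${partition}:${segment}:*` : `${partition}:*`;
  return [FLUSH_SCRIPT, 0, pattern];
}
```

A client would then send it as e.g. `EVAL <script> 0 "app:users:*"`; a non-blocking variant would use `SCAN` with a cursor instead of `KEYS`.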
@rossgardt the idea I have is pretty simple. Today we store each item with an envelope which includes TTLs, etc. We can add a new value to the envelope called `generation`. It defaults to `undefined`. If you set a new cache generation value, we set that in all new cache items. When we read from the cache, we check its generation value in the envelope, and if it doesn't match the cache's global current value, we treat it as missing. This way, if you want to flush the entire cache, this will make it behave as if the cache is empty, without having to actually deal with the cache content.
@hueniverse This sounds simple enough, indeed. However, it does require the user to implement some mechanism to sync the current generation value between instances.
Otherwise, we would need to store it in the db, and read it along with the cached value.
Yep. It is a limited solution.
I also needed something like this, and the only thing I could come up with at present was to keep a list of all the keys in memory, iterate over them, and call `drop()`.
@arb Can you elaborate or share an example on how you implemented this key storage?
I added a PR for `dropSegment` here: #187
This is clearly not going to happen...
In an upcoming project launch, we're planning on using a relatively long cache TTL to improve average response times on our API, but I need to be able to invalidate all or part of the cache on demand.
Ideally, it'd be great if I could delete all keys in a partition, a segment, or any key starting with a given substring, so I could limit the "damage" to as small a keyspace as necessary.