This is actually the result of the slave expire semantics... I'm thinking about
it a bit more and will reply back here.
Eventually all the keys will expire on the slave as well, but it can take some
time for a small percentage of keys that are not accessed on the master.
Original comment by anti...@gmail.com
on 13 Apr 2011 at 1:33
I must say, it is a bit confusing that the expire command comes from the
master. Is there a reason why the slave expire semantics should not be
independent?
Original comment by mart.s...@gmail.com
on 14 Apr 2011 at 7:46
The following happens: in 2.2, the slave is no longer responsible for expiring
keys, since this may lead to race conditions. Instead, when the master expires a
key, it propagates a DEL to the slave. The KEYS command does not return the
key, since it checks the time at which the key *should* have expired, which in
your case is in the past. However, since the slave may not expire keys itself, it
does not remove the (expired) key. When you subsequently issue a GET, it will
simply return the key, because it is not the slave's task to expire things. I
agree that we should probably make KEYS as oblivious to expiry as the rest of
the commands and make things consistent that way.
Original comment by pcnoordh...@gmail.com
on 14 Apr 2011 at 8:05
Pieter: even with this change, imagine this scenario:
You fill the master with 10 million volatile keys. All these keys but one have a
TTL of one day.
The remaining one has a TTL of 5 seconds. However, if no one requests this key
from the master, it will only expire on the slave once the master's active
expiration cycle synthesizes the DEL. So a key with a TTL of 5 seconds can
take hours to expire on the slave if it is never requested on the master... this
is a real problem unfortunately.
I'm trying to understand if there are good solutions to this problem. For
instance, after a given delta the slave could assume that the replication link
latency should not be that high and expire the key itself, or something like
that, but that is a bit dangerous... I bet there is a better solution.
Original comment by anti...@gmail.com
on 14 Apr 2011 at 8:31
Yes, this is indeed a problem when the slaves are used as read slaves and the
expiry should be triggered to force a cache update of some sort. I think that
consistency between master and slave is the most important thing here, so maybe
we should change something in the way the master expires keys. Changing the
semantics on the slave end doesn't sound like a good idea to me... The cleanest
solution here would be to use a skiplist to store expiring keys in full order,
so expiry becomes deterministic. The keys are already shared with the main
hash table, so the only extra cost would be the skiplist nodes. That said, I
don't like the extra memory cost ;-)
Original comment by pcnoordh...@gmail.com
on 14 Apr 2011 at 8:42
Yes, I had a similar idea, but the problem is that we already have a skiplist,
exactly the one you described, when cluster mode is enabled... so this would
mean two skiplists, and that is probably too much. I agree that changing the
semantics on the slave is not great either.
Other possible solutions (just speculating): when the slave finds a key that is
both pretty old and still not expired, it could ask the master whether the key
is actually there, but I already see subtle race conditions here... for now it
is better to leave it as it is :)
Instead, talking about how to improve the master expire process without using
too much memory, I have a few ideas. We can allocate a fixed pool of at most
10000 objects, where we make sure to track the objects that are likely to
expire soon. So both when we set a very short expire, and when random sampling
finds a key that is not yet expired but is close to its expire time, we put it
in our 10k-item pool, containing only "near to expire" keys. This could improve
the expire precision without requiring O(N) memory. Just an idea, but we may
refine it.
A much simpler alternative: put the expiration rate in redis.conf so that users
can select different tradeoffs :) Currently I think it is 100 samples per
second (but adaptive if too many expired keys are found). We can make it
configurable.
Original comment by anti...@gmail.com
on 14 Apr 2011 at 8:51
The same problem holds with a fixed pool: set 10k keys to expire in a minute, 1
in 5 minutes, and 1 million in a day. The fixed pool will then be populated by
the 10k keys expiring in a minute, and after that minute you're back to the same
scenario. I think we should investigate common expiration patterns and see
whether a fixed pool fits them, since for every memory-cheap alternative to a
full-blown skiplist we can think of a pathological scenario where it doesn't
work.
Original comment by pcnoordh...@gmail.com
on 14 Apr 2011 at 9:08
Indeed, the fixed pool only works when the standard deviation of the TTLs is
high enough, so we can "accumulate" the information we find along the way while
scanning the hash table in the normal active expiration cycle. For instance, if
all the keys have a similar TTL, the information we gain has entropy near zero,
and thus it is completely useless to have such a pool. One interesting thing
with the pool, however, is that keys are removed from the pool after they are
deleted, making new space, so for a few patterns it could work. +1 on
testing/simulating it in real scenarios to see if/how it helps.
Original comment by anti...@gmail.com
on 14 Apr 2011 at 9:13
I guess this is not just an issue with slaves. Recently I discovered that some
keys in our master database fail to expire. I do not know how to reproduce it
because it seems to be quite random.
Original comment by mart.s...@gmail.com
on 24 May 2011 at 7:50
I guess this is still open, since I observed this behaviour on 2.2.8. A key
expired and deleted on the master is still visible on the slave via GET, but
not via KEYS. Is there a way to force deletion of all expired keys on the
slave?
Original comment by dari...@guzik.com.ar
on 30 Jan 2012 at 6:31
If this issue is still open, is it possible to solve it by giving the slave the
ability to send a suggestion to the master to check if a key needs to be
expired? The slave would only send such suggestions when it believes a key is
expired. The master retains full control, and the information available to the
slaves is not completely wasted.
Of course, if there is also a problem where expirations on the master are not
propagated to the slaves, this sounds like just a straightforward bug that
needs to be fixed.
Original comment by James.Ma...@manwin.com
on 25 Sep 2012 at 8:01
We run several high-usage Redis instances across multiple clusters, and this
behaviour is starting to have a significant impact on us.
I'd like to suggest a setting in the config file for how the slaves should
respond to expirations, something like normal vs. strict. Normal would be the
current behaviour, and strict would allow the slave to expire the content
itself; perhaps standalone would be a better name?
At the moment James has created a patch which allows the slave to simulate
expiring the keys, but I would prefer an official solution. I'm willing to
sponsor a fix for this if that would help make it a reality.
Original comment by perry.st...@mindgeek.com
on 17 Apr 2013 at 10:09
How many databases are you using, and what version of Redis are you running?
I was not able to reproduce any problem where expirations on the master are
not propagated to the slaves, though I carefully checked this in the
replication code recently.
It's quite easy to make the slave delete keys itself, but it's not ideal. It
works if the system clock is correctly synchronized between the two machines,
but it breaks the logic of master = active, slave = passive.
So on paper it sucks.
I quickly coded and added a configurable parameter, slave-checks-expired-keys,
which triggers the expiration cron and a check-on-get.
This will probably not go into stock Redis.
To use it, apply my patch with patch -p1 < patchfile.patch and add
"slave-checks-expired-keys yes" to your Redis config.
I think the slave asking the master is not a good idea, because you end up with
all slave GET commands running in O(2n) instead of O(n), since every query
would need to ask both the slave AND the master.
So it would actually be better to just ask the master directly.
I also saw that you are not happy with the CPU usage during expiration cycles:
activeExpireCycle CPU usage has been greatly improved lately (starting from
2.4.14, and also more recently), so you may want to try upgrading first.
If saving CPU is more important than saving memory, you can lower the
REDIS_EXPIRELOOKUPS_TIME_PERC variable in redis.h (smaller = more CPU saved),
which is used in redis.c as follows:
timelimit = 1000000*REDIS_EXPIRELOOKUPS_TIME_PERC/server.hz/100;
You will save CPU at the cost of lazier memory garbage collection.
You can also change the Redis clock in the configuration, but this has more
global effects.
If you still have high CPU usage after this change, I would be actually
interested in getting a sample .rdb file to experiment with new expiration
algorithms (e.g. using estimateObjectIdleTime to delete old idle items first).
Original comment by serp...@gmail.com
on 18 Apr 2013 at 8:40
Any info about this? The problem still persists in the latest release version of Redis.
Original comment by de...@derevyanchuk.com
on 16 May 2013 at 8:53
Solution please!
Original comment by jpowel...@gmail.com
on 22 Sep 2014 at 9:22
This google code project is not in use anymore. Please report bugs to
https://github.com/antirez/redis.
Recent Redis versions should work just fine with expirations. If you find a bug
in a recent version, feel free to open a new ticket over at github.
Original comment by jeredi...@gmail.com
on 22 Sep 2014 at 9:36
The relevant issue on GitHub (note that it's still open at the time of this
writing):
https://github.com/antirez/redis/issues/1768
Original comment by ng.la...@gmail.com
on 9 Dec 2014 at 10:54
Original issue reported on code.google.com by
mart.s...@gmail.com
on 13 Apr 2011 at 11:14