Open netshade opened 11 years ago
Hmm, looking at the code I don't see how this could be a problem. The basic idea is that we should always #pop as many times as #client_for is called. Is there a particular redis call that's causing this for you? Is it happening in a simple call like #get or #set or in more complicated/nested calls like in #multi?
Loops on #incr
, that's the only command. The client is initialized with :master_only => true
. Setup was three node managers, one zookeeper, three instances of redis. Test script looked like:
loop do
expected += 1
actual = $client.incr("key")
if expected != actual
abort
end
end
The comparison stuff worked as expected, but in the course of running that script while killing random servers, the client logger would report it had received the new master, but would immediately thereafter attempt to connect to the old master and write, either causing connection errors or read-only errors.
( Ruby 1.9.3 )
We've been testing failover and while, by and large, the server selection works as expected, occasionally we see failover not selecting the right master server even after a node manager has correctly broadcast the changes to our client.
Taking a look at the code, it seems like the conditional server selection here (where presence in the stack overrides master selection if :master_only is supplied):
https://github.com/ryanlecompte/redis_failover/blob/master/lib/redis_failover/client.rb#L458
coupled with
.pop
usage on the thread local client stack here:https://github.com/ryanlecompte/redis_failover/blob/master/lib/redis_failover/client.rb#L475
might be causing it. We replaced the
.pop
usage with.clear
and emptied the stack entirely when the method was called, and we stopped seeing incorrect client selection. Does this seem right to you? Is that actually a problem?