Open scalp42 opened 7 years ago
@scalp42 Thanks a lot for testing this and for discovering those issues! I think I'll be able to take a look at this tomorrow, as today I'll be very busy. I saw your pull request too, the change makes perfect sense of course :-) I'll apply that change locally and check the rest of the issues tomorrow morning. Again, thanks a lot for your help!
No worries, very grateful for all your work as always!
UPDATE: I also tested with a proper Redis setup (meaning the Redis instances are reachable directly) and I noticed it takes a long time to fail over on the client side.
Running a script which constantly pushes updates to Redis using Redisent, with 3 Redis nodes on ports 6380 to 6382 (Sentinel nodes on ports 26391/92/93):
$> bundle exec ./ami.rb --threads 10 -p all --redis --sentinel_hosts 192.168.99.100:26391 --sentinel -l debug
[2017-03-12 13:27:29 -0700 INFO] Starting run with 10 threads.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_ap_northeast_1 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_ap_northeast_2 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_ap_south_1 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_ap_southeast_1 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_ap_southeast_2 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_ca_central_1 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_eu_west_1 => Started.
[2017-03-12 13:27:29 -0700 INFO] ami_ubuntu_eu_central_1 => Started.
# Kicking a sleep on the master node port 6380
# redis-cli -h 192.168.99.100 -p 6380 DEBUG sleep 60
# You can see it hanging for 53 secs on the client side, yet Sentinel already reports the new master node on port 6382
# > redis-cli -h $DOCKER_IP -p 26391 SENTINEL get-master-addr-by-name mymaster
# 1) "192.168.99.100"
# 2) "6382"
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_ap_southeast1 => Pushed ami_ubuntu_ap_southeast1::12.04,14.04,16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_ap_southeast2 => Pushed ami_ubuntu_ap_southeast2::12.04,14.04,16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_ap_northeast1 => Pushed ami_ubuntu_ap_northeast1::12.04,14.04,16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_eu_central1 => Pushed ami_ubuntu_eu_central1::12.04,14.04,16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_eu_west1 => Pushed ami_ubuntu_eu_west1::12.04,14.04,16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_ap_northeast2 => Pushed ami_ubuntu_ap_northeast2::16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_ap_southeast_1 => Done in 53.83 seconds.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_ap_northeast_1 => Done in 53.83 seconds.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_ap_southeast_2 => Done in 53.83 seconds.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_eu_west_1 => Done in 53.83 seconds.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_eu_central_1 => Done in 53.83 seconds.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_ap_northeast_2 => Done in 53.84 seconds.
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_ap_south1 => Pushed ami_ubuntu_ap_south1::16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_ca_central1 => Pushed ami_ubuntu_ca_central1::16.04 to redis://192.168.99.100:6380
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_ca_central_1 => Done in 53.83 seconds.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_ap_south_1 => Done in 53.84 seconds.
# Then works fine when the sleep finishes, now pointing to the new master node on 6382
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_eu_west_2 => Started.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_sa_east_1 => Started.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_us_east_2 => Started.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_us_east_1 => Started.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_us_west_2 => Started.
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_us_west_1 => Started.
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_us_east2 => Pushed ami_ubuntu_us_east2::16.04 to redis://192.168.99.100:6382
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_us_east_2 => Done in 0.01 seconds.
[2017-03-12 13:28:23 -0700 DEBUG] ami_ubuntu_eu_west2 => Pushed ami_ubuntu_eu_west2::16.04 to redis://192.168.99.100:6382
[2017-03-12 13:28:23 -0700 INFO] ami_ubuntu_eu_west_2 => Done in 0.01 seconds.
When I look at the Sentinel side, the failover happens fast, but somehow Redisent doesn't necessarily pick it up immediately. I was expecting that as soon as a new master is elected, Redisent would switch to it.
Just thoughts right now; I'm probably missing something, or I could be totally wrong.
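For reference, the two-element array that Sentinel returns for SENTINEL get-master-addr-by-name maps to the client-facing URL like this (a minimal sketch; `master_url` is a helper name I made up, not part of Redisent):

```ruby
# Build a redis:// URL from the ["ip", "port"] array that
# SENTINEL get-master-addr-by-name returns.
def master_url(reply)
  host, port = reply
  "redis://#{host}:#{port}"
end

master_url(["192.168.99.100", "6382"]) # => "redis://192.168.99.100:6382"
```

This is the URL the client is expected to switch to once the failover completes.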
Another test:
[2017-03-12 13:47:14 -0700 INFO] Starting run sequential.
[2017-03-12 13:47:14 -0700 INFO] ami_ubuntu_ap_northeast_1 => Started.
# Kick a sleep on master node 6382
# redis-cli -h 192.168.99.100 -p 6382 DEBUG sleep 30
# Sentinel starts the failover a couple secs later and already reports the new master on 6381 at this point, but the connection is not timing out
[2017-03-12 13:47:42 -0700 DEBUG] ami_ubuntu_ap_northeast1 => Pushed ami_ubuntu_ap_northeast1::12.04,14.04,16.04 to redis://192.168.99.100:6382
[2017-03-12 13:47:42 -0700 INFO] ami_ubuntu_ap_northeast_1 => Done in 28.38 seconds.
# Sleep over after 30 secs, not hanging anymore and pushing to right master node on 6381
[2017-03-12 13:47:42 -0700 INFO] ami_ubuntu_ap_northeast_2 => Started.
[2017-03-12 13:47:42 -0700 DEBUG] ami_ubuntu_ap_northeast2 => Pushed ami_ubuntu_ap_northeast2::16.04 to redis://192.168.99.100:6381
[2017-03-12 13:47:42 -0700 INFO] ami_ubuntu_ap_northeast_2 => Done in 0.01 seconds.
I'm hoping this makes sense, but in that case the connection can hang forever: if I sleep 2000 secs, for example, it never times out, even though Sentinel already elected a new master after 5 secs.
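As a client-side stopgap, the blocking push can be wrapped in a hard deadline using Ruby's stdlib Timeout, so a slept or paused master can't hang the client forever (a sketch, not something Redisent does itself; `with_deadline` is my name):

```ruby
require "timeout"

# Give a blocking operation a hard deadline; on expiry the caller
# could re-ask Sentinel for the current master and retry the push.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :deadline_expired
end

with_deadline(1) { sleep 10 } # => :deadline_expired after ~1 second
```

Note that Timeout is a blunt instrument (it interrupts the block from another thread), so a socket-level timeout on the connection itself is preferable when the client library exposes one.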
I also tested using docker pause directly:
[2017-03-12 13:51:39 -0700 INFO] Starting run sequential.
[2017-03-12 13:51:39 -0700 INFO] ami_ubuntu_ap_northeast_1 => Started.
# Hangs forever until I docker unpause the old master
# Sentinel has picked a new master as expected:
# redis-cli -h $DOCKER_IP -p 26391 SENTINEL get-master-addr-by-name mymaster
# 1) "192.168.99.100"
# 2) "6382"
# Now I docker unpause the old master:
[2017-03-12 13:53:57 -0700 DEBUG] ami_ubuntu_ap_northeast1 => Pushed ami_ubuntu_ap_northeast1::12.04,14.04,16.04 to redis://192.168.99.100:6381
[2017-03-12 13:53:57 -0700 INFO] ami_ubuntu_ap_northeast_1 => Done in 137.93 seconds.
# Not timing out on the initial connection, as you can see from the 137 secs
[2017-03-12 13:53:57 -0700 INFO] ami_ubuntu_ap_northeast_2 => Started.
[2017-03-12 13:53:57 -0700 DEBUG] ami_ubuntu_ap_northeast2 => Pushed ami_ubuntu_ap_northeast2::16.04 to redis://192.168.99.100:6382
[2017-03-12 13:53:57 -0700 INFO] ami_ubuntu_ap_northeast_2 => Done in 0.01 seconds.
[2017-03-12 13:53:57 -0700 INFO] ami_ubuntu_ap_south_1 => Started.
[2017-03-12 13:53:57 -0700 DEBUG] ami_ubuntu_ap_south1 => Pushed ami_ubuntu_ap_south1::16.04 to redis://192.168.99.100:6382
[2017-03-12 13:53:57 -0700 INFO] ami_ubuntu_ap_south_1 => Done in 0.0 seconds.
The code leverages Ost with Redisent (I could be doing this wrong):
require 'json'
require 'ost'
require 'redisent'

begin
  sentinels = cli.config[:sentinel_hosts]
  redis = Redisent.new(hosts: sentinels, name: 'mymaster')
  Ost.redis = redis
  queue = cli.config[:redis_key]
  payload = { 'type' => 'ami_update' }
  Ost[queue] << JSON.dump(payload)
  logger.debug %|#{plugin} => Pushed #{plugin}::#{updated.join(',')} to #{Ost.redis.url}|
rescue Errno::ECONNREFUSED, Errno::EINVAL
  logger.error %|#{plugin} => Could not connect to #{cli.config[:redis_url]}|
rescue Redisent::UnreachableHosts => ex
  logger.error %|#{plugin} => Cannot talk to Sentinel hosts, #{ex.message}|
rescue Redisent::UnknownMaster => ex
  logger.error %|#{plugin} => Cannot talk to Sentinel master, #{ex.message}|
end
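To make the push resilient to a mid-run failover, one option is to retry and rebuild the connection between attempts, so that Sentinel is asked again for the current master (a sketch under that assumption; `with_retries` and the rescued exception class are illustrative — the real code would rescue the connection errors shown above):

```ruby
# Retry a block a few times, invoking a rebuild hook between
# attempts; in the real code the hook would re-create the Redisent
# connection so Sentinel is queried again for the current master.
def with_retries(attempts: 3, rebuild: -> {})
  tries = 0
  begin
    yield
  rescue RuntimeError
    tries += 1
    raise if tries >= attempts
    rebuild.call
    retry
  end
end

calls = 0
with_retries { calls += 1; raise "node down" if calls < 3; :pushed }
# => :pushed on the third attempt
```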
I added some debugging and can confirm that, when using Ost, it's the push to the queue that blocks:
logger.debug %|#{plugin} => About to put in Ost queue|
Ost[queue] << JSON.dump(payload)
logger.debug %|#{plugin} => Put in Ost queue|
# docker pause rediscluster_master_1
# Run client
[2017-03-12 14:02:52 -0700 INFO] Starting run sequential.
[2017-03-12 14:02:52 -0700 INFO] ami_ubuntu_ap_northeast_1 => Started.
[2017-03-12 14:02:52 -0700 DEBUG] ami_ubuntu_ap_northeast1 => About to put in Ost queue
# Sentinel failed over to new master but code is hanging when trying to put in Ost queue
# docker unpause rediscluster_master_1
[2017-03-12 14:05:07 -0700 DEBUG] ami_ubuntu_ap_northeast1 => Put in Ost queue
[2017-03-12 14:05:07 -0700 DEBUG] ami_ubuntu_ap_northeast1 => Pushed ami_ubuntu_ap_northeast1::12.04,14.04,16.04 to redis://192.168.99.100:6382
[2017-03-12 14:05:07 -0700 INFO] ami_ubuntu_ap_northeast_1 => Done in 134.89 seconds.
[2017-03-12 14:05:07 -0700 INFO] ami_ubuntu_ap_northeast_2 => Started.
[2017-03-12 14:05:07 -0700 DEBUG] ami_ubuntu_ap_northeast2 => About to put in Ost queue
[2017-03-12 14:05:07 -0700 DEBUG] ami_ubuntu_ap_northeast2 => Put in Ost queue
[2017-03-12 14:05:07 -0700 DEBUG] ami_ubuntu_ap_northeast2 => Pushed ami_ubuntu_ap_northeast2::16.04 to redis://192.168.99.100:6381
[2017-03-12 14:05:07 -0700 INFO] ami_ubuntu_ap_northeast_2 => Done in 0.0 seconds.
Hi @soveran,
I finally got around to switching to a proper Sentinel setup. While testing, I'm running into a couple of issues.
The current setup is a Docker VM through VirtualBox (192.168.99.100) with 3 Redis instances and 3 Sentinel instances. The Sentinel instances are configured to advertise their IP as the VM host (not the Docker 172.x address), meaning that the Redis instances are not reachable from localhost (only the Sentinel ones are). The idea was to test the client-side exceptions and such.
I also created a "classic" Redic connection to match the redis.prime object. Testing Redic timeouts works as intended.
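If I read the Redic API correctly, the second argument to Redic.new is a timeout in microseconds (defaulting to 10 seconds), which is easy to get wrong by a factor of 1000; a tiny conversion helper (mine, not part of Redic) keeps the intent explicit:

```ruby
# Redic.new(url, timeout) takes the timeout in microseconds;
# convert from seconds so the value stays readable.
def usec(seconds)
  (seconds * 1_000_000).to_i
end

usec(5)   # => 5_000_000
usec(0.5) # => 500_000
# e.g. Redic.new("redis://192.168.99.100:6380", usec(5))
```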
But if I try through Redisent, it can sometimes take forever to time out, or just not work at all while trying to talk to Sentinel. Nothing shows up in the Sentinel logs in that case.
Granted, the Redis instances should be reachable, but I'm not sure why this is happening when using Redisent.
Any help would be appreciated, thanks in advance!