kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Clustered redis example fails to re-elect and failover #4914

Closed vishvananda closed 9 years ago

vishvananda commented 9 years ago

Following the clustered redis example, everything works fine until I delete the original redis master. I see the pods getting recreated, but a new master doesn't seem to be getting elected. I can retrieve data from redis but I can't write anything. I see the following over and over in the log of the redis servers:

[19] 27 Feb 23:20:10.585 # Error condition on socket for SYNC: No route to host
[19] 27 Feb 23:20:10.883 * Connecting to MASTER 10.244.37.12:6379

Which was the ip address of my original master:

redis-master 10.244.37.12 redis kubernetes/redis:v1 10.130.78.11/10.130.78.11 name=redis,redis-sentinel=true,role=master Running sentinel kubernetes/redis:v1

Not sure what I might have missed.

vishvananda commented 9 years ago

I see the following in the sentinel logs:

[16] 27 Feb 23:23:19.591 # +failover-state-select-slave master mymaster 10.244.37.12 6379
[16] 27 Feb 23:23:19.647 # -failover-abort-no-good-slave master mymaster 10.244.37.12 6379

thockin commented 9 years ago

@brendandburns has the best experience here, I think

vishvananda commented 9 years ago

I'm also a bit unclear as to how the proxy is supposed to work with this config. The proxy will pick a random redis server to send a request to and if it is a slave it doesn't seem to accept writes.

erictune commented 9 years ago

@brendandburns there was a request on IRC for the Dockerfile for kubernetes/redis-proxy. I think you probably have that file. It doesn't appear to be in github.

AntonioMeireles commented 9 years ago

@brendandburns

(For context) it was me who made the request above :smile:. I'm mostly curious about where the proxy blob comes from (according to this)

mulloymorrow commented 9 years ago

HA proxy seems to be helpful here: https://robertianhawdon.me.uk/2014/02/11/sysops-installing-a-high-availability-redis-service-on-centos-6-x-in-windows-azure/

mulloymorrow commented 9 years ago

Mike is working on a solution: https://github.com/mikedanese/k8s-haproxy

AntonioMeireles commented 9 years ago

@brendandburns don't forget to see above please, when you have a spare minute... :smile:

piosz commented 9 years ago

I can take a look.

piosz commented 9 years ago

@AntonioMeireles @vishvananda This no longer seems to be a problem, since we are using kube-dns instead of the SERVICE_HOST and SERVICE_PORT env variables (see #5284). Could you please verify whether you still have this problem?

jayunit100 commented 9 years ago

This can still be a bug ... after all, (1) injecting the ENV variables is a supported feature, and continues to be so, and (2) they should allow for HA via service binding + replication controller ... so I'd say one of 2 things should happen....

makes sense?

bgrant0607 commented 9 years ago

Is this about: https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/redis ?

There is no master service.

It's not clear to me how that example is expected to work.

The redis and sentinel controllers create separate pods. sentinel.py is not in the image directory, so I don't know what it's doing, but I don't see how it can find the redis replicas.

piosz commented 9 years ago

Sorry, I thought this was about the guestbook example. I'll take a look at the redis example tomorrow.

bgrant0607 commented 9 years ago

IIUC, the guestbook example doesn't even pretend to handle master failure.

moosilauke18 commented 9 years ago

The original scenario seems to work: when you remove the master pod, it does re-elect and fail over. However, it becomes hard to write to redis, since you can only write to the master, and the master in the example is elected. Basically, if you put a service in front of the redis pods, it will proxy to any one of the replicas, and only one of them is the actual master.

bgrant0607 commented 9 years ago

It looks like the sentinels think they know which component the master is. If non-masters failed readiness probes, they'd get removed from the endpoints list for the service.
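A minimal sketch of the readiness check suggested above, assuming each redis pod can run a small script as an exec probe and that redis listens on 127.0.0.1:6379 without a password (both are assumptions, not part of the example). The function names are hypothetical. It sends `INFO replication` and reports ready only when the `role:` field is `master`:

```python
import socket


def parse_role(info_text):
    """Return the value of the `role:` field in INFO replication output, or None."""
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("role:"):
            return line.split(":", 1)[1]
    return None


def fetch_replication_info(host="127.0.0.1", port=6379, timeout=1.0):
    """Send INFO replication as an inline command and return the raw reply text.

    A single recv() is used for brevity; `role:` is the first field of the
    replication section, so it normally arrives in the first chunk.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"INFO replication\r\n")
        data = conn.recv(65536)
    return data.decode("utf-8", errors="replace")


def probe_exit_code():
    """Return 0 (ready) only if the local redis reports itself as master."""
    try:
        role = parse_role(fetch_replication_info())
    except OSError:
        role = None  # Unreachable redis counts as not ready.
    return 0 if role == "master" else 1
```

A probe script would end with `sys.exit(probe_exit_code())`. With such a probe on the pods behind a write service, replicas would fail readiness and drop out of the endpoints list, leaving only the current master.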

goltermann commented 9 years ago

It looks like all questions have been answered here. Please reopen if needed.

hardcoar commented 8 years ago

Hi. How does this work? As @bgrant0607 said, there's no master service, and as @moosilauke18 said, there's a chance that a service in front will proxy a write request to a slave. Is there any way to proxy writes to the master and reads to the slaves using a service?

erictune commented 8 years ago

@mamspatkar21 @hadcoar have you tried the "redis-cluster" helm chart? https://github.com/helm/charts/tree/master/redis-cluster

That uses a service for the redis-sentinel, but no service for the redis masters. So, there should not be any issues with the service munging the IPs or sending writes to the wrong place. It requires that you use a client which understands how to discover a master using the redis-sentinel (redis-cli does not seem to be able to do this, but some clients can: the ones that follow http://redis.io/topics/sentinel-clients). It looks like one client that supports it is https://github.com/nrk/predis.
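To illustrate what "discovering a master using the redis-sentinel" involves, here is a rough sketch of the first step a Sentinel-aware client performs: asking a sentinel for the current master address with `SENTINEL get-master-addr-by-name`. The master name `mymaster` is taken from the sentinel logs earlier in this thread. The function names, the default sentinel port 26379, and the single-`recv` read are simplifying assumptions.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def parse_master_addr(reply):
    """Parse the sentinel reply into (host, port), or None if no master is known."""
    lines = reply.split(b"\r\n")
    # Expected layout: *2, $<len>, <host>, $<len>, <port>
    if len(lines) < 5 or lines[0] != b"*2":
        return None
    return lines[2].decode("utf-8"), int(lines[4])


def discover_master(sentinel_host, sentinel_port=26379, name="mymaster", timeout=1.0):
    """Ask one sentinel which address currently holds the master role."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", name))
        return parse_master_addr(conn.recv(4096))
```

A real client would then connect to the returned address, confirm the role, and repeat the lookup after a failover. Libraries that follow the sentinel-clients guidelines handle those steps internally.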

klausenbusk commented 7 years ago

> @brendandburns there was a request on IRC for the Dockerfile for kubernetes/redis-proxy. I think you probably have that file. It doesn't appear to be in github.

Any update on this? @brendandburns

rogeruiz commented 7 years ago

@manaspatkar21 We ran into this too on our deployment of redis HA. Setting the slave-announce-ip to the value of $(hostname -i) worked for us. [We originally were going to disable ipmasq on docker and then enable it on flannel](https://stackoverflow.com/a/37411969), but ultimately the easier thing was to properly configure redis servers with that property.
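As a rough sketch of that fix, assuming a small startup script generates part of the redis configuration before the server starts: the helper below resolves the pod's own address (a best-effort stand-in for `hostname -i`) and renders the `slave-announce-ip` directive. The function names are hypothetical, and where the line is written depends on how your image loads its config.

```python
import socket


def pod_ip():
    """Best-effort equivalent of `hostname -i`: resolve this host's own name."""
    return socket.gethostbyname(socket.gethostname())


def announce_line(ip):
    """Render the redis.conf directive that makes a replica advertise `ip`."""
    return "slave-announce-ip {}\n".format(ip)
```

Appending `announce_line(pod_ip())` to the config file before launching redis-server should make each replica report its pod IP to the master and sentinels, rather than an address rewritten by masquerading.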