HA - Automatic Failover

eveiga commented 11 years ago

Hi! First of all, thanks for the proxy, it has been really helpful :)

I need a decent solution for automatic failover, and I've already established that twemproxy doesn't support it. Any thoughts or ideas?

I was thinking of an external process that uses redis-sentinel and, on a master-switch event, updates the IP address in nutcracker.conf and restarts the service.
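A minimal sketch of the config-rewriting step of such a process, assuming the event arrives as a Redis Sentinel `+switch-master` message (payload: `<master name> <old ip> <old port> <new ip> <new port>`) and that servers are listed in nutcracker.conf as `- ip:port:weight name`. The function names are hypothetical and not part of twemproxy or any agent mentioned in this thread.

```python
import re


def parse_switch_master(payload: str) -> dict:
    """Split a +switch-master payload into the master name and old/new addresses."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return {
        "name": name,
        "old": f"{old_ip}:{old_port}",
        "new": f"{new_ip}:{new_port}",
    }


def rewrite_servers(conf_text: str, old_addr: str, new_addr: str) -> str:
    """Replace old_addr with new_addr, but only on '- ip:port:weight' server lines."""
    pattern = re.compile(
        r"^([ \t]*-[ \t]*)" + re.escape(old_addr) + r"(:\d+\b.*)$",
        re.MULTILINE,
    )
    # A callable replacement avoids any backreference ambiguity with the digits in new_addr.
    return pattern.sub(lambda m: m.group(1) + new_addr + m.group(2), conf_text)
```

Matching only lines that start with `-` keeps the `listen:` address untouched even if it happens to share an IP with a backend.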

manjuraj commented 11 years ago

Glad you liked it @eveiga

I believe using the external process the way you described makes sense. In fact, you can have two twemproxy processes running: one routing traffic to all the masters and the other to all the slaves. On a failover event, you switch from one twemproxy to the other.

bmatheny commented 11 years ago

@manjuraj @eveiga that's what we do for memcache when there are events like a total failure (external process). Works quite well.

eveiga commented 11 years ago

@manjuraj I've already thought about that solution. Can I use the slave cluster for read operations, or won't the hashes match those of the master cluster?

@bmatheny are you using my suggestion or manjuraj's?

bmatheny commented 11 years ago

@eveiga the one you recommended. When the topology needs to change, the config is updated by an external process and twem gets restarted.

eveiga commented 11 years ago

@bmatheny sorry for the boring questions :) Don't you experience a window of downtime during that restart? If so, how do you cope with it?

BTW, are you running any twemproxy pool with only slaves, just for reads?

eveiga commented 11 years ago

Hmm, I forgot you're using it with memcache. I don't know if the last question fits your use case!

bmatheny commented 11 years ago

We do see a short burst of errors. The error type is detected by the app and retried, so we generally don't 'lose' writes, and reads will fall back to the DB.
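The application-side handling described above could look roughly like the following sketch. `TransientProxyError` is a hypothetical stand-in for whatever error the client library raises while the proxy restarts, and `cache_get` / `db_get` are placeholder callables; none of these names come from the thread.

```python
import time


class TransientProxyError(Exception):
    """Hypothetical error type raised while the proxy is restarting."""


def with_retry(op, attempts=3, delay=0.05, sleep=time.sleep):
    """Call op(), retrying on TransientProxyError up to `attempts` times in total."""
    last_error = None
    for i in range(attempts):
        try:
            return op()
        except TransientProxyError as exc:
            last_error = exc
            if i < attempts - 1:
                sleep(delay)  # brief pause so the restart has time to finish
    raise last_error


def read_through(cache_get, db_get, key):
    """Read from the cache; fall back to the database if the proxy is unavailable."""
    try:
        return cache_get(key)
    except TransientProxyError:
        return db_get(key)
```

Writes would go through `with_retry`, while reads use `read_through` so a short outage costs a database hit rather than an error.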

eveiga commented 11 years ago

@bmatheny Thanks for the tips, I'll go on with that solution!

matschaffer commented 11 years ago

@eveiga thanks for the redis-sentinel reminder. So far it looks like this will work well.

Has anyone built the bits to update twemproxy when redis-sentinel finishes a failover?

@manjuraj would you recommend anything more graceful than simply rewriting the twemproxy config and restarting it?

eveiga commented 11 years ago

@matschaffer Yes, I've developed a simple service that attaches a handler to the "+switch-master" event emitted by redis-sentinel, updates twemproxy.conf with the new info and restarts the service. So far so good in testing; I'll put it in production shortly.
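The linked gist is in node.js; purely as an illustration of the update-then-restart step, here is a sketch that writes the new config atomically and only restarts when the file actually changed. `apply_config` and the injected `restart` callable are hypothetical names, and how the service is restarted (init script, supervisor, and so on) is left to the caller.

```python
import os
import tempfile


def apply_config(path, new_text, restart):
    """Atomically replace the file at `path` with new_text, then call restart().

    Returns False (and does not restart) if the file already has this content.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == new_text:
                return False
    except FileNotFoundError:
        pass  # first run: nothing to compare against

    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nutcracker-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(new_text)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    restart()
    return True
```

Skipping the restart when nothing changed avoids needless bursts of client errors if the same event is delivered more than once.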

matschaffer commented 11 years ago

@eveiga any chance of sharing what you've come up with?

eveiga commented 11 years ago

No problem. It's in node.js and fairly tightly coupled to our setup. Still want it?

matschaffer commented 11 years ago

Sure! Even just a gist is great. Always nicer to have some collaboration. :)

eveiga commented 11 years ago

https://gist.github.com/eveiga/5039007

As I said, it's pretty tightly coupled to our setup (init script paths, mails, etc.) and could be a lot more configurable, but it can give you a starting point.

Suggestions are welcome!

manjuraj commented 11 years ago

If you guys can make this generic enough, we can check it into the scripts/ folder of twemproxy.

matschaffer commented 11 years ago

@eveiga how's yours panning out? Over here it seems to work if I'm careful about the startup order. But if the agent comes up before the sentinel, the agent seems to deadlock after a certain number of retries. Have you run into that, or are you controlling start order more carefully?
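Whatever the actual cause turns out to be, one way to make an agent tolerant of start order is to keep retrying the sentinel connection with capped exponential backoff instead of giving up after a fixed number of attempts. A sketch, with `connect` as a hypothetical callable that raises `ConnectionError` while the sentinel is unreachable:

```python
import time


def connect_with_backoff(connect, base=0.5, cap=30.0, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping base, 2*base, 4*base, ... (capped) between tries.

    With max_attempts=None the loop never gives up; otherwise the last
    ConnectionError is re-raised after max_attempts failed tries.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(min(cap, base * 2 ** (attempt - 1)))
```

The cap keeps the agent from waiting arbitrarily long once the sentinel finally comes up.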

matschaffer commented 11 years ago

@eveiga btw, I have this up at https://github.com/matschaffer/redis_twemproxy_agent as something I can pack with npm and get some rough testing around. I took out the email notifier though since we'll probably want to notify via other means.

eveiga commented 11 years ago

Hey @matschaffer, I'd assumed that the sentinel was already running, but indeed we should react somehow to a failed startup. Thanks for packing this into a new repo. I'll take a look at it over the weekend and try to contribute!

matschaffer commented 11 years ago

No problem! After further testing I'm not sure that's the case (with the startup order issue). Not sure what caused the lack of reconfiguration on my first test but I haven't been able to replicate it. My latest commit logs a lot to stdout in hopes that I can tell what's up if it happens again.

matschaffer commented 11 years ago

@eveiga how's this working for you? For me it was working great until I added a second sentinel. Seems like a single sentinel may or may not broadcast the failover messages. Still investigating though.

matschaffer commented 11 years ago

After some investigation it looks like it's not just the multiple sentinels but rather multiple masters failing at the same time. The agent doesn't seem to reliably get all the switch-master messages :(
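One possible safeguard against missed messages, not something proposed in the thread, is to poll as well as subscribe: periodically ask the sentinel for each master's current address (the `SENTINEL get-master-addr-by-name <name>` command returns the IP and port) and compare it with what the agent last wrote. In this sketch `get_master_addr` is a hypothetical callable wrapping that query, returning an `(ip, port)` pair or `None`.

```python
def reconcile(known, master_names, get_master_addr):
    """Compare the known master addresses with what the sentinel reports now.

    Updates `known` in place and returns {name: (old_addr, new_addr)} for every
    master whose address changed, so several simultaneous failovers are all
    picked up in one pass.
    """
    changes = {}
    for name in master_names:
        current = get_master_addr(name)
        if current is None:
            continue  # sentinel does not know this master (yet); skip it
        current = (current[0], int(current[1]))  # the port may arrive as a string
        if known.get(name) != current:
            changes[name] = (known.get(name), current)
            known[name] = current
    return changes
```

Running this on a timer, and once right after every received event, means a dropped message only delays the config update until the next poll.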

matschaffer commented 11 years ago

Swapping node-sentinel for direct use of node-redis seems to help. Gonna do another test now.

eveiga commented 11 years ago

Hey @matschaffer! Sorry for the absence, I'm back on this! Thanks for the bumps on it, I'll take a look and update the production code.

eveiga commented 11 years ago

BTW: I never had more than one sentinel, so I've never run into your problem.

idning commented 10 years ago

hi, all, try https://github.com/idning/redis-mgr please

nidhhoggr commented 7 years ago

If anyone is interested I started a C implementation of https://github.com/matschaffer/redis_twemproxy_agent at https://github.com/nidhhoggr/twemproxy_sentinel

virendarkmr commented 6 years ago

Hi, I am stuck with the same issue. I have two different Redis clusters, each with a master and two slaves, and sentinel handles failover. The redis twemproxy agent works fine when I give a single sentinel IP in cli.js. How can I handle failover for two clusters?

douglaslps commented 6 years ago

> hi, all, try https://github.com/idning/redis-mgr please

What happened with that? I'm getting page not found.

twitter / twemproxy

HA - Automatic Failover #67