Open GoogleCodeExporter opened 8 years ago
Adding to this, we were bitten by this issue again recently.
Ideally, I'd like two config file vars to help. The first would set the connect
timeout between a slave and its master (letting the OS decide waits far too
long!). The second would set the number of connection retries the slave
attempts, which would help to mitigate this as well. In our environment, I want
the slave to have a 3-second connect timeout to the master and to try at most
3 times before giving up. Then our monitoring system can catch it and get a
human involved. Otherwise, the slave is mostly unresponsive (long pauses in
responses to other commands) while it waits for the timeout to fire during
socket connection.
Original comment by jzaw...@gmail.com
on 30 Jul 2010 at 3:36
FWIW, I'd call these master_connect_timeout and master_connect_retries (or
something similar).
Original comment by jzaw...@gmail.com
on 30 Jul 2010 at 3:40
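To make the request concrete, here is a minimal sketch of the behaviour the two proposed settings would control, written in Python rather than the server's own code. The function name connect_to_master and the retry_delay parameter are hypothetical; connect_timeout and max_retries stand in for the suggested master_connect_timeout and master_connect_retries, which are proposals in this thread, not existing options.

```python
import socket
import time


def connect_to_master(host, port, connect_timeout=3.0, max_retries=3, retry_delay=0.0):
    """Try to reach the master at most max_retries times.

    Each attempt is bounded by connect_timeout seconds instead of the
    OS default. Returns a connected socket, or None once all attempts
    have failed so the caller (or monitoring) can react.
    """
    for attempt in range(1, max_retries + 1):
        try:
            # create_connection applies the timeout to the connect itself
            return socket.create_connection((host, port), timeout=connect_timeout)
        except OSError:
            # Covers timeouts, refused connections, and unreachable hosts
            if attempt < max_retries:
                time.sleep(retry_delay)
    return None
```

With the values from the comment above (3 seconds, 3 attempts), the worst case is roughly 9 seconds before the caller gets None back and can raise an alert.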
Issue accepted, this is a very bad thing... either the reconnection should be
made async via the event loop (a bit more complex code-wise, but probably the
best approach after all) or it should have a sane timeout.
@jzawodn: about the max number of attempts, the problem is that a slave that
has lost the connection will, after the N attempts, become a pretty "strange"
node in the network. It should at least deny client connections when the master
status is no longer active... (only allowing the SLAVE and INFO commands to be
issued).
Definitely something to fix, but I'm still unsure about the right thing to do...
Original comment by anti...@gmail.com
on 27 Aug 2010 at 10:52
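As a rough illustration of the "async via the event loop" option, the sketch below starts a non-blocking connect and polls for completion in short slices, so other work can run in between. It is Python, not the server's actual event loop, and assumes a POSIX-like platform; connect_with_deadline and on_tick are hypothetical names, with on_tick standing in for "serve other clients".

```python
import errno
import selectors
import socket
import time


def connect_with_deadline(host, port, timeout, on_tick=lambda: None):
    """Non-blocking connect that gives up after `timeout` seconds.

    on_tick is called between polls, standing in for other event-loop work.
    Returns a connected (still non-blocking) socket, or None on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex((host, port))  # returns at once, never blocks
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None

    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    deadline = time.monotonic() + timeout
    try:
        while time.monotonic() < deadline:
            if sel.select(timeout=0.05):
                # Writable means the connect finished; SO_ERROR says how
                if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                    return sock
                break
            on_tick()  # connect still pending: do other work
    finally:
        sel.unregister(sock)
        sel.close()
    sock.close()
    return None
```

The design point is that the wait is split into small polls with a hard deadline, so a dead master costs at most `timeout` seconds and never stalls responses to other commands.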
Original issue reported on code.google.com by
jzaw...@gmail.com
on 13 Mar 2010 at 4:07