The default ping and heartbeat settings that trigger node state changes can
produce false failovers. They should be changed as follows (I got some advice
from a network engineer on these values).
The ping interval should be 1000ms, with a timeout of 3000ms, and 3 consecutive
timed-out pings before triggering a disconnect (3 x 3000ms, so 9+ seconds to
trigger a disconnect).
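A minimal Python sketch of the proposed ping rule follows. It assumes that "a retry of 3 times" means three consecutive timed-out pings, which matches the 9+ second figure. `PingMonitor` and `simulate_disconnect_time_ms` are hypothetical names, and the clock is simulated rather than real.

```python
from dataclasses import dataclass

# Values proposed in this issue.
PING_INTERVAL_MS = 1000
PING_TIMEOUT_MS = 3000
PING_RETRIES = 3


@dataclass
class PingMonitor:
    """Hypothetical ping-based disconnect detector (illustrative only)."""
    interval_ms: int = PING_INTERVAL_MS
    timeout_ms: int = PING_TIMEOUT_MS
    retries: int = PING_RETRIES
    consecutive_failures: int = 0

    def record(self, reply_ms):
        """Record one ping result: reply_ms is the round-trip time, or None
        if no reply arrived. Returns True once the node should be treated
        as disconnected."""
        if reply_ms is None or reply_ms > self.timeout_ms:
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0  # any good reply resets the count
        return self.consecutive_failures >= self.retries


def simulate_disconnect_time_ms(results, monitor):
    """Walk a sequence of ping results on a simulated clock.

    Each attempt waits for its reply (or the full timeout); the next ping
    is sent interval_ms after that result. Returns the simulated time at
    which a disconnect is declared, or None if it never is.
    """
    now = 0
    for reply_ms in results:
        timed_out = reply_ms is None or reply_ms > monitor.timeout_ms
        now += monitor.timeout_ms if timed_out else reply_ms
        if monitor.record(reply_ms):
            return now
        now += monitor.interval_ms
    return None
```

With every ping timing out, `simulate_disconnect_time_ms([None, None, None], PingMonitor())` returns 11000: 9000ms spent waiting on timeouts plus two 1000ms intervals between attempts. A successful reply resets the failure count, so isolated losses do not accumulate toward a disconnect.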
The heartbeat interval should also be 1000ms, with a tolerance of 6 missed
heartbeats before triggering a failover (6 x 1000ms, so 6+ seconds to trigger
a failover).
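A similar sketch for the heartbeat side, assuming failover should fire once 6 whole intervals have passed with no heartbeat. `HeartbeatWatcher` is a hypothetical class, and timestamps are plain millisecond integers supplied by the caller.

```python
# Values proposed in this issue.
HEARTBEAT_INTERVAL_MS = 1000
HEARTBEAT_TOLERANCE = 6


class HeartbeatWatcher:
    """Hypothetical failover trigger based on missed heartbeats."""

    def __init__(self, interval_ms=HEARTBEAT_INTERVAL_MS, tolerance=HEARTBEAT_TOLERANCE):
        self.interval_ms = interval_ms
        self.tolerance = tolerance
        self.last_seen_ms = 0

    def heartbeat(self, now_ms):
        """Call whenever a heartbeat from the peer arrives."""
        self.last_seen_ms = now_ms

    def missed(self, now_ms):
        """Number of whole heartbeat intervals elapsed with no heartbeat."""
        return (now_ms - self.last_seen_ms) // self.interval_ms

    def should_fail_over(self, now_ms):
        """True once the number of missed heartbeats reaches the tolerance."""
        return self.missed(now_ms) >= self.tolerance
```

After a heartbeat at 10000ms, `should_fail_over(15999)` is False (5 missed) and `should_fail_over(16000)` is True (6 missed), giving the 6+ seconds described above.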
For both, it's also important that the interval be measured from the *end* of
the previous result, to avoid queuing up a large number of pings, heartbeats,
or expected heartbeats.
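The difference between measuring the interval from the start of each run (fixed rate) and from the end of the previous result (fixed delay) can be shown on a simulated clock. This is an illustrative sketch with hypothetical function names, assuming each ping takes the full 3000ms timeout.

```python
def fixed_rate_starts(interval_ms, durations_ms):
    """Start times when run i is due at i * interval_ms, regardless of when
    the previous run finished. Runs never overlap, so an overdue run starts
    as soon as the previous one ends."""
    starts, free_at = [], 0
    for i, duration in enumerate(durations_ms):
        start = max(i * interval_ms, free_at)
        starts.append(start)
        free_at = start + duration
    return starts


def fixed_rate_lags(interval_ms, durations_ms):
    """How far behind its due time each fixed-rate run starts."""
    starts = fixed_rate_starts(interval_ms, durations_ms)
    return [start - i * interval_ms for i, start in enumerate(starts)]


def fixed_delay_starts(interval_ms, durations_ms):
    """Start times when the next run begins interval_ms after the previous
    run *ends*, as proposed in this issue."""
    starts, now = [], 0
    for duration in durations_ms:
        starts.append(now)
        now += duration + interval_ms
    return starts
```

With a 1000ms interval and every run taking 3000ms, `fixed_rate_lags(1000, [3000] * 4)` returns [0, 2000, 4000, 6000]: each run starts further behind schedule, which is the build-up this issue warns about. `fixed_delay_starts(1000, [3000] * 4)` returns [0, 4000, 8000, 12000], so runs never fall behind. In Java, this is the distinction between `ScheduledExecutorService.scheduleAtFixedRate` and `scheduleWithFixedDelay`.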
Related: Issue 5
Original issue reported on code.google.com by willhains on 18 Jun 2008 at 11:21