Closed ascandella closed 10 years ago
The code looks good, and it'll certainly work, but a heuristic approach may actually be better, here.
Why? Suppose you set the reconnect limit to 10, and then there are legitimate disconnects and reconnects for the server over the course of the next week. If it goes over 10 for the client, that code is now permanently disconnected from the server.
But the heuristic could be based on when a connection is considered "stable". Say a timeout of an hour resets the reconnect level back to zero. I'd make that a second property (that could also be Infinity
by default) so we can make sure a long-running client process never gets into a situation where we have to restart it for no "real" reason.
@dfellis I can implement a heuristic approach if you'd like. It's your project, afterall :)
What do you think are some reasonable defaults? 10 reconnects, reset after a minute? an hour?
10 and an hour are good.
I wonder if it should just reset the reconnects to zero regardless if it got a stable connection or not? Then every hour it'd retry to connect to a down service if it's truly down for maintenance for a while, and would eventually get a reconnect without intervention.
We've normally seen direct TCP connections (without HAProxy) last for about a month or two of continuous use without failure, but I'd rather not have a monthly setInterval
. ;)
Also, if someone does set either value to Infinity
, you can special case that to omit starting a setInterval
at all.
@dfellis @squamos I modified this PR to add two new parameters (updated in PR description, commit message and docs).
Care to take another look?
Looks great! Merging.
We ran into a bug in production today where haproxy had a port bound with no backends, so it was immediately returning a null response. This caused the client TcpTransport to get stuck in an infinite loop, since it resets its
retry
count back to 0 after establishing a successful connection.Previously, if the connect was successfully opened, we considered that as having a valid backend. However, for reasons mentioned above, it can be useful to limit the number of times the transport will reconnect to a server in a given time period. This commit adds two new parameters to control this behavior.
cc @dfellis @squamos