Closed Hornwitser closed 8 years ago
Any ideas on what the client should do instead?
Properly implemented, and in line with best practices, a client re-connecting to a resource online should exponentially increase it's retry timer every time it fails ad infinitum. (An interesting read for what happens when you don't do this is the Router Flood of University of Wisconsin NTP Server.) At a small scale, just waiting a minute works fine though.
What is not fine is exponentially increasing the number of times the client tries to reconnect every minute, so that after less than 10 minutes it's doing a continuous flood of connection attempts against the IRC server.
I agree on all points. Could you devise an implementation within this library?
The right thing to do would be to cancel any scheduled connected_checker executions once a new connection is established. As it is I don't see any reference return value being produced to support cancelation. The next alternative is to store a flag within the bot which marks whether a connection check is scheduled and pending. If upon the next disconnect a check is pending there is no need to schedule another. This should break the pitfall sequence which miss currently possible.
As for a more intelligent exponential backoff strategy, in terms of reconnect, I'd recommend doing something like the following: https://gist.github.com/strife25/9310539
Thanks to @jpbarto for the initial implementation. I've taken it a step further and extracted the abstract concept of a ReconnectStrategy in the latest code. Since there aren't any tests for the bot, I'm not terribly confident in the implementation.
@Hornwitser, would you be willing to test the latest code in the environment where you saw the failure and verify it works as expected?
I couldn't leave well-enough alone. I wasn't confident in the implementation not having had a way to test it, so I wrote some tests. Now I'm very much happier with the implementation, which uses an actual IRC server to test the bot and ensure that the delayed commands aren't accumulating, a test which fails on the code prior to the changes. I'll be releasing this as 14.1.
Brilliant, thanks and thank you for adding the tests
When the
SingleServerIRCBot
gets disconnected by the IRC server, it improperly handles reconnecting, causing it to double it's reconnect attempts everyreconnect_interval
under certain conditions.Cause The code handling disconnects is the following:
After a
reconnection_interval
the_connected_checker
gets called.Which schedule another connection check and calls
jump_server()
to try to reconnect. If the connection attempt done byjump_server()
is successful, but the client is disconnected by the IRC server (for example due to being banned), then_on_disconnect
is dispatched, which schedules another connection check. And thus one connection check has turned into two.