jaraco / irc

Full-featured Python IRC library for Python.
MIT License
392 stars 86 forks source link

SingleServerIRCBot exponentially increases reconnect attempts #82

Closed Hornwitser closed 8 years ago

Hornwitser commented 8 years ago

When the SingleServerIRCBot gets disconnected by the IRC server, it improperly handles reconnecting, causing it to double it's reconnect attempts every reconnect_interval under certain conditions.

Cause The code handling disconnects is the following:

    def _on_disconnect(self, c, e):
        self.channels = IRCDict()
        self.connection.execute_delayed(self.reconnection_interval,
                                        self._connected_checker)

After a reconnection_interval the _connected_checker gets called.

    def _connected_checker(self):
        """[Internal]"""
        if not self.connection.is_connected():
            self.connection.execute_delayed(self.reconnection_interval,
                                            self._connected_checker)
            self.jump_server()

Which schedule another connection check and calls jump_server() to try to reconnect. If the connection attempt done by jump_server() is successful, but the client is disconnected by the IRC server (for example due to being banned), then _on_disconnect is dispatched, which schedules another connection check. And thus one connection check has turned into two.

jaraco commented 8 years ago

Any ideas on what the client should do instead?

Hornwitser commented 8 years ago

Properly implemented, and in line with best practices, a client re-connecting to a resource online should exponentially increase it's retry timer every time it fails ad infinitum. (An interesting read for what happens when you don't do this is the Router Flood of University of Wisconsin NTP Server.) At a small scale, just waiting a minute works fine though.

What is not fine is exponentially increasing the number of times the client tries to reconnect every minute, so that after less than 10 minutes it's doing a continuous flood of connection attempts against the IRC server.

jaraco commented 8 years ago

I agree on all points. Could you devise an implementation within this library?

jpbarto commented 8 years ago

The right thing to do would be to cancel any scheduled connected_checker executions once a new connection is established. As it is I don't see any reference return value being produced to support cancelation. The next alternative is to store a flag within the bot which marks whether a connection check is scheduled and pending. If upon the next disconnect a check is pending there is no need to schedule another. This should break the pitfall sequence which miss currently possible.

jpbarto commented 8 years ago

As for a more intelligent exponential backoff strategy, in terms of reconnect, I'd recommend doing something like the following: https://gist.github.com/strife25/9310539

jaraco commented 8 years ago

Thanks to @jpbarto for the initial implementation. I've taken it a step further and extracted the abstract concept of a ReconnectStrategy in the latest code. Since there aren't any tests for the bot, I'm not terribly confident in the implementation.

@Hornwitser, would you be willing to test the latest code in the environment where you saw the failure and verify it works as expected?

jaraco commented 8 years ago

I couldn't leave well-enough alone. I wasn't confident in the implementation not having had a way to test it, so I wrote some tests. Now I'm very much happier with the implementation, which uses an actual IRC server to test the bot and ensure that the delayed commands aren't accumulating, a test which fails on the code prior to the changes. I'll be releasing this as 14.1.

jpbarto commented 8 years ago

Brilliant, thanks and thank you for adding the tests