shizmob / pydle

An IRCv3-compliant Python 3 IRC library.
BSD 3-Clause "New" or "Revised" License

Auto recovery after ConnectionResetError #155

Open felixonmars opened 3 years ago

felixonmars commented 3 years ago
2020-12-18 12:27:49,204 - IRCClient:chat.freenode - WARNING - >> Receive timeout reached, sending ping to check connection state...
2020-12-18 12:29:47,097 - IRCClient:chat.freenode - ERROR - Failed to execute on_raw_ping handler.
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/pydle/client.py", line 422, in on_raw
    await handler(message)
  File "/usr/lib/python3.8/site-packages/pydle/features/rfc1459/client.py", line 703, in on_raw_ping
    await self.rawmsg('PONG', *message.params)
  File "/usr/lib/python3.8/site-packages/pydle/client.py", line 312, in rawmsg
    await self._send(message)
  File "/usr/lib/python3.8/site-packages/pydle/client.py", line 361, in _send
    await self.connection.send(input)
  File "/usr/lib/python3.8/site-packages/pydle/connection.py", line 112, in send
    await self.writer.drain()
  File "/usr/lib/python3.8/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.8/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

Currently the connection is lost after a connection reset error. Is there any way I can catch this and force a reconnect instead of leaving the daemon broken?

theunkn0wn1 commented 3 years ago

That's odd; pydle should attempt to reconnect on a broken connection. I will need to investigate.
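
Until the built-in reconnect can be relied on, one possible workaround is to supervise the session from outside and restart it when a connection error escapes. The following is a minimal sketch, not pydle's own reconnect logic: `run_with_reconnect` and `session_factory` are hypothetical names, and the backoff values are arbitrary assumptions.

```python
import asyncio
import logging

log = logging.getLogger("supervisor")


async def run_with_reconnect(session_factory, *, base_delay=1.0, max_delay=60.0,
                             max_attempts=None, sleep=asyncio.sleep):
    """Run session_factory() repeatedly, restarting it after connection errors.

    session_factory is a hypothetical zero-argument callable that returns a
    coroutine which connects and handles messages until the link ends.
    Returns the number of restarts once a session finishes cleanly.
    """
    failures = 0
    while True:
        try:
            await session_factory()
            return failures
        except (ConnectionError, asyncio.TimeoutError) as exc:
            # ConnectionResetError is a subclass of ConnectionError.
            failures += 1
            if max_attempts is not None and failures >= max_attempts:
                raise
            # Exponential backoff, capped at max_delay seconds.
            delay = min(max_delay, base_delay * 2 ** (failures - 1))
            log.warning("Connection lost (%r); retrying in %.1fs", exc, delay)
            await sleep(delay)
```

In a bot, `session_factory` would presumably build a fresh client, connect, and then await the client's message loop (the tracebacks in this thread show `connect` and `handle_forever` coroutines), but that mapping is an assumption and has not been tested against pydle.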

felixonmars commented 3 years ago

Another case, this time on Python 3.9, with a slightly different error message:

2021-01-27 21:50:11,450 - IRCClient:chat.freenode - WARNING - >> Receive timeout reached, sending ping to check connection state...
2021-01-27 21:50:41,453 - pydle.client - ERROR - Unexpected disconnect. Attempting to reconnect within 5 seconds.
2021-01-27 21:50:46,802 - asyncio - ERROR - Task exception was never retrieved
future: <Task finished name='Task-8' coro=<BasicClient.handle_forever() done, defined at /usr/lib/python3.9/site-packages/pydle/client.py:363> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/pydle/client.py", line 380, in handle_forever
    await self.disconnect(expected=False)
  File "/usr/lib/python3.9/site-packages/pydle/client.py", line 136, in disconnect
    await self._disconnect(expected)
  File "/usr/lib/python3.9/site-packages/pydle/client.py", line 146, in _disconnect
    await self.on_disconnect(expected)
  File "/usr/lib/python3.9/site-packages/pydle/client.py", line 338, in on_disconnect
    await self.connect(reconnect=True)
  File "/usr/lib/python3.9/site-packages/pydle/features/tls.py", line 35, in connect
    return await super().connect(hostname, port, tls=tls, **kwargs)
  File "/usr/lib/python3.9/site-packages/pydle/features/rfc1459/client.py", line 190, in connect
    await super().connect(hostname, port, **kwargs)
  File "/usr/lib/python3.9/site-packages/pydle/client.py", line 124, in connect
    await self._connect(hostname=hostname, port=port, reconnect=reconnect, **kwargs)
  File "/usr/lib/python3.9/site-packages/pydle/features/tls.py", line 54, in _connect
    await self.connection.connect()
  File "/usr/lib/python3.9/site-packages/pydle/connection.py", line 48, in connect
    (self.reader, self.writer) = await asyncio.open_connection(
  File "/usr/lib/python3.9/asyncio/streams.py", line 52, in open_connection
    transport, _ = await loop.create_connection(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1081, in create_connection
    transport, protocol = await self._create_connection_transport(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1111, in _create_connection_transport
    await waiter
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer
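
The "Task exception was never retrieved" line above means the task running `handle_forever()` died and nothing awaited it, so the failure only surfaced as an asyncio warning. For tasks you create yourself, a done-callback is one way to notice that. This is a generic asyncio sketch rather than anything pydle provides; `spawn` and `log_task_failure` are hypothetical helper names.

```python
import asyncio
import logging

log = logging.getLogger("tasks")


def log_task_failure(task):
    """Done-callback: retrieve and log a task's exception, if it has one."""
    if task.cancelled():
        return None
    exc = task.exception()  # retrieving it marks the exception as handled
    if exc is not None:
        log.error("Task %s failed: %r", task.get_name(), exc)
    return exc


def spawn(coro, *, name=None):
    """Start coro as a task whose failure is logged instead of silently lost.

    Must be called while an event loop is running.
    """
    task = asyncio.create_task(coro, name=name)
    task.add_done_callback(log_task_failure)
    return task
```

The callback only reports the failure. Deciding whether to reconnect or shut down is still up to the caller.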

Rixxan commented 2 years ago

+1, I can reproduce this. If the bot is connected to a server and I then kill the server (e.g. shut it down or kill the IRCd), the whole thing breaks.

ConnectionResetError happens and the bot never times out nor attempts recovery, even 10+ minutes after the last ping.

On Ubuntu: (screenshot attached in the original issue)

On Windows:

    return await asyncio.wait_for(self.reader.readline(), timeout=timeout)
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\tasks.py", line 494, in wait_for
    return fut.result()
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\streams.py", line 540, in readline
    line = await self.readuntil(sep)
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\streams.py", line 632, in readuntil
    await self._wait_for_data('readuntil')
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\streams.py", line 517, in _wait_for_data
    await self._waiter
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\proactor_events.py", line 280, in _loop_reading
    data = fut.result()
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\windows_events.py", line 812, in _poll
    value = callback(transferred, key, ov)
  File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\asyncio\windows_events.py", line 461, in finish_recv
    raise ConnectionResetError(*exc.args)
ConnectionResetError: [WinError 64] The specified network name is no longer available

theunkn0wn1 commented 1 year ago

The reconnection logic is buggy in several ways and will likely need to be redone entirely. I started work on a branch that attempts to fix the issues without a rewrite, although it's not perfect yet: right now it can only make one reconnect attempt before failing.

theunkn0wn1 commented 1 year ago

@Rixxan, in reviewing your code, I believe the zombification you mention is a byproduct of your implementation. In current mainline you have two different threads, each running its own event loop. When pydle dies on one, the other lives on and prevents a shutdown.
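
One way around that shutdown problem is to run everything on a single event loop and stop all tasks as soon as one of them exits. This is a rough sketch using only the standard library, not the code under review; `run_until_first_exit` is a hypothetical helper.

```python
import asyncio


async def run_until_first_exit(*coros):
    """Run coroutines on one event loop and stop them all when any one ends.

    Returns the exceptions raised by the task(s) that ended first
    (an empty list if they returned normally).
    """
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to complete so nothing outlives this call.
    await asyncio.gather(*pending, return_exceptions=True)
    return [t.exception() for t in done
            if not t.cancelled() and t.exception() is not None]
```

With the IRC client and the other long-running service passed in as two coroutines, a dead connection then ends both, and the process can exit or be restarted by whatever supervises it.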