miguelgrinberg / python-socketio

Python Socket.IO server and client
MIT License
3.95k stars 585 forks source link

AsyncClient crashes after reconnect to server with short reconnection_delay #1004

Open bialix opened 2 years ago

bialix commented 2 years ago

I have consistent crashes with AsyncClient (v.5.7.1 and checked previous 5.6.0, 5.50 the same) with following slightly modified example from your documentation:

import asyncio
import socketio

sio = socketio.AsyncClient(reconnection_delay=0.1, handle_sigint=False, logger=True, engineio_logger=True)

@sio.event
async def connect():
    print('connection established')

@sio.event
async def disconnect():
    print('disconnected from server')

async def main():
    await sio.connect('http://127.0.0.1:5000')
    await sio.wait()

if __name__ == '__main__':
    asyncio.run(main())

My socketio server is also written in Python and running on Ubuntu 22.04 machine as a daemon. So every time I execute:

sudo systemctl restart my-socketio-server.service

That client crashes. Increasing reconnection_delay to 3-5-10 seconds does help. Sometimes even 1s is enough. The crash occurs after successful reconnect, that's weird.

Here is example of output when crashed:

On Windows:

Attempting polling connection to http://127.0.0.1:5000/socket.io/?transport=polling&EIO=4
Polling connection accepted with {'sid': 'IvMLd7wvWhvUp8p-AAAG', 'upgrades': ['websocket'], 'pingTimeout': 20000, 'pingInterval': 25000}
Engine.IO connection established
Sending packet MESSAGE data 0{}
Attempting WebSocket upgrade to ws://127.0.0.1:5000/socket.io/?transport=websocket&EIO=4
WebSocket upgrade was successful
Received packet NOOP data
Received packet MESSAGE data 0{"sid":"zTurflKPNrlpwslHAAAH"}
Namespace / is connected
connection established

Waiting for write loop task to end
Exiting write loop task
Engine.IO connection dropped
disconnected from server
Exiting read loop task
Connection failed, new attempt in 0.37 seconds
Attempting polling connection to http://127.0.0.1:5000/socket.io/?transport=polling&EIO=4
Polling connection accepted with {'sid': 'deH2fm-2n9AQjJpiAAAA', 'upgrades': ['websocket'], 'pingTimeout': 20000, 'pingInterval': 25000}
Engine.IO connection established
Sending packet MESSAGE data 0{}
Attempting WebSocket upgrade to ws://127.0.0.1:5000/socket.io/?transport=websocket&EIO=4
WebSocket upgrade was successful
Received packet NOOP data
Received packet MESSAGE data 0{"sid":"xnhGO0se25HYAJztAAAB"}
Namespace / is connected
connection established
Reconnection successful
packet queue is empty, aborting
Exiting write loop task
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000001C75DE2F50>
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x0000001C7532C430>
Traceback (most recent call last):
  File "C:\Python\3.10-64\lib\asyncio\proactor_events.py", line 116, in __del__
  File "C:\Python\3.10-64\lib\asyncio\proactor_events.py", line 108, in close
  File "C:\Python\3.10-64\lib\asyncio\base_events.py", line 750, in call_soon
  File "C:\Python\3.10-64\lib\asyncio\base_events.py", line 515, in _check_closed
RuntimeError: Event loop is closed
bialix commented 2 years ago

Here is output on Ubuntu 22.04, more details:

...
connection established
Waiting for write loop task to end
Exiting write loop task
Engine.IO connection dropped
disconnected from server
Exiting read loop task
Connection failed, new attempt in -0.20 seconds
Attempting polling connection to http://127.0.0.1:10069/socket.io/?transport=polling&EIO=4
HTTP GET request to http://127.0.0.1:10069/socket.io/?transport=polling&EIO=4&t=1660645375.8138015 failed with error Cannot connect to host 127.0.0.1:10069 ssl:default [Connect call failed ('127.0.0.1', 10069)].
Connection failed, new attempt in -0.06 seconds
Attempting polling connection to http://127.0.0.1:10069/socket.io/?transport=polling&EIO=4
HTTP GET request to http://127.0.0.1:10069/socket.io/?transport=polling&EIO=4&t=1660645375.814392 failed with error Cannot connect to host 127.0.0.1:10069 ssl:default [Connect call failed ('127.0.0.1', 10069)].
Connection failed, new attempt in 0.40 seconds
Attempting polling connection to http://127.0.0.1:10069/socket.io/?transport=polling&EIO=4
Polling connection accepted with {'sid': 'ItWgoXXieTeyZ2MJAAAA', 'upgrades': ['websocket'], 'pingTimeout': 20000, 'pingInterval': 25000}
Engine.IO connection established
Sending packet MESSAGE data 0{}
Attempting WebSocket upgrade to ws://127.0.0.1:10069/socket.io/?transport=websocket&EIO=4
WebSocket upgrade was successful
Received packet NOOP data
Received packet MESSAGE data 0{"sid":"qg8E6wZGoUt6XtTGAAAB"}
Namespace / is connected
connection established
Reconnection successful
packet queue is empty, aborting
Exiting write loop task
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f562551f790>
Exception ignored in: <function ClientResponse.__del__ at 0x7f562591b010>
Traceback (most recent call last):
  File "/opt/socketio-client/venv/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 810, in __del__
  File "/opt/socketio-client/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 178, in release
  File "/opt/socketio-client/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 665, in _release
  File "/opt/socketio-client/venv/lib/python3.10/site-packages/aiohttp/client_proto.py", line 63, in close
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 697, in close
  File "/usr/lib/python3.10/asyncio/base_events.py", line 750, in call_soon
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
RuntimeError: Event loop is closed
bialix commented 2 years ago

I suppose you might say the bug actually in aiohttp, but increasing reconnection delay does fix this.

miguelgrinberg commented 2 years ago

Yes, this is my bug. There are some cleanup tasks that occur after the reconnection loop begins. The solution is that I need to wait to begin the reconnection until those background tasks complete. If a new connection is attempted before these tasks are done, then you get strange results. I don't consider this a high priority issue because there is really no point in having reconnections that quickly (i.e. it would never make sense to do this in a production deployment as your server would be flooded with reconnection requests all at the same time and might die again due to the load spike). But it is a bug, so I will look into it.

bialix commented 2 years ago

I'm OK with longer timeout. Maybe it could have been helped me to understand why my client crashes if it was mentioned in the docs, with some valid data as example. Just a short workaround for anybody else.