Open s3cc0 opened 1 year ago
Thanks for the detailed write-up :+1:
I'm not sure how this could happen though. The multi-node setup with Redis does not seem related, as the adapter is not called during the connection.
How do you detect this kind of issue? From the client side?
Hello, thanks for the quick feedback and the related bug. I didn't see it in my research, sorry!
We found the bug when scaling containers up and then back down. It also shows up when Google Cloud Run forcibly closes the TCP connection after 15-60 min. The socket.io client then reconnects and a new WebSocket connection is established. However, although this connection is established, the "connection" event is not fired on the server, so no further listeners such as "disconnect" or "message" are registered on the socket. Because of this, the socket can't join rooms in the namespace. We noticed this because the ws(s) connection was created successfully but no messages arrived (only the ping/pong heartbeat). Using the socket.io admin UI we could confirm this: the socket was there, but without rooms. I was able to add a room using the admin tool, and then the socket received some messages. So the socket connection itself was fine; only the "connection" event on the server was missing.
Related to the other bug, it could be something to do with the wildcard namespace.
Conjecture: could it be due to the speed of the reconnect? We have also "throttled" the reconnect, without any success.
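To answer the "from the client side?" question more concretely, here is a rough sketch of how the symptom could be detected in the client. It is not the code from this project, only an illustration under assumptions: the server is assumed to emit an application-level event from inside its "connection" handler (called "ready" here, a made-up name), and `guardHandshake`, `timeoutMs`, `schedule` and `cancel` are hypothetical names. If "ready" does not arrive in time after "connect", the client drops the connection and opens a new one.

```javascript
// Hypothetical helper: forces a fresh connection when the server never
// confirms that its "connection" handler ran.
// "ready" is an assumed application-level event, not part of Socket.IO.
function guardHandshake(
  socket,
  { timeoutMs = 5000, schedule = setTimeout, cancel = clearTimeout } = {}
) {
  let timer = null;

  socket.on("connect", () => {
    cancel(timer);
    // Transport is up; now wait for the server-side confirmation.
    timer = schedule(() => {
      // No confirmation: assume the server-side handler was skipped.
      socket.disconnect();
      socket.connect();
    }, timeoutMs);
  });

  // Confirmation arrived, or the connection is gone anyway: stop waiting.
  socket.on("ready", () => cancel(timer));
  socket.on("disconnect", () => cancel(timer));
}
```

With socket.io-client this would be called once, right after creating the socket, e.g. `guardHandshake(socket, { timeoutMs: 3000 })`. The `schedule` and `cancel` options only exist so the timer can be replaced in tests.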
How did we get that to work? Here is a simple example:
For everyone with the same issue, here is a current workaround:
Describe the bug
We use socket.io on Google Cloud Run with the redis-adapter to exchange data across the Node cluster and multiple containers. The challenge is that Google Cloud Run kills connections every 60 min. We don't use the sticky extension because we work with WebSockets only and therefore do not support polling. When the containers scale, in either direction, it happens from time to time that the client gets a socket connection but the "connection" event is not fired on the server. As a result, none of the other events on the socket itself work.
Important: We use Namespaces and Channels
To Reproduce
Version for frontend and backend: "@socket.io/admin-ui": "^0.5.1", "@socket.io/redis-adapter": "^8.1.0", "redis": "^4.6.5", "socket.io": "^4.6.1", "socket.io-client": "^4.6.1",
Redis Server 6.x+ (issue also with redis server 4.x+)
Server Code Example
Client
Expected behavior
The "connection" event should always be fired on the server. That is not the case here, so no matter what the clients do, they cannot take part in the normal application flow. The socket connection nevertheless stays open, so there is a socket connection for which the "connection" event was never fired on the server.
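For illustration, this is roughly what a "connection" handler normally does for each socket, and why nothing works when it is skipped. It is only a sketch, not the server code from this report: `acknowledgeConnections`, `roomFor` and the "ready" event are hypothetical names, and `namespace` is assumed to be anything that behaves like a socket.io namespace (for example the return value of `io.of(...)`).

```javascript
// Hypothetical helper: everything a socket needs is set up inside the
// "connection" handler, so a skipped handler leaves the socket without
// rooms and without listeners.
// "ready" is an assumed application-level event, not part of Socket.IO.
function acknowledgeConnections(namespace, roomFor) {
  namespace.on("connection", (socket) => {
    // Join the room that messages are broadcast to.
    socket.join(roomFor(socket));

    // Per-socket listeners are registered here and nowhere else.
    socket.on("disconnect", (reason) => {
      console.log(`socket ${socket.id} disconnected: ${reason}`);
    });

    // Tell the client explicitly that the handler ran.
    socket.emit("ready", { id: socket.id });
  });
}
```

Emitting an explicit confirmation like this makes the failure observable: a client that connects but never receives "ready" has hit exactly the state described above.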
Platform:
Additional context
Important: the socket.io server runs in Google Cloud Run (Docker container) and scales up/down according to traffic. We had 250 connections on one container; scaling is triggered at 150 open requests.