Open aaronkchsu opened 2 years ago
That sounds weird indeed.
Do you know the reason for the disconnection on the client side? Is any error thrown?
Is this specific to a given browser?
Not a specific browser. It happens on all their browsers, which leads me to believe it's a network issue.
We get a ping timeout, but it disappears after a few occurrences, which makes me think the client might believe it's connected when it's not.
For separate clients we have seen a transport close error after a few hours with no attempt to reconnect. Is that normal, or is there a way to fix or debug that?
Thanks so much @darrachequesne
+1! We're dealing with that too, but we don't know whether the cause is socket.io and/or the tab throttling feature in browsers. (Maybe there is a bug in Chromium-based browsers that drops WebSocket connections during long sessions.)
@aaronkchsu ping me, maybe we can gather info around this and share with @darrachequesne.
The most common error we detect is transport close, and the socket.io-client does not reconnect automatically.
I'm also experiencing a similar issue where clients occasionally report a 'ping timeout' and the server side reports the corresponding transport close error. I debugged this by running tcpdump on the client side. What I observed is that the server normally sends the PING packet every 30s (my pingInterval; pingTimeout is 10s). But when it fails, the server has sent the PING packet too late (after 42s, more than pingInterval + pingTimeout). So it smells like a server-side issue to me.
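The timing argument above can be checked with a little arithmetic. This is only a sketch: `ping_deadline_s` and `ping_is_late` are hypothetical helpers, not part of Socket.IO, and they assume the rule described in this comment, namely that the client gives up when no PING arrives within pingInterval + pingTimeout.

```python
def ping_deadline_s(ping_interval_s: float, ping_timeout_s: float) -> float:
    """Longest gap between two server PINGs the client tolerates (assumed rule)."""
    return ping_interval_s + ping_timeout_s


def ping_is_late(gap_s: float, ping_interval_s: float, ping_timeout_s: float) -> bool:
    """Return True when the observed gap between two PINGs exceeds the deadline."""
    return gap_s > ping_deadline_s(ping_interval_s, ping_timeout_s)


if __name__ == "__main__":
    # Values from the tcpdump observation: 30 s interval, 10 s timeout, 42 s gap.
    print(ping_deadline_s(30, 10))
    print(ping_is_late(42, 30, 10))
    print(ping_is_late(30, 30, 10))
```

With the numbers reported here, the 42 s gap is past the 40 s deadline, which is consistent with the client raising a ping timeout even though the network path itself may be healthy.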
I have a similar problem with a flask-socketio server app hosted behind an AWS ALB: the client socket session consistently disconnects every 26 seconds.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/eventlet/wsgi.py", line 573, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/site-packages/flask_socketio/__init__.py", line 45, in __call__
    return super(_SocketIOMiddleware, self).__call__(environ,
  File "/usr/local/lib/python3.8/site-packages/engineio/middleware.py", line 60, in __call__
    return self.engineio_app.handle_request(environ, start_response)
  File "/usr/local/lib/python3.8/site-packages/socketio/server.py", line 560, in handle_request
    return self.eio.handle_request(environ, start_response)
  File "/usr/local/lib/python3.8/site-packages/engineio/server.py", line 374, in handle_request
    socket = self._get_socket(sid)
  File "/usr/local/lib/python3.8/site-packages/engineio/server.py", line 565, in _get_socket
    raise KeyError('Session is disconnected')
KeyError: 'Session is disconnected'
Client requests use /socket.io/?EIO=3&transport=polling&t=O9hzOSn&sid=xxxx
Server version of SocketIO is 4.6.0
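When comparing reports like this one, it can help to pull the protocol revision and transport out of the request URL. The sketch below uses only the Python standard library; `describe_engineio_request` is a hypothetical helper name, and the URL is the one quoted above.

```python
from urllib.parse import parse_qs, urlsplit


def describe_engineio_request(url: str) -> dict:
    """Extract the Engine.IO query parameters that matter for debugging."""
    query = parse_qs(urlsplit(url).query)

    def first(key: str):
        # parse_qs returns a list per key; take the first value or None.
        return query.get(key, [None])[0]

    return {
        "protocol": first("EIO"),        # Engine.IO protocol revision
        "transport": first("transport"),  # "polling" or "websocket"
        "sid": first("sid"),             # session id assigned at handshake
    }


if __name__ == "__main__":
    info = describe_engineio_request(
        "/socket.io/?EIO=3&transport=polling&t=O9hzOSn&sid=xxxx"
    )
    print(info)
```

For the request above this reports protocol revision 3 over the polling transport, so the failing requests are HTTP long-polling calls that reuse an existing session id rather than WebSocket frames.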
This could be related to the timeout of the TCP connections in the proxy servers or balancers in front of the app. We increased the TCP timeout and the stability of the connection was improved, but not 100%.
In fact, the ping/pong heartbeat is designed not only to check the connection between server and client but also to keep the connection alive and avoid proxy timeouts, which default to 60 seconds on proxies. So check that the heartbeat interval is lower than the proxy timeout.
Relevant settings: the Nginx timeout and the AWS Load Balancer idle timeout.
Also, check this post: https://blog.martinfjordvald.com/websockets-in-nginx.
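The check suggested above can be written down as a one-line comparison. This is a sketch under stated assumptions: `heartbeat_keeps_proxy_open` is a hypothetical helper, the 25 s interval is an example value rather than anyone's confirmed setting, and the rule assumes the proxy closes a connection that carries no traffic for its full idle timeout.

```python
def heartbeat_keeps_proxy_open(ping_interval_s: float, proxy_idle_timeout_s: float) -> bool:
    """Return True when heartbeats arrive more often than the proxy's idle timeout.

    An interval equal to the timeout is treated as unsafe, because the
    heartbeat and the proxy's idle timer would race each other.
    """
    return ping_interval_s < proxy_idle_timeout_s


if __name__ == "__main__":
    # Example values only: a 25 s heartbeat against 60 s and 30 s idle timeouts,
    # then a 30 s heartbeat against a 30 s idle timeout.
    print(heartbeat_keeps_proxy_open(25, 60))
    print(heartbeat_keeps_proxy_open(25, 30))
    print(heartbeat_keeps_proxy_open(30, 30))
```

If this check passes and disconnects still occur at a fixed period, the proxy idle timeout is probably not the only cause.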
The idle timeout was set to 30 seconds on the load balancer in AWS. I increased this to 90 seconds this morning to see if it played a role in any way, and the consistent ~26 second session disconnects remained unchanged.
As far as how SocketIO is configured in this server app, here is the code (it's quite vanilla):
#!/usr/bin/python
from flask import Flask
from flask_socketio import SocketIO
import os

from Config import ServerConfig
from main.Utils import ServerLogger

socketio = SocketIO()
gLogger = ServerLogger()

from dbinterface.DatabaseInterface import DatabaseInterface
gDatabaseInterface = DatabaseInterface()

from settings.Settings import SettingsManager
gSettingsManager = SettingsManager()

from robots.RobotsManager import RobotsManager
gRobotsManager = RobotsManager()

from users.UserManager import UserManager
gUserManager = UserManager()


def create_app(config_class=ServerConfig):
    app = Flask(__name__)
    app.config.from_object(config_class)
    app.config.from_envvar("SERVER_CONFIG_FILE")
    if not os.path.exists(app.config["LOG_PATH"]):
        os.makedirs(app.config["LOG_PATH"])
    gLogger.setLogPath(app.config["LOG_PATH"])
    gLogger.log("Creating App")
    socketio.init_app(app)
    registerBlueprints(app)
    gDatabaseInterface.init_interface(app)
    gRobotsManager.loadRobots()
    gUserManager.loadUsers()
    gUserManager.setSecretKey(app.config["SECRET_KEY"])
    gSettingsManager.loadSettings()
    gLogger.log("Init done")
    return app
And for the client:
let ServiceModule = angular.module('ServiceModule', []);

ServiceModule.service('Socket', function($rootScope) {
    var socket = io.connect();
    return {
        on: function(eventName, callback) {
            socket.on(eventName, function() {
                var args = arguments;
                $rootScope.$apply(function() {
                    callback.apply(socket, args);
                });
            });
        },
        emit: function(eventName, data, callback) {
            if (typeof data == 'function') {
                callback = data;
                data = {};
            }
            socket.emit(eventName, data, function() {
                var args = arguments;
                $rootScope.$apply(function() {
                    if (callback) {
                        callback.apply(socket, args);
                    }
                });
            });
        },
        emitAndListen: function(eventName, data, callback) {
            this.emit(eventName, data, callback);
            this.on(eventName, callback);
        }
    };
});
@johnfilo-kmart Did you consider the wake-up throttling in Chrome? It is in fact an active bug, see: https://bugs.chromium.org/p/chromium/issues/detail?id=1224672&q=websocket&can=2
You need to apply a technique to avoid this.
Ooh, no I didn't know about that. I'll look into it.
Describe the bug
We have a Node.js websocket server with 3k+ concurrent connections. A small segment of clients disconnect and reconnect every few seconds/minutes.
We trigger the connection by joining a room.
We use AWS ELB and their support says the load balancer has no problem.
On my machine I stay connected with the exact same roomId.
However, the affected client's connection on the server end disconnects and reconnects repeatedly. When we try to send the client a message using io.to(roomId).emit(), it receives only a few of the messages instead of every message emitted.
For one of the clients that saw this behavior, we tested with a different browser source from another websocket app, and the messages reach his computer every time. The same client also had high-speed Spectrum internet.
To Reproduce
Our server settings - version 4.4.1
Expected behavior
We expect all clients to stay connected.
Platform:
CPU Speed: 3700MHz
Additional context
I added a disconnect protocol, which helped some international clients, but a few clients are still seeing this behavior.