Closed by jimaek 1 year ago
Both the server and the client track the ping/pong timeout. If the client doesn't receive the ping request on time (`pingInterval`), it closes the connection with the error `ping timeout` and disconnects. The server captures the disconnect and reports `transport close`. On the flip side, if the server doesn't receive the pong response on time (`pingTimeout`), it reports `ping timeout`, and the client sees it as a severed connection and reports a `transport close` error.
https://github.com/socketio/socket.io/issues/3191 https://github.com/socketio/socket.io/issues/4333
Either way, the issue is due to a timeout.
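The two cases above can be sketched as a pair of small checks. This is only an illustration of the behaviour described in the comment, not socket.io code: `serverHeartbeat`, `clientHeartbeat`, and the `Outcome` type are hypothetical names, and the client's deadline is passed in as a parameter because its exact value depends on the configured `pingInterval`/`pingTimeout` and the Socket.IO version.

```typescript
// Disconnect reasons as quoted in the comment above.
type Reason = "ping timeout" | "transport close";

// What each side reports after one heartbeat check (null = still connected).
interface Outcome {
  server: Reason | null;
  client: Reason | null;
}

// Server's view: the pong must come back within pingTimeoutMs.
// If it does not, the server reports "ping timeout" and the client
// only sees the connection being severed ("transport close").
function serverHeartbeat(pongDelayMs: number, pingTimeoutMs: number): Outcome {
  if (pongDelayMs <= pingTimeoutMs) {
    return { server: null, client: null };
  }
  return { server: "ping timeout", client: "transport close" };
}

// Client's view: the next ping must arrive within clientDeadlineMs
// (an assumed parameter). If it does not, the client closes with
// "ping timeout" and the server reports "transport close".
function clientHeartbeat(pingGapMs: number, clientDeadlineMs: number): Outcome {
  if (pingGapMs <= clientDeadlineMs) {
    return { server: null, client: null };
  }
  return { server: "transport close", client: "ping timeout" };
}

// Example: a pong delayed by 5 seconds against a 4 second timeout.
const late = serverHeartbeat(5000, 4000);
console.log(late.server, "/", late.client);
```

Either function returning a non-null pair means the same root cause is logged under two different names, which is why both reasons probably need to be counted together when reading the logs.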
Small summary:
Not sure how relevant this still is.
It is, very much so. We just need to set up proper logging first to see all the problems.
So I've managed to reproduce the issue using the https://github.com/tylertreat/comcast tool. Ping timeouts for my local probe occur consistently on the GPRS profile and sometimes on the EDGE profile (https://github.com/tylertreat/comcast#network-condition-profiles).
I've tried switching the socket.io transport from 'websocket' to 'polling', and different combinations of the two, but nothing changed.
One of the faulty probes is located on a VPS that we own. It disconnects ~10 times an hour and reconnects within a few seconds. The VPS seems slow and unresponsive over ssh. Network speed tests show a max latency of 1000ms, and throughput seems to drop regularly for a few seconds. All of that points to network problems on the server side that we are not able to deal with.
So I believe we should accept that the total number of effective probes will vary all the time. We are adding monitoring to see how many disconnects happen in a time interval. On top of that, we should add a mechanism to explicitly "ignore" faulty probes. It can be a manual list, a circuit breaker, or something else; here is the issue for that: https://github.com/jsdelivr/globalping/issues/52
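To make the monitoring and "ignore" idea concrete, here is a rough sketch of a sliding-window counter that doubles as a simple circuit breaker. It is not the implementation from the linked issue; `DisconnectMonitor`, its method names, and the thresholds in the example are all assumptions chosen for illustration.

```typescript
// Hypothetical tracker: counts disconnects per probe inside a sliding
// time window and flags probes that exceed a threshold.
class DisconnectMonitor {
  private readonly windowMs: number;
  private readonly maxDisconnects: number;
  private readonly events = new Map<string, number[]>();

  constructor(windowMs: number, maxDisconnects: number) {
    this.windowMs = windowMs;
    this.maxDisconnects = maxDisconnects;
  }

  // Record one disconnect for a probe at time nowMs.
  recordDisconnect(probeId: string, nowMs: number): void {
    const recent = this.prune(probeId, nowMs);
    recent.push(nowMs);
  }

  // Number of disconnects seen for the probe within the window.
  countInWindow(probeId: string, nowMs: number): number {
    return this.prune(probeId, nowMs).length;
  }

  // "Open circuit": the probe is ignored while it is over the threshold.
  isIgnored(probeId: string, nowMs: number): boolean {
    return this.countInWindow(probeId, nowMs) > this.maxDisconnects;
  }

  // Drop timestamps that fell out of the window and store the rest.
  private prune(probeId: string, nowMs: number): number[] {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.events.get(probeId) ?? []).filter((t) => t > cutoff);
    this.events.set(probeId, recent);
    return recent;
  }
}

// Example: a probe that disconnects 10 times within one hour,
// checked against an assumed limit of 5 disconnects per hour.
const HOUR_MS = 60 * 60 * 1000;
const monitor = new DisconnectMonitor(HOUR_MS, 5);
for (let i = 0; i < 10; i++) {
  monitor.recordDisconnect("vps-probe", i * 6 * 60 * 1000);
}
console.log(monitor.isIgnored("vps-probe", 55 * 60 * 1000));
```

Because old timestamps age out of the window, a probe that stops flapping is picked up again without manual intervention, which is the main difference from a static ignore list.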
Task to track the issue where probes that are far away from the EU, like those in China and Costa Rica, often disconnect and reconnect. Latency is only 256ms, so our previous timeout of 2 seconds, and the current one of 4 seconds, should have been enough.
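The headroom implied by these numbers can be worked out directly. A minimal sketch, assuming the 256ms figure is a round-trip time and that the timeout has to cover a single ping/pong exchange (`stallBudgetMs` is a hypothetical helper, not part of socket.io):

```typescript
// Time left for a network stall before the heartbeat timeout fires.
// Assumes rttMs is the full round trip of one ping/pong exchange.
function stallBudgetMs(timeoutMs: number, rttMs: number): number {
  return Math.max(0, timeoutMs - rttMs);
}

const rttMs = 256;
const previousBudget = stallBudgetMs(2000, rttMs); // 1744
const currentBudget = stallBudgetMs(4000, rttMs); // 3744
console.log(previousBudget, currentBudget);
```

With roughly 1.7 to 3.7 seconds of headroom, ordinary latency alone would not explain the disconnects; under these assumptions only a stall lasting longer than that budget would.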
Currently we're trying https://github.com/uNetworking/uWebSockets.js