socketio / socket.io

Realtime application framework (Node.JS server)
https://socket.io
MIT License

NOOP interval/keepalive feature #4868

Open geirbakke opened 8 months ago

geirbakke commented 8 months ago

Is your feature request related to a problem? Please describe.

On many machines, socket.io does not detect some websocket/TCP/network disconnects (such as terminating a VPN process or pulling out the network cable) until the next ping-pong exchange. But if we try to use the websocket (by sending something) after the network loss or change, socket.io detects the disconnect immediately.

Our solution, which keeps the detection time short without making the ping-pong timing too short (which might cause issues with slow startup/wake, maybe lots of events, or high RTT), has been to send an engine.io noop message every 2s. With that we detect network loss within 2s on the client side.

As a bonus, it seems to keep alive a small portion of websockets that used to break roughly every 5 minutes, maybe because of the increased traffic. The default ping interval didn't stop this, but sending this noop every 2s seems to.
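A minimal sketch of the periodic-noop idea described above, written without reference to any particular transport. `noopTick`, `startNoopInterval`, `isOpen`, and `sendNoop` are hypothetical names for illustration, not socket.io or engine.io APIs; the thread does not say how the noop packet is actually written to the engine.io connection, so that part is left to the caller.

```typescript
// Hypothetical helper, not part of socket.io or engine.io.
// The caller supplies how to check the connection and how to send the noop.
type IsOpen = () => boolean;
type SendNoop = () => void;

// One keepalive step: send a noop only while the connection is supposed to be open.
// Returns true if a noop was sent.
function noopTick(isOpen: IsOpen, sendNoop: SendNoop): boolean {
  if (!isOpen()) {
    return false;
  }
  sendNoop();
  return true;
}

// Run noopTick every intervalMs (2s in this thread).
// Returns a function that stops the timer.
function startNoopInterval(
  isOpen: IsOpen,
  sendNoop: SendNoop,
  intervalMs: number = 2000,
): () => void {
  const timer = setInterval(() => {
    noopTick(isOpen, sendNoop);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In this sketch the send itself acts as the probe: as reported above, writing to a broken websocket is what surfaces the disconnect quickly. The returned stop function would be called once the connection is closed.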

Describe the solution you'd like

It would be nice to have this as a feature (noopInterval or something like it) that just sends a noop or something else periodically while the connection is supposed to be open. Or some other way to solve the problem.

Describe alternatives you've considered

pingInterval + pingTimeout, but we're afraid of setting those too low.

Additional context

darrachequesne commented 7 months ago

Thanks a lot for the detailed analysis :+1:

> Our solution to this, to keep the detection time short, and avoid having too short ping-pong timing (which might cause issues if you have slow startup/wake, maybe lots of events?, high RTT?), has been to send a engine.io noop message every 2s.

I think using a value of 2s for the pingInterval (and keeping the default value for pingTimeout) should have the exact same effect.
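For reference, a sketch of the configuration suggested here. `pingInterval` and `pingTimeout` are socket.io server options; the 20000 ms value assumes the socket.io v4 default for `pingTimeout`, and `heartbeatOptions` is just an illustrative name.

```typescript
// Heartbeat settings as suggested above: ping every 2s, keep the default timeout.
// Assumption: 20000 ms is the socket.io v4 default for pingTimeout.
const heartbeatOptions = {
  pingInterval: 2000, // ms between PING packets sent by the server
  pingTimeout: 20000, // ms the server waits for the PONG before closing the connection
};

// These would be passed as the options argument when constructing the server,
// e.g. new Server(httpServer, heartbeatOptions) with Server imported from "socket.io".
```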

> As a bonus it seems to keep some small portion of websockets alive, maybe because of increased traffic, that used to break somewhere approx every 5m.

That sounds like something between the server and the client that closes the connection if it's idle for too long. Maybe some misconfigured proxy?

geirbakke commented 7 months ago

Thx for your response @darrachequesne

Having a 2s pingInterval would have the same effect as the above (detecting a broken websocket by using it), but it would happen on the server side, not the client. So the client wouldn't know.

From the doc: "Conversely, if the client does not receive a PING packet within pingInterval + pingTimeout, it will consider that the connection is closed."

So to achieve 2s detection of a broken websocket on the client side, the sum pingInterval + pingTimeout would need to be 2s, if I understand this correctly. But this is problematic, as lags will happen. The benefit of sending something other than a ping (like the noop suggested) is that we can allow lag to happen but still detect a broken websocket in many cases.
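To make the arithmetic concrete, here is a small sketch of the client-side detection window from the quoted doc sentence. `clientDetectionWindowMs` is a hypothetical helper, and the default values (25000 ms and 20000 ms) are an assumption based on socket.io v4.

```typescript
// Per the quoted doc: the client considers the connection closed if no PING
// arrives within pingInterval + pingTimeout.
function clientDetectionWindowMs(pingIntervalMs: number, pingTimeoutMs: number): number {
  return pingIntervalMs + pingTimeoutMs;
}

// Assumed socket.io v4 defaults: pingInterval 25000 ms, pingTimeout 20000 ms.
const withDefaults = clientDetectionWindowMs(25000, 20000); // 45000

// 2s pingInterval with the default pingTimeout: the client still waits up to 22s.
const twoSecondPing = clientDetectionWindowMs(2000, 20000); // 22000

// Getting the sum down to 2s leaves a timeout of only about 1s to absorb lag.
const twoSecondTotal = clientDetectionWindowMs(1000, 1000); // 2000
```

Under these assumptions, a 2s pingInterval alone shortens the client-side window only from 45s to 22s, which is the gap the noop approach is meant to close.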

No idea why some connections break at approx. 5 minutes. We have many users all over the world and are currently unable to find a pattern. But sending a noop every 2s apparently helps, while sending the default ping (25s?) doesn't.

darrachequesne commented 7 months ago

> So to achieve a 2s broken websocket detection on client side, pingInterval+pingTimeout sum would need to be 2s, if I understand this correctly

That's right. For disconnection detection, we rely on either:

I guess we could implement this feature if there's enough demand (with the downside that it would increase battery consumption on phones).

Out of curiosity, what is your use case? Gaming?

geirbakke commented 7 months ago

We use it for video conferencing signalling. For us the increased network/CPU/battery cost of sending 1(?) byte every 2s is negligible. It would be cleaner for us to have this as a feature in socket.io, but I guess there isn't enough demand yet...

christopherreay commented 6 months ago

Yeah, I super felt like @darrachequesne didn't really take the time to understand this point

darrachequesne commented 6 months ago

@christopherreay I did! As usual, the question is whether we want to implement — and provide support for — this feature, increase the bundle size, and whether it would benefit most users.