socketio / socket.io

Realtime application framework (Node.JS server)
https://socket.io
MIT License

Perpetual https disconnections over HTTP/2 (due to missing "Keep-Alive" header?) #3704

Open mihaibrana opened 3 years ago

mihaibrana commented 3 years ago

Hello :). I'd like to report a possible bug.

Current behaviour

After developing my application in an unsecured context using HTTP/1.1 (all well), I have now deployed it to an HTTP/2 server using HTTPS. Again, all fine and dandy. For 30 seconds... :)

After that, the socket disconnects and connects again. And again. And again.

I'd like to reproduce this in a fiddle based on your example, but I'd need to run it behind a load balancer that handles the HTTPS layer, and there's no easy way for me to do that.

Expected behaviour

That it doesn't drop my connection :). To be clear: all communication works JUST FINE even after a disconnect. The socket disconnects and then immediately reconnects. But this is a problem for me because I lose the handle on a stream server-side. I am using a stream over socket to communicate.

What I saw missing from the server response are the Connection: keep-alive and Keep-Alive: timeout=5 headers that I get on my HTTP/1.1 server. The code is absolutely identical and communication does work just fine.

Perhaps socket.io has some smart way of working over HTTP/2, but I couldn't find anything about this in the documentation.

It's also interesting that the socket client DOES request the keep-alive header. But alas, nothing is returned, which may be why the socket disconnects :(
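
A note on the missing headers: HTTP/2 does not allow connection-specific header fields such as Connection and Keep-Alive (RFC 9113, section 8.2.2), so an HTTP/2 front end is expected to drop them, and their absence alone does not indicate a fault. The sketch below illustrates that filtering with a hypothetical helper, stripConnectionSpecificHeaders. It is not code from socket.io or from any load balancer, and the sample response object is made up for illustration.

```javascript
// Header fields that HTTP/2 treats as connection-specific (RFC 9113, section 8.2.2).
const CONNECTION_SPECIFIC = new Set([
  'connection',
  'keep-alive',
  'proxy-connection',
  'transfer-encoding',
  'upgrade',
]);

// Hypothetical helper (an assumption for illustration): returns a copy of
// `headers` with lowercase names and without connection-specific fields,
// roughly what an HTTP/1.1 -> HTTP/2 intermediary would forward.
function stripConnectionSpecificHeaders(headers) {
  const lower = Object.fromEntries(
    Object.entries(headers).map(([name, value]) => [name.toLowerCase(), value])
  );
  // Any field named inside the Connection header value is hop-by-hop as well.
  const namedByConnection = String(lower['connection'] ?? '')
    .split(',')
    .map((token) => token.trim().toLowerCase())
    .filter(Boolean);
  const dropped = new Set([...CONNECTION_SPECIFIC, ...namedByConnection]);
  return Object.fromEntries(
    Object.entries(lower).filter(([name]) => !dropped.has(name))
  );
}

// Made-up HTTP/1.1 response headers resembling the ones described above.
const http1Response = {
  'Content-Type': 'text/plain',
  Connection: 'keep-alive',
  'Keep-Alive': 'timeout=5',
};

const forwarded = stripConnectionSpecificHeaders(http1Response);
console.log(forwarded);
```

With that in mind, the two headers disappearing behind an HTTP/2 endpoint is the expected behaviour, and a fixed disconnect interval points at something else on the path.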

Setup

Node.js Express web server, plain HTTP, but routed through a Load Balancer (via KONG) which only accepts HTTPS connections. The Load Balancer handles the HTTPS certificate. The Load Balancer also strips those headers away from the response.

I am using socket.io-stream for communication. I'm basically sending a stream object over the socket and all operations are then made via that stream.

Of course, I'd be happy to assist in debugging this in any way I can.

mihaibrana commented 3 years ago

After encountering the EXACT same issue when using the WebSocket object in the browser, I dug deeper and found this in the documentation of the Google Load Balancer service we're using:

The timeout for a WebSocket connection depends on the configurable backend service timeout of the load balancer, which is 30 seconds by default. This timeout applies to WebSocket connections regardless of whether they are in use. For more information about the backend service timeout and how to configure it, see Timeouts and retries.

https://cloud.google.com/load-balancing/docs/https

Yeah, 30 seconds! :). Most likely this is the culprit, not Socket.io.
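
For anyone chasing a similar problem, one quick check is whether the connection lifetimes cluster around a single value, which hints at an idle or backend timeout somewhere on the path and not a library bug. This is a minimal sketch: summarizeLifetimes, its field names and the 1-second tolerance are all assumptions for illustration, and the timestamps are invented. In practice they would come from your own connect and disconnect logging.

```javascript
// Hypothetical helper: given connect/disconnect timestamps in milliseconds,
// compute each connection's lifetime and report whether all of them sit
// within `toleranceMs` of the median lifetime.
function summarizeLifetimes(sessions, toleranceMs = 1000) {
  const lifetimes = sessions.map(
    ({ connectedAt, disconnectedAt }) => disconnectedAt - connectedAt
  );
  if (lifetimes.length === 0) {
    return { lifetimes, medianMs: null, looksFixed: false };
  }
  const sorted = [...lifetimes].sort((a, b) => a - b);
  // Upper median for even-length input; good enough for a rough check.
  const medianMs = sorted[Math.floor(sorted.length / 2)];
  // Require at least three samples before calling it a pattern.
  const looksFixed =
    lifetimes.length >= 3 &&
    lifetimes.every((ms) => Math.abs(ms - medianMs) <= toleranceMs);
  return { lifetimes, medianMs, looksFixed };
}

// Invented sample data: three connections that each lasted about 30 seconds.
const observed = [
  { connectedAt: 0, disconnectedAt: 30010 },
  { connectedAt: 30500, disconnectedAt: 60490 },
  { connectedAt: 61000, disconnectedAt: 91200 },
];

const summary = summarizeLifetimes(observed);
console.log(summary.medianMs, summary.looksFixed);
```

If the lifetimes do line up like this, the timeout settings of every proxy and load balancer between client and server are worth checking before anything else.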

darrachequesne commented 3 years ago

Good catch on the 30-second timeout!

I think we should add this in the documentation somewhere.

mihaibrana commented 3 years ago

Nice idea with adding a note in the documentation! More and more services are migrating to Kubernetes/Cloud. With this, there is an increased risk for people to experience issues due to these services cutting the connection.

You have no idea how happy I was when I found that statement in their documentation. I had been digging for days, blaming missing headers and eventually thinking it was a bug in the library, even though it seemed very unlikely that nobody had used it over HTTP/2. A Stack Overflow question got no answer either. It was grueling.

I can now confirm that we've added the following config to our server:

    apiVersion: cloud.google.com/v1beta1
    kind: BackendConfig
    metadata:
      name: feathers-backendconfig
    spec:
      timeoutSec: 1800
      connectionDraining:
        drainingTimeoutSec: 1800

Source: https://medium.com/johnjjung/how-to-use-gcp-loadbalancer-with-websockets-on-kubernetes-using-services-ingresses-and-backend-16a5565e4702

Maybe it's good to link this source in your docs.