uNetworking / uWebSockets.js

μWebSockets for Node.js back-ends :metal:
Apache License 2.0

[feature] is it possible to add autoPingTime work with idleTimeout? #395

Closed: roytan883 closed this issue 3 years ago

roytan883 commented 3 years ago

Is it possible to add an autoPingTime option that works together with idleTimeout in the WebSocketBehavior config? Like this:

```javascript
/** Maximum amount of seconds that may pass without sending or getting a message. Connection is closed if this timeout passes. Resolution (granularity) for timeouts are typically 4 seconds, rounded to closest.
 * Disable by leaving 0.
 */
idleTimeout: 250,
/** Interval (seconds) at which a ping is automatically sent to the remote client.
 * The pong sent in response refreshes idleTimeout, making sure the remote client is alive.
 * Disable by leaving 0.
 */
autoPingTime: 120,
```

This would make sure the remote client is alive, especially on mobile networks.

Right now I'm recording every WebSocket, sending pings in a loop and handling pongs in the Node.js app layer. If this feature could be added to the uWebSockets core, performance might be better and the app layer would be simpler.

hst-m commented 3 years ago

Why are you not having the client send the ping message? You want the server to do as little work as possible.

roytan883 commented 3 years ago

@hst-m
  1. Keeping an app alive in the background on Android and iOS is a big topic. In some scenarios, only pings sent by the server can keep the app's socket alive, especially since mobile carriers may have different policies on how long a socket stays alive.
  2. If the client is not written by us, a partner's WebSocket client may not send pings.
  3. The browser (HTML) WebSocket client does not provide a method to send a ping directly. https://developer.mozilla.org/docs/Web/API/WebSocket

hst-m commented 3 years ago

The docs for idleTimeout say it works like this: "Maximum amount of seconds that may pass without sending or getting a message". So sending any message or ping from the server resets the idleTimeout even if you don't receive a response, which means what you are trying to do won't work. That said, the server does seem to auto-close the connection after about 20 seconds if the client does not respond (tested by turning off the client's network).

  1. Regarding putting the app into the background on mobile: I tried this in Chrome on my Android phone. The client keeps sending ping messages when it is in a different tab or when another app is in front; the only time it stops is when you lock the phone. Is this acceptable? With the phone locked, the client still receives messages from the server, which keeps the client open and the client code running, but after about 2 minutes the client code stops running and the connection closes even while server messages keep arriving. So from what I can tell, the only benefit of sending messages from server to client is about 2 extra minutes after the phone is locked. But you are saying that, regardless of whether the app is in the background, some networks require the server to send messages or they close the connection. I have not seen that, but if it is true for your network, you can have the server send pings or messages on an interval.
  2. Tell them to send ping messages if they want to keep the connection open; otherwise they can just re-open the connection when it closes.
  3. As I posted in another issue (https://github.com/uNetworking/uWebSockets.js/issues/394), to simulate a "ping" from the browser client, just send an empty WebSocket message; the server code can ignore it.

roytan883 commented 3 years ago

@hst-m Do you mean the pong will not reset idleTimeout?

hst-m commented 3 years ago

You can send a Ping from the server and it will reset the idleTimeout; it does not need to receive a Pong back to reset idleTimeout. A Pong will also reset idleTimeout, though.
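To make the distinction concrete, here is a toy model of an idle timer that resets on every send as well as every receive, as described above. It is a hypothetical sketch (`IdleTimer` and `simulateSilentClient` are illustrative names, not uWebSockets.js code), with times in seconds and the 4-second timer granularity ignored. It shows that a server pinging every 120 s never trips a 250 s idleTimeout even if the client is gone; only if sends stopped resetting the timer would the silent client be caught. The separate ~20-second close on an unanswered ping, mentioned earlier, is not modelled here.

```javascript
// Toy model of an idle timer (hypothetical; not uWebSockets.js source).
// Times are in seconds; timer granularity is ignored.
class IdleTimer {
  constructor(idleTimeout, { resetOnSend = true } = {}) {
    this.idleTimeout = idleTimeout;
    this.resetOnSend = resetOnSend;
    this.lastActivity = 0;
  }

  // The server wrote a frame (message or ping) to the client.
  onSend(now) {
    if (this.resetOnSend) this.lastActivity = now;
  }

  // Any frame (message, ping or pong) arrived from the client.
  onReceive(now) {
    this.lastActivity = now;
  }

  isExpired(now) {
    return now - this.lastActivity >= this.idleTimeout;
  }
}

// The server pings every `pingEvery` seconds; the client never answers.
// Returns the first ping time at which the timer is already expired, or null.
function simulateSilentClient(timer, pingEvery, until) {
  for (let t = pingEvery; t <= until; t += pingEvery) {
    if (timer.isExpired(t)) return t;
    timer.onSend(t);
  }
  return null;
}

console.log(simulateSilentClient(new IdleTimer(250), 120, 1200)); // null
console.log(simulateSilentClient(new IdleTimer(250, { resetOnSend: false }), 120, 1200)); // 360
```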

roytan883 commented 3 years ago

Oh, sending also resets the idleTimeout. I assumed it was only refreshed when data is received. So if the client does not respond with a pong, the idleTimeout is still refreshed? Then I can't use idleTimeout to check whether the client is alive.

Maybe what I need is an aliveTimeout together with autoPingTime, meaning the client counts as alive only while client -> server data keeps the connection alive. So I should keep sending pings and handling pongs in my server app layer, and not depend on idleTimeout.

hst-m commented 3 years ago

~You can still keep idleTimeout~ Actually, yes, you don't need idleTimeout in your case. If you send a Ping and there is no response, the server closes the connection in 20 seconds with 1006 Abnormal Closure, whereas an idleTimeout violation closes the connection with 1006 Abnormal Closure "WebSocket timed out from inactivity" after the timeout time. So just keep in mind that it is 20 seconds, with a different close message, if you send a Ping/message with no response (unless your idleTimeout is less than 20 seconds, which would trigger the idleTimeout close before the 20-second auto-close).

roytan883 commented 3 years ago

> You can still keep idleTimeout, I see that if you send a Ping and there is no response the server closes the connection in 20 seconds with 1006 Abnormal Closure whereas idleTimeout violation closes the connection with 1006 Abnormal Closure WebSocket timed out from inactivity after the timeout time. So just keep that in mind, it's 20 seconds and a different close message if you send a Ping with no response

Closing the connection when there is no response looks good.

So what is the downside of adding autoPingTime? It can be disabled by default with 0. In my case, the app layer would not need to do anything to track pings and pongs. idleTimeout + autoPingTime can make sure the client is really alive in any network scenario. I guess lots of users need this feature.

ghost commented 3 years ago

> The idleTimeout + autoPingTime can make sure the client is really alive no matter what network scenario.

idleTimeout already does this. You can't send on a broken connection. The kernel has a retransmission timeout that will cause the app to see the connection as broken unless it gets a TCP ack for the send in time. The default of this timeout is maybe a little off, but not that much.

Timeout is definitely not something you should do in scripted business logic. This is and should be part of the library. It can be tweaked, sure, but it works as is.

hst-m commented 3 years ago

@roytan883 No, people should be sending "ping" keep-alive from the client with idleTimeout on the server. Then you don't need to be "tracing" Ping and Pong on the server like you said. See issue for example sending "ping" keep-alive from client https://github.com/uNetworking/uWebSockets.js/issues/394

roytan883 commented 3 years ago

> @roytan883 No, people should be sending "ping" keep-alive from the client with idleTimeout on the server. Then you don't need to be "tracing" Ping and Pong on the server like you said. See issue for example sending "ping" keep-alive from client #394

OK, I see why you insist on the client sending pings. Let me explain more about problem 1.

In China, Android devices cannot use Google GMS, so we must develop our own push service. The big problem is keeping the app alive, which has two sides: keeping the app process alive and keeping the app's socket alive. For socket liveness, we tested different Android phones and Android versions on China's three big mobile carriers: China Mobile, China Telecom and China Unicom. The real test results differ: with different carriers' SIM cards, the TCP socket stays alive for different lengths of time, and in some cases pings in both directions are needed to keep the socket alive (for example, with some SIM cards, if no data flows from server to client for 6 minutes, the connection still looks fine and there is no socket close or error callback, but as soon as the client sends data, the socket returns an error immediately). As I said, a mobile TCP socket is not as stable as a normal TCP socket; there are many edge cases and scenarios related to the phone, the phone's OS, network routers and the carrier.

hst-m commented 3 years ago

Have you looked into Web Push Notifications, and you are sure this does not work for you? https://developers.google.com/web/fundamentals/push-notifications

roytan883 commented 3 years ago

> idleTimeout already does this. You can't send on a broken connection. The kernel has a retransmission timeout that will cause the app to see the connection as broken unless it gets a TCP ack for the send in time. The default of this timeout is maybe a little off, but not that much.

> Timeout is definitely not something you should do in scripted business logic. This is and should be part of the library. It can be tweaked, sure, but it works as is.

@alexhultman If idleTimeout is 10 minutes and no data is sent to or received from the client, how can we make sure the client is really alive, especially a mobile client?

As I understand it, SO_KEEPALIVE is disabled by default, and the kernel's default tcp_keepalive_time is 7200 seconds (2 hours).

roytan883 commented 3 years ago

> Have you looked into Web Push Notifications, and you are sure this does not work for you? https://developers.google.com/web/fundamentals/push-notifications

As I said, we are in China. Most Google services are blocked, both technically and by law. Even the link you provided cannot be opened in China.

But whether we can use Google services is not the point. The point is that in some mobile network scenarios we do need server -> client pings to keep the socket alive; this is based on real device test results.

hst-m commented 3 years ago

I have no idea what is going on in China but if you need to send some Pings from the server to the client go ahead and do it, but don't expect to add an autoping feature that no one else needs

roytan883 commented 3 years ago

> I have no idea what is going on in China but if you need to send some Pings from the server to the client go ahead and do it, but don't expect to add an autoping feature that no one else needs

It's not about China. Technically, it's about TCP NAT at the mobile base station. How long a base station keeps the NAT mapping for an idle TCP socket differs between mobile carriers: some keep it for 6 minutes, some for 28 minutes. So every app must send heartbeats to keep its socket alive. Some examples: WhatsApp uses 4m45s, Line uses 7m.
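The constraint in this comment is simple arithmetic: the heartbeat interval has to be shorter than the shortest NAT idle timeout among the carriers the app must support, or some carrier will drop the mapping between heartbeats. A minimal sketch follows; `heartbeatIntervalSec` and the 80% safety margin are hypothetical choices, and the 6- and 28-minute inputs are the figures quoted above.

```javascript
// Hypothetical helper: choose a heartbeat interval (seconds) that stays below
// the shortest NAT idle timeout among all carriers we need to support.
function heartbeatIntervalSec(natTimeoutsSec, safetyPercent = 80) {
  if (natTimeoutsSec.length === 0) {
    throw new Error('need at least one NAT timeout');
  }
  const shortest = Math.min(...natTimeoutsSec);
  // Apply the safety margin and round down to whole seconds.
  return Math.floor((shortest * safetyPercent) / 100);
}

// Carriers quoted above: 6 minutes and 28 minutes.
console.log(heartbeatIntervalSec([6 * 60, 28 * 60])); // 288 (4m48s)
```

With these inputs the result, 288 s, lands close to the WhatsApp figure of 4m45s (285 s) quoted above; the longer 28-minute carrier does not matter because the shortest timeout dominates.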

hst-m commented 3 years ago

If you want to use a native feature, just call ws.subscribe('all') in the open handler and then app.publish('all', '') to send an empty message to every client. This should be faster than looping through WebSockets in JS. The close event should trigger if there is a problem with the connection after 20 seconds (or sooner if the client officially closes the connection).
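A sketch of this suggestion, assuming a uWebSockets.js app where every socket calls `ws.subscribe('all')` in its `open` handler. `app.publish(topic, message)` is the publish call named in the comment; `startTopicHeartbeat` is a hypothetical helper, the 30-second interval is an arbitrary example, and the timer functions are injectable only so the logic can be exercised without a real server. Note that this sends an empty application message, not a protocol-level ping frame.

```javascript
// Hypothetical helper: publish an empty message to `topic` every `intervalMs`.
// Assumes each socket ran ws.subscribe(topic) in its open handler, e.g.
//   open: (ws) => { ws.subscribe('all'); }
// Returns a function that stops the heartbeat.
function startTopicHeartbeat(app, topic, intervalMs, timers = { setInterval, clearInterval }) {
  const { setInterval: every, clearInterval: cancel } = timers;
  const handle = every(() => {
    // Empty payload: clients can recognise and ignore it.
    app.publish(topic, '');
  }, intervalMs);
  return () => cancel(handle);
}

// With a real uWebSockets.js app:
//   const stop = startTopicHeartbeat(app, 'all', 30000);
```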

roytan883 commented 3 years ago

> If you want to use native feature, just do ws.subscribe('all') in the open handler and then do app.publish('all','ping') to send a ping message to every client. This should be faster than looping through websockets in JS. The close event should trigger if there is problem with connection

It's a good idea; I think it will perform better than my current solution.

Also, there is an article about carrier-grade NAT timeouts on a French mobile network and how to configure an XMPP server to send keepalives (pings). Its NAT timeout is similar to the Chinese carriers', around 5 minutes. That's why I used 4 to 5 minutes as the example autoPing interval. The author's solution is to change the kernel's tcp_keepalive_time; my suggestion is to add this autoPingTime feature to the uWebSockets core.

https://blog.wirelessmoves.com/2020/09/carrier-grade-nat-timeouts-and-how-to-configure-your-xmpp-server.html

ghost commented 3 years ago

> If idleTimeout is 10 minutes, no sending data to client and not receive any data from client, how can we make sure the client is really alive? especially for mobile device client.

You can't. That's why you send pings. Even if you have a 10-minute idleTimeout (which is really high), you still have to make sure to send pings every, say, 1 minute. Either you do this from the client side (which you do not seem to want) or you do this from the server (which is more demanding for the server).

Most users, I would guess, have idleTimeout of about 1 minute and send pings from clients with maybe 30 second interval. That works over whatever Chinese or Russian phone or sputnik. It doesn't matter if you live in China or not, you still play by TCP rules.
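The setup described here (idleTimeout of about 1 minute on the server, client pings every 30 seconds) can be sketched for a browser client. Because the browser WebSocket API has no ping method, the sketch sends an empty application message instead, as suggested earlier in the thread. `startClientKeepAlive` is a hypothetical name; `ws` is anything with the standard `readyState` and `send` members, the URL in the usage comment is a placeholder, and the timers are injectable for testing.

```javascript
const OPEN = 1; // value of WebSocket.OPEN

// Hypothetical helper: send an empty message every `intervalMs` while the
// socket is open. Returns a function that stops the keepalive.
function startClientKeepAlive(ws, intervalMs = 30000, timers = { setInterval, clearInterval }) {
  const { setInterval: every, clearInterval: cancel } = timers;
  const handle = every(() => {
    if (ws.readyState === OPEN) ws.send('');
  }, intervalMs);
  return () => cancel(handle);
}

// Browser usage (placeholder URL):
//   const ws = new WebSocket('wss://example.com/ws');
//   const stop = startClientKeepAlive(ws, 30000);
//   ws.addEventListener('close', stop);
//
// Matching uWebSockets.js server behavior, assuming idleTimeout: 60:
//   message: (ws, message, isBinary) => {
//     if (message.byteLength === 0) return; // keepalive, ignore
//     // ...handle real messages
//   }
```

Keeping the interval at half the idleTimeout leaves room for one lost or delayed keepalive before the server's idle timer fires.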

I don't have any immediate plan to change any of this, but I might tweak things and (maybe) add this autoPingInterval, but not because it is needed, but rather because in some cases the client (application) protocol is so stupidly designed that it cannot handle client-side pings.

uasan commented 3 years ago

> As I understand, the kernel SO_KEEPALIVE default is disabled, and tcp_keepalive_time is 7200 seconds (2 hours).

If you are using Nginx as a reverse proxy, you can try configuring TCP keepalive:

> so_keepalive=on|off|[keepidle]:[keepintvl]:[keepcnt] this parameter configures the “TCP keepalive” behavior for the listening socket. If this parameter is omitted then the operating system’s settings will be in effect for the socket. If it is set to the value “on”, the SO_KEEPALIVE option is turned on for the socket. If it is set to the value “off”, the SO_KEEPALIVE option is turned off for the socket. Some operating systems support setting of TCP keepalive parameters on a per-socket basis using the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT socket options. On such systems (currently, Linux 2.4+, NetBSD 5+, and FreeBSD 9.0-STABLE), they can be configured using the keepidle, keepintvl, and keepcnt parameters. One or two parameters may be omitted, in which case the system default setting for the corresponding socket option will be in effect. For example,

> so_keepalive=30m::10 will set the idle timeout (TCP_KEEPIDLE) to 30 minutes, leave the probe interval (TCP_KEEPINTVL) at its system default, and set the probes count (TCP_KEEPCNT) to 10 probes.

http://nginx.org/en/docs/stream/ngx_stream_core_module.html#tcp_nodelay
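Rough arithmetic shows why the kernel defaults mentioned above are far too slow compared with carrier NAT timeouts measured in minutes: TCP keepalive first waits `keepidle` seconds of idleness, then sends up to `keepcnt` probes `keepintvl` seconds apart before dropping the connection. The sketch below assumes the common Linux defaults (tcp_keepalive_time 7200, tcp_keepalive_intvl 75, tcp_keepalive_probes 9) and parses only the `keepidle:keepintvl:keepcnt` form of the Nginx parameter with s/m/h units; `parseSoKeepalive` and `keepaliveDetectionSec` are hypothetical names, and the results are approximations.

```javascript
// Common Linux defaults: seconds for keepidle/keepintvl, a probe count for keepcnt.
const LINUX_DEFAULTS = { keepidle: 7200, keepintvl: 75, keepcnt: 9 };

// Hypothetical parser for Nginx's "keepidle:keepintvl:keepcnt" form, e.g. "30m::10".
// Empty fields are left out so the system default applies, as in the Nginx docs.
function parseSoKeepalive(value) {
  const units = { '': 1, s: 1, m: 60, h: 3600 };
  const toSec = (field) => {
    const match = /^(\d+)([smh]?)$/.exec(field);
    if (!match) throw new Error(`unsupported time value: ${field}`);
    return Number(match[1]) * units[match[2]];
  };
  const [idle, intvl, cnt] = value.split(':');
  const out = {};
  if (idle) out.keepidle = toSec(idle);
  if (intvl) out.keepintvl = toSec(intvl);
  if (cnt) out.keepcnt = Number(cnt);
  return out;
}

// Approximate time (seconds) before an unresponsive peer is declared dead.
function keepaliveDetectionSec(overrides = {}) {
  const { keepidle, keepintvl, keepcnt } = { ...LINUX_DEFAULTS, ...overrides };
  return keepidle + keepintvl * keepcnt;
}

console.log(keepaliveDetectionSec()); // 7875 (about 2h11m)
console.log(keepaliveDetectionSec(parseSoKeepalive('30m::10'))); // 2550 (42.5 min)
```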

roytan883 commented 3 years ago

@alexhultman

At the beginning we did send pings from the client to the server.

But after more real-world testing and research, we found that on mobile networks client->server pings alone cannot cover every case. That's why I'm talking about server->client pings. Across different carriers, 3G, 4G, 5G and NAT timeouts, the TCP socket is not as stable as we expect.

Thanks for considering adding autoPingInterval to the uWebSockets core.

ghost commented 3 years ago

> If you want to use native feature, just do ws.subscribe('all') in the open handler and then do app.publish('all','ping') to send a ping message to every client. This should be faster than looping through websockets in JS. The close event should trigger if there is problem with connection after 20 seconds (or sooner if the client officially closes the connection)

Smart. That's a quite simple solution really. Just do a setInterval in JS and make it publish to some topic that all clients are subscribed to.

roytan883 commented 3 years ago

Another case where client->server pings do not work is an Android app in the background. Android apps follow sleep rules in the background: light sleep and deep sleep. In light sleep, the app can wake itself up and send a ping to the server. But in deep sleep, the app process cannot even wake itself up. At that point, if a server->client ping arrives through the lower cellular layer, the system wakes the app process to handle it.

ghost commented 3 years ago

> Another case client->server not work is Android APP background scenario. Android APP have some kind of sleep rule when at background, light sleep and deep sleep. When light sleep, APP can wake up itself and send ping to server. But in deep sleep, the APP process can not even wake up itself process. At this time, if a server->client ping is received from cell mobile down layer, system will wake up the APP process handle it.

This makes sense (I don't know for sure, but I can buy it), but do you really want a WebSocket to be kept alive during deep sleep? Isn't it better to just reconnect? So to not hog resources on the server?

roytan883 commented 3 years ago

> If you want to use native feature, just do ws.subscribe('all') in the open handler and then do app.publish('all','ping') to send a ping message to every client. This should be faster than looping through websockets in JS. The close event should trigger if there is problem with connection after 20 seconds (or sooner if the client officially closes the connection)

> Smart. That's a quite simple solution really. Just do a setInterval in JS and make it publish to some topic that all clients are subscribed to.

@hst-m @alexhultman Won't it cause a big burst of network traffic? If 100K clients are connected to the server, it sends all the pings at once.

Right now, my solution is a JS loop over all sockets that sends 1000 pings, awaits 500 ms, then continues:

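```javascript
const _ = require('lodash')
const Bluebird = require('bluebird')

// Class field of the server class.
// this.clientMap maps client ID -> { p: { [platform]: { ws } } }
sendAutoPing = async () => {
  let sendCount = 0
  const clientInfos = _.values(this.clientMap)
  for (let index = 0; index < clientInfos.length; index++) {
    const clientInfo = clientInfos[index]
    if (_.isNil(clientInfo) || _.isNil(clientInfo.p)) {
      continue
    }
    sendCount++
    if (sendCount === 1000) {
      // pause after every 1000 clients so the pings are sent smoothly
      await Bluebird.delay(500)
      sendCount = 0
    }
    // one client may be connected from several platforms, each with its own socket
    _.each(clientInfo.p, (platformInfo) => {
      if (platformInfo && platformInfo.ws) {
        const ws = platformInfo.ws
        // skip sockets the app has already kicked
        if (!ws._isKicked) {
          ws.ping()
        }
      }
    })
  }
}
```
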
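On the traffic question above: the loop sends bursts of 1000 pings every 500 ms, so 100K clients are all pinged within roughly 50 seconds. An alternative, sketched below under the assumption that the ping interval is known in advance, is to spread the batches evenly across the whole interval. `planPingBatches` and `schedulePingBatches` are hypothetical helpers; `ping` is whatever sends one ping, for example `(ws) => ws.ping()`, and callers should still skip sockets that closed before their batch runs.

```javascript
// Hypothetical helper: split sockets into batches and give each batch an offset
// so the batches are spaced evenly across one ping interval.
function planPingBatches(sockets, batchSize, intervalMs) {
  const batches = [];
  for (let i = 0; i < sockets.length; i += batchSize) {
    batches.push(sockets.slice(i, i + batchSize));
  }
  const spacingMs = batches.length > 0 ? Math.floor(intervalMs / batches.length) : 0;
  return batches.map((batch, k) => ({ offsetMs: k * spacingMs, batch }));
}

// Run each batch at its offset. `ping` sends one ping, e.g. (ws) => ws.ping().
function schedulePingBatches(plan, ping, schedule = setTimeout) {
  return plan.map(({ offsetMs, batch }) =>
    schedule(() => batch.forEach((ws) => ping(ws)), offsetMs),
  );
}

// 100K sockets, 1000 per batch, 120 s interval: one batch every 1200 ms.
const plan = planPingBatches(Array.from({ length: 100000 }, (_value, i) => i), 1000, 120000);
console.log(plan.length, plan[1].offsetMs); // 100 1200
```

The total number of pings is unchanged; only the peak rate drops, since each batch is separated by the whole interval divided by the number of batches instead of a fixed 500 ms.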
ghost commented 3 years ago

On the other hand, if you can be woken from deep sleep from a network interrupt then surely you can register a timer to be woken up when to send the client side ping. I know for a fact that you can register timer interrupts to basically any processor ever created.

roytan883 commented 3 years ago

> On the other hand, if you can be woken from deep sleep from a network interrupt then surely you can register a timer to be woken up when to send the client side ping. I know for a fact that you can register timer interrupts to basically any processor ever created.

Yes, our app does have a timer that sends client->server pings. But we also need server->client pings.

A network interrupt does not work like a timer. In deep sleep, Android disables all of an app's timers. But socket data events come from a lower-level cellular module service, which can always wake up the app.

roytan883 commented 3 years ago

> WebSocket to be kept alive during deep sleep? Isn't it better to just reconnect? So to not hog resources on the server?

@alexhultman

As I said, in China we can't use GCM (Google Cloud Messaging) or FCM (Firebase Cloud Messaging), for both technical and legal reasons, so we must develop our own push service. Implementing a push service requires a long-lived socket connection.

roytan883 commented 3 years ago

@hst-m @alexhultman Actually, I once built a smart keepalive solution. The server normally does not send any pings. It tracks all received data (ping, pong, message) to refresh an internal updateTime. If updateTime is older than autoPingInterval, the server sends a ping to the client; if there is still no response, it treats the client as disconnected and calls close.

But later I found that most clients are idle, so the smart keepalive did not save much network load, and its logic is more complex than a fixed ping interval. That's why I'm now using a simple JS loop to send pings.
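The smart keepalive described above can be sketched as a small tracker. This is a hypothetical reconstruction from the description, not roytan883's code: `SmartKeepAlive`, its method names and the timeout values are assumptions. It expects `onData` to be called from the message and pong handlers, `sweep` to be called periodically (for example from `setInterval`), and sockets to expose `ping()` and `close()` as uWebSockets.js sockets do. Only clients that have been quiet for a full interval are pinged, which is why it saves little when most clients are idle.

```javascript
// Hypothetical "smart keepalive": ping only quiet clients, close silent ones.
class SmartKeepAlive {
  constructor(pingIntervalMs, pongTimeoutMs) {
    this.pingIntervalMs = pingIntervalMs;
    this.pongTimeoutMs = pongTimeoutMs;
    this.state = new Map(); // ws -> { lastSeen, pingSentAt }
  }

  add(ws, now) {
    this.state.set(ws, { lastSeen: now, pingSentAt: null });
  }

  remove(ws) {
    this.state.delete(ws);
  }

  // Any inbound data (message, ping or pong) proves the client is alive.
  onData(ws, now) {
    const s = this.state.get(ws);
    if (s) {
      s.lastSeen = now;
      s.pingSentAt = null;
    }
  }

  sweep(now) {
    for (const [ws, s] of this.state) {
      if (s.pingSentAt !== null) {
        // A probe is outstanding: give up if it went unanswered for too long.
        if (now - s.pingSentAt >= this.pongTimeoutMs) {
          this.state.delete(ws);
          ws.close();
        }
      } else if (now - s.lastSeen >= this.pingIntervalMs) {
        // Quiet for a full interval: probe this client once.
        s.pingSentAt = now;
        ws.ping();
      }
    }
  }
}
```

Deleting the current entry while iterating a `Map` with `for...of` is safe in JavaScript, so a sweep can close and forget dead clients in one pass.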

uasan commented 3 years ago

It looks like your function is only useful for detecting, on the server side, that the client connection is closed. It detects this before the TCP timeout or idleTimeout does; that is its only benefit.

roytan883 commented 3 years ago

> It looks like your function is only useful for determining on the server side that the connection client is closed, your function detects this before the TCP timeout or idleTimeout, this is the only benefit of your function.

@uasan Yes, at the beginning I thought idleTimeout was only refreshed by receiving data. Then @hst-m clarified for me that sending data also refreshes it.

But the purpose is to make sure all clients are alive. Dead clients should be removed (@hst-m said the connection auto-closes when there is no pong after 20 s). Keeping dead clients makes no sense.

ghost commented 3 years ago

> @uasan Yes, at beginning i thought the idleTimeout is only refreshed by receiving data. Then @hst-m clear it for me that also effect on sending data.

TCP cannot send data without also receiving data (within a reasonable time). <--- important

So idleTimeout works "as if" it measured only received data. I might change this back so that idleTimeout only resets on data receive, but that is only a minor tweak - in practice it already does this.

uasan commented 3 years ago

We have two technical arguments in favor of adding an autoPingInterval setting:

  1. Rapid detection of dead connections
  2. Ping to client who cannot send pings on their own

plus one non-technical argument: having this setting would reduce questions about how to do ping/pong, since you would have a standard answer for them: autoPingInterval )

ghost commented 3 years ago

It can definitely make sense to add this support, and maybe to tweak the rule about resetting idleTimeout on sends. However, as said, none of this is currently "broken". We are talking about subtle improvements.

anilanar commented 3 years ago

> On the other hand, if you can be woken from deep sleep from a network interrupt then surely you can register a timer to be woken up when to send the client side ping. I know for a fact that you can register timer interrupts to basically any processor ever created.

Let me give the browser perspective on this:

Modern browsers, regardless of device type, started "throttling" background tabs very aggressively over the past year or so. Throttling has been around for a long time, but it gets more aggressive over time. Safari's throttling is the most aggressive. All of them use a "budget" system: the tab's thread has a budget of milliseconds per time frame. If there is budget left, the browser schedules the next micro/macro task; if not, the task waits. This means micro/macro tasks can pile up until the tab becomes active.

One exception to throttling is network interrupts. They open a budget-free time window in which the tab can run whatever it wants, most likely emptying the micro/macro task queue. The window is probably somewhere around 10-500 ms; I didn't test the exact allowance.

TL;DR: without server pings, background tabs have a high chance of disconnecting, especially in Safari (including desktop).

Chrome: https://developers.google.com/web/updates/2017/03/background_tabs
Safari (no official announcement, of course): https://www.google.com/search?q=safari+background+tab+throttling&oq=safari+background+tab+throttling

ghost commented 3 years ago

Thanks for adding more info. What do you think about autoPing: true/false?

It would go only by idleTimeout and be entirely automatic.

hst-m commented 3 years ago

An autoPing option is probably a good idea. I noticed that with only idleTimeout, the phone sleeps and triggers the idleTimeout close, but this close event gets sent to the client, which wakes the client back up; it re-opens the connection, goes back to sleep and triggers idleTimeout again, so you get a loop of open/close events, which is no good. I ended up increasing idleTimeout to 2.5 minutes to prevent this on the phone. Sending a Ping from the server on an interval also fixes it. A future solution could be an autoPing feature that sends a Ping when the idleTimeout fires instead of closing the connection; if there is no response, the connection closes.

slavaGanzin commented 3 years ago

@alexhultman thanks

@hst-m Does constant pinging prevent throttling in modern Safari? When I tried this 2 years ago, it didn't. In iOS Safari I used to use:

```html
<meta id=backgroundRefresh http-equiv="refresh" content=-1>
<script>
setInterval(() => document.getElementById('backgroundRefresh').content = 5, 1000)
</script>
```

The JavaScript keeps updating the refresh time for reloading the page, so when Safari throttles the JS, the page wakes up within 5 seconds.

ghost commented 3 years ago

(this support is implemented in v19, "binaries" branch has it but it is not released yet)