websockets / ws

Simple to use, blazing fast and thoroughly tested WebSocket client and server for Node.js
MIT License
21.75k stars 2.42k forks

Dispatched events are not scheduled as tasks #2159

Closed: OrKoN closed this issue 1 year ago

OrKoN commented 1 year ago

Description

I am not sure whether this is a bug or rather a topic for discussion. I have noticed that there is a subtle difference between ws and the WebSocket standard. The standard says that when a message is received, the user agent has to queue a new task to dispatch the event. An important side effect is that the microtask queue is drained after that task finishes and before the next task is picked from the queue. This behaviour matters in cases where messages need to be processed in a specific order.

This can be demonstrated by the following server code:

import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.send('something1');
  ws.send('something2');
});

And the client code:

import { WebSocket } from 'ws'; // in the browser, omit this line.

const ws = new WebSocket('ws://127.0.0.1:8080');

ws.addEventListener('message', function message(event) {
  console.log('received: %s', event.data);
  new Promise(resolve => {
    resolve();
  }).then(() => {
    console.log('microtask', event.data.toString());
  });
});

If you run this code in Node.js and in the browser, you should observe that the order of the console messages differs. It is probably not feasible to change the behaviour of ws, as that could be a breaking change, but I wonder if there could be some compatibility mode for this. I believe it is possible to work around the issue using setTimeout or setImmediate, but the downside is that the same code cannot be used across platforms. If I have overlooked an existing issue about this topic, I apologize.

ws version

8.13.0

Node.js Version

v20.3.1

System

System:
  OS: macOS 13.5
  CPU: (10) arm64 Apple M1 Max
  Memory: 696.63 MB / 64.00 GB
  Shell: 5.9 - /bin/zsh

Expected result

The order of console messages is the same in Node and in the browser and matches the behaviour defined by the standard:

received: something1
microtask something1
received: something2
microtask something2 

Actual result

The order of console messages in Node is different (microtasks are processed after all messages):

received: something1
received: something2
microtask something1
microtask something2

Attachments

No response

lpinca commented 1 year ago

Yes, ws does not strictly follow the WHATWG specification. I don't think this is a bug, and to be honest I find ws's behavior less surprising.

To match the browser behavior, only a single 'message' event should be emitted before the microtask queue is drained. We could do this at the parser level, for example by using a Transform stream instead of a Writable stream, pausing it after each 'data' event and resuming it after the queueMicrotask() callback is called. However, this should be thought through carefully, as it might affect performance and have side effects, and I'm not sure it's worth it. I think there is not much code that depends on this behavior.
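
A minimal sketch of that idea (assumed names, not the actual ws parser): deliver one message, stop the delivery loop, and resume it from a microtask so that promise callbacks queued by a listener run before the next 'message' event.

class PausingParser {
  constructor(emit) {
    this.emit = emit; // callback that fires the 'message' event
    this.queue = []; // parsed messages waiting to be delivered
    this.running = false;
  }

  push(message) {
    this.queue.push(message);
    if (!this.running) this.run();
  }

  run() {
    this.running = true;
    const message = this.queue.shift();
    if (message === undefined) {
      this.running = false;
      return;
    }
    this.emit(message);
    // Resume in a microtask: anything a listener scheduled with
    // Promise.prototype.then() runs before the next message is delivered.
    queueMicrotask(() => this.run());
  }
}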

lpinca commented 1 year ago

@KhafraDev FYI, the WebSocket implementation in https://github.com/nodejs/undici has the same non-spec-compliant behavior.

lpinca commented 1 year ago

@OrKoN https://github.com/websockets/ws/pull/2160 should fix this but I did not run any benchmark and have not thought about possible side effects yet.

OrKoN commented 1 year ago

@lpinca thanks!

OrKoN commented 1 year ago

In Puppeteer, we worked around the problem by scheduling all event handlers triggered by ws using setImmediate. We have not noticed any performance regressions (we don't handle a lot of traffic, though).
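
For illustration, a minimal sketch of that kind of workaround (not Puppeteer's actual code; handleMessage is a made-up placeholder):

import { WebSocket } from 'ws';

const ws = new WebSocket('ws://127.0.0.1:8080');

ws.addEventListener('message', (event) => {
  // setImmediate callbacks run in FIFO order, so messages are still
  // handled in order, and all microtasks scheduled while handling one
  // message run before the next one is handled.
  setImmediate(() => handleMessage(event.data));
});

function handleMessage(data) {
  console.log('received: %s', data);
}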

lpinca commented 1 year ago

Here are some benchmark results run under WSL 2.

20-byte messages - websockets/ws#8.13.0

$ ./load_test 32 127.0.0.1 8080 0 0
Using message size of 20 bytes
Running benchmark now...
Msg/sec: 59225.000000
Msg/sec: 60525.500000
Msg/sec: 59986.500000
Msg/sec: 59474.250000
Msg/sec: 59477.250000

20-byte messages - websockets/ws#fix/issue-2159

$ ./load_test 32 127.0.0.1 8080 0 0
Using message size of 20 bytes
Running benchmark now...
Msg/sec: 52816.500000
Msg/sec: 54566.250000
Msg/sec: 54582.500000
Msg/sec: 54458.750000
Msg/sec: 54205.500000

1024-byte messages - websockets/ws#8.13.0

$ ./load_test 32 127.0.0.1 8080 0 0 1024
Using message size of 1024 bytes
Running benchmark now...
Msg/sec: 49375.750000
Msg/sec: 49289.250000
Msg/sec: 49235.000000
Msg/sec: 48787.750000
Msg/sec: 49216.750000

1024-byte messages - websockets/ws#fix/issue-2159

$ ./load_test 32 127.0.0.1 8080 0 0 1024
Using message size of 1024 bytes
Running benchmark now...
Msg/sec: 47471.500000
Msg/sec: 46715.250000
Msg/sec: 46860.250000
Msg/sec: 47072.250000
Msg/sec: 47157.500000

load_test is this tool: https://github.com/uNetworking/uWebSockets/blob/master/benchmarks/load_test.c. The server is a simple echo server.

import { WebSocketServer } from 'ws';

const server = new WebSocketServer({ port: 8080 }, function () {
  console.log('Server listening on *:8080');
});

server.on('connection', function (ws) {
  ws.on('message', function (data, isBinary) {
    ws.send(data, { binary: isBinary });
  });
});

The impact for small messages (20 bytes) is noticeable (~10%).

lpinca commented 1 year ago

On macOS with 250 clients I see no significant overhead. If you are using Linux on bare metal, may I ask you to run some benchmarks like the ones above (increasing the number of clients up to 1000 might make sense)? Thank you.

OrKoN commented 1 year ago

Unfortunately, I don't have Linux on bare metal.

OrKoN commented 1 year ago

These are the numbers I get on an Intel MacBook (Node.js 20):

# 8.13.0

./load_test 32 127.0.0.1 8080 0 0 1024
Using message size of 1024 bytes
Running benchmark now...
Msg/sec: 34488.250000
Msg/sec: 34700.750000
Msg/sec: 34885.500000
Msg/sec: 34716.750000
Msg/sec: 31545.000000
Msg/sec: 34272.500000
Msg/sec: 34531.500000

# fix

./load_test 32 127.0.0.1 8080 0 0 1024
Using message size of 1024 bytes
Running benchmark now...
Msg/sec: 36548.500000
Msg/sec: 33161.750000
Msg/sec: 32867.000000
Msg/sec: 32287.500000
Msg/sec: 32432.750000
Msg/sec: 31692.500000
Msg/sec: 32399.750000
Msg/sec: 32129.500000

# 8.13.0

./load_test 32 127.0.0.1 8080 0 0
Using message size of 20 bytes
Running benchmark now...
Msg/sec: 46923.750000
Msg/sec: 42516.000000
Msg/sec: 41810.500000
Msg/sec: 42102.500000
Msg/sec: 42115.000000
Msg/sec: 42091.000000

# fix

./load_test 32 127.0.0.1 8080 0 0
Using message size of 20 bytes
Running benchmark now...
Msg/sec: 38408.500000
Msg/sec: 38996.250000
Msg/sec: 39031.250000
Msg/sec: 39019.750000
Msg/sec: 38887.500000
Msg/sec: 37871.500000
Msg/sec: 38127.000000

lpinca commented 1 year ago

It seems to be consistent with my results (~10% impact).

We can make this behavior optional with an option that can be passed to the server and the client, like the maxPayload option, but

  1. I don't like it.
  2. I have no idea how to call the option. Suggestions are welcome.

Another option is an environment variable like WS_NO_BUFFER_UTIL and WS_NO_UTF_8_VALIDATE, but I'm not very happy with that either.
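
Purely to illustrate the shape of the first idea, a hypothetical option accepted like maxPayload by both the server and the client (the option name here is invented for illustration; nothing with this name is implied to exist):

import { WebSocket, WebSocketServer } from 'ws';

// Hypothetical option name, for illustration only.
const wss = new WebSocketServer({
  port: 8080,
  allowSynchronousEvents: false
});

const ws = new WebSocket('ws://127.0.0.1:8080', {
  allowSynchronousEvents: false
});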

OrKoN commented 1 year ago

My opinion would be that it's better to match the spec by default and have performance optimisations as an opt-in. Also, I think the change does not have to apply to the server implementation as only the client-side events are spec'ed. I assume that clients handle fewer messages than servers, so the change might have a lower impact on performance there, and it would match the browsers' implementations. Maybe if the change applies only to clients, there will be no need to make this behaviour configurable?

lpinca commented 1 year ago

My opinion would be that it's better to match the spec by default and have performance optimisations as an opt-in.

I'm not sure I agree. The current behavior is not a bug and there are other areas where we diverge from the WHATWG spec. A client in another language with a completely different interface is still a valid client as long as it follows RFC 6455, RFC 7692, etc. We try to be compatible with the WHATWG spec for convenience.

Also, I think the change does not have to apply to the server implementation as only the client-side events are spec'ed.

The same WebSocket class is used by both the server and the client. We can make only WebSocket clients have this behavior but it does not make much sense in my opinion. It would be inconsistent and confusing.

OrKoN commented 1 year ago

I'm not sure I agree. The current behavior is not a bug and there are other areas where we diverge from the WHATWG spec. A client in another language with a completely different interface is still a valid client as long as it follows RFC 6455, RFC 7692, etc. We try to be compatible with the WHATWG spec for convenience.

That makes sense to me too.

The same WebSocket class is used by both the server and the client. We can make only WebSocket clients have this behavior but it does not make much sense in my opinion. It would be inconsistent and confusing.

Yeah, it might be confusing if both the client and the server are used (especially within one codebase). So far we have only needed the client.

lpinca commented 1 year ago

Yes, that is basically my point. In all these years, this is the first time this has been reported, and as written in my first comment, I don't think there is much code that depends on this behavior. That is why I would prefer to make it opt-in. On the other hand, making it opt-in kind of defeats the compatibility goal, as the code would not be portable. I wonder if there is some way to reduce the performance impact.

lpinca commented 1 year ago

Surprisingly, this

diff --git a/lib/receiver.js b/lib/receiver.js
index c7dcf8a..f4497b4 100644
--- a/lib/receiver.js
+++ b/lib/receiver.js
@@ -165,17 +165,12 @@ class Receiver extends Writable {
           // `WAIT_MICROTASK`.
           this._loop = false;

-          const next = () => {
+          // `queueMicrotask()` is a little slower and is not available in
+          // Node.js < 11.
+          Promise.resolve().then(() => {
             this._state = GET_INFO;
             this.startLoop(cb);
-          };
-
-          if (typeof queueMicrotask === 'function') {
-            queueMicrotask(next);
-          } else {
-            // `queueMicrotask()` is not available in Node.js < 11.
-            Promise.resolve().then(next);
-          }
+          });

           return;
         }

is a little faster. I'll keep investigating.
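
For context, the relative cost of the two schedulers can be probed with a rough microbenchmark along these lines (a sketch, not the benchmark used above; absolute numbers vary by Node.js version):

function bench(name, schedule) {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    let i = 0;
    const next = () => {
      if (++i === 1e6) {
        const ms = Number(process.hrtime.bigint() - start) / 1e6;
        console.log(`${name}: ${ms.toFixed(1)} ms for 1e6 hops`);
        resolve();
      } else {
        schedule(next);
      }
    };
    schedule(next);
  });
}

// Chain one million microtasks through each scheduler and time them.
// Requires an ES module (top-level await).
await bench('queueMicrotask', queueMicrotask);
await bench('Promise.resolve().then', (fn) => Promise.resolve().then(fn));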

OrKoN commented 1 year ago

Some numbers from a virtualized Linux machine:

# main
./load_test 32 127.0.0.1 8080 0 0
Using message size of 20 bytes
Running benchmark now...
Msg/sec: 40121.250000
Msg/sec: 44508.750000
Msg/sec: 43910.750000
Msg/sec: 41611.250000
Msg/sec: 44852.750000
Msg/sec: 37529.750000
Msg/sec: 37837.500000
Msg/sec: 35302.250000
Msg/sec: 35914.750000
Msg/sec: 37568.250000

# https://github.com/websockets/ws/pull/2160
./load_test 32 127.0.0.1 8080 0 0
Using message size of 20 bytes
Running benchmark now...
Msg/sec: 42166.250000
Msg/sec: 43883.000000
Msg/sec: 39499.750000
Msg/sec: 40256.250000
Msg/sec: 44213.000000
Msg/sec: 40567.500000
Msg/sec: 36763.500000
Msg/sec: 36296.000000
Msg/sec: 38419.750000
Msg/sec: 35596.75000

OrKoN commented 1 year ago

I have quickly tried wrapping all emit calls in setImmediate to schedule them for the next cycle, and it seems to perform better (with a caveat: I am not 100% sure I did it correctly, and I only did it for emit('message')).
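
A rough sketch of that experiment (hypothetical, not the exact change that was tested): intercept the WebSocket's emit and defer only 'message' events with setImmediate.

import { WebSocket } from 'ws';

// Defer 'message' emits to the next event-loop iteration. setImmediate
// callbacks run in FIFO order, so messages are still delivered in order.
function deferMessages(ws) {
  const emit = ws.emit.bind(ws);
  ws.emit = (event, ...args) => {
    if (event === 'message') {
      setImmediate(emit, event, ...args);
      return true;
    }
    return emit(event, ...args);
  };
}

const ws = new WebSocket('ws://127.0.0.1:8080');
deferMessages(ws);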

lpinca commented 1 year ago

Simply deferring the 'message' event to the next iteration of the event loop does not stop the parser, and that makes me a bit uncomfortable as it might have unintended consequences. Anyway, I noticed that the impact shrinks as the number of clients grows. If it is on the order of ~5%, it is acceptable.