hashrocket / websocket-shootout

A comparison of websocket servers in multiple languages and frameworks
MIT License

Fix #15 add clustering for nodejs impls #31

Closed moshen closed 8 years ago

moshen commented 8 years ago

Also cleans up the Node project a little by combining dependencies in the root folder. Makes it a little easier to set up and run the benchmark.

moshen commented 8 years ago

Just realized that although the broadcast benchmark runs successfully, I don't think it's doing the right thing. Going to double check this.

Haofei commented 8 years ago

Don't we need sticky sessions with WebSockets in Node?

something like https://github.com/uqee/sticky-cluster

moshen commented 8 years ago

No, not for persistent connections (like WebSockets). However, the "broadcast" test wouldn't be accurate, because each server process isn't broadcasting to the clients connected to the other processes. This requires a little bit of IPC to make work, i.e. we have to tell every other worker process to broadcast the message to all of its connected clients.
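Roughly, the idea is the sketch below (illustrative only, not the actual run-cluster.js; broadcastToLocalClients stands in for however a worker writes to its own sockets): the master forks one worker per CPU and relays any broadcast request it receives over the cluster IPC channel to every worker, and each worker fans the message out to its own connections.

const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // One worker per logical CPU.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }

  // When any worker forwards a broadcast request, relay it to every worker
  // (including the one that sent it) over the IPC channel.
  cluster.on('message', (worker, message) => {
    for (const id in cluster.workers) {
      cluster.workers[id].send(message);
    }
  });
} else {
  // Each worker broadcasts only to the clients connected to it.
  process.on('message', (message) => {
    broadcastToLocalClients(message); // stand-in for the worker's own fan-out
  });
}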

moshen commented 8 years ago

Another note: this kind of implementation issue could easily be detected by the client, as noted in #16.

moshen commented 8 years ago

This should now broadcast to all clients connected to all processes.

jackc commented 8 years ago

I'm having some trouble getting the cluster version to work.

dev@earth:~/hashrocket/websocket-shootout/js(moshen-add-nodejs-clustering*)% node -v
v5.12.0
dev@earth:~/hashrocket/websocket-shootout/js(moshen-add-nodejs-clustering*)% node run-cluster.js uws
/home/dev/hashrocket/websocket-shootout/js/run-cluster.js:26
  let onExit = () => {
  ^^^

SyntaxError: Block-scoped declarations (let, const, function, class) not yet supported outside strict mode
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:387:25)
    at Object.Module._extensions..js (module.js:422:10)
    at Module.load (module.js:357:32)
    at Function.Module._load (module.js:314:12)
    at Function.Module.runMain (module.js:447:10)
    at startup (node.js:148:18)
    at node.js:405:3

If I add --use-strict I get a different error when hitting the server.

dev@earth:~/hashrocket/websocket-shootout/js(moshen-add-nodejs-clustering*)% node --use-strict run-cluster.js uws
internal/child_process.js:536
      throw new TypeError('message cannot be undefined');
      ^

TypeError: message cannot be undefined
    at ChildProcess.target._send (internal/child_process.js:536:13)
    at ChildProcess.target.send (internal/child_process.js:521:19)
    at Worker.__dirname.Worker.options.process.on.Worker.send (cluster.js:51:21)
    at /home/dev/hashrocket/websocket-shootout/js/run-cluster.js:45:29
    at Array.forEach (native)
    at EventEmitter.<anonymous> (/home/dev/hashrocket/websocket-shootout/js/run-cluster.js:43:34)
    at emitTwo (events.js:100:13)
    at EventEmitter.emit (events.js:185:7)
    at Worker.<anonymous> (cluster.js:344:15)
    at emitTwo (events.js:100:13)

Haofei commented 8 years ago

Node 6 should work.

moshen commented 8 years ago

@jackc, @Haofei is right. This was written for Node 6+ (ES6 syntax for some of it). It's not really necessary, and it could probably be rewritten to run on Node 5 if uWebSockets supports it (looks like it does).
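For reference, the difference is just syntax; the failing let/arrow line could be written in an ES5 form that Node 5 parses without strict mode (a hypothetical rewrite, not something in this branch):

// ES6, needs Node 6+ (or Node 5 with --use-strict):
let onExit = () => { /* ... */ };

// ES5 equivalent that Node 5 accepts outside strict mode:
var onExit = function () { /* ... */ };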

moshen commented 8 years ago

Also, FYI, it looks like Node v5 is unsupported (or will be soon).

jackc commented 8 years ago

Tried it using Node 6 and got it to run. Probably want to change where the README.md says Node.js 4, 5, or 6+ is required.

The good news is it is really fast. The bad news is it seems to be doing something incorrect. The latest version of the benchmark counts the received broadcasts and checks against the expected amount. This implementation seems to send more broadcasts than it should:

dev@mercury:~/hashrocket/websocket-shootout(master+*)% bin/websocket-bench broadcast ws://192.168.50.12:3334/ws -l 192.168.50.5 -l 192.168.50.246 -l 192.168.50.247 -c 4 -s 40 --step-size 1000 -w neptune:4000 -w uranus:4000
clients:  1000    95per-rtt:  12ms    min-rtt:   2ms    median-rtt:   5ms    max-rtt:  13ms
clients:  2000    95per-rtt: 153ms    min-rtt:   2ms    median-rtt:   7ms    max-rtt: 154ms
clients:  3000    95per-rtt:  78ms    min-rtt:   4ms    median-rtt:   8ms    max-rtt: 193ms
clients:  4000    95per-rtt: 159ms    min-rtt:   2ms    median-rtt:  10ms    max-rtt: 212ms
clients:  5000    95per-rtt: 159ms    min-rtt:   3ms    median-rtt:  17ms    max-rtt: 201ms
clients:  6000    95per-rtt: 170ms    min-rtt:   4ms    median-rtt:  14ms    max-rtt: 202ms
clients:  7000    95per-rtt: 216ms    min-rtt:   4ms    median-rtt:  16ms    max-rtt: 413ms
clients:  8000    95per-rtt: 166ms    min-rtt:   4ms    median-rtt:  27ms    max-rtt: 219ms
clients:  9000    95per-rtt: 204ms    min-rtt:   6ms    median-rtt:  21ms    max-rtt: 204ms
clients: 10000    95per-rtt: 269ms    min-rtt:   5ms    median-rtt:  27ms    max-rtt: 640ms
clients: 11000    95per-rtt: 183ms    min-rtt:   6ms    median-rtt:  32ms    max-rtt: 212ms
clients: 12000    95per-rtt: 230ms    min-rtt:   6ms    median-rtt:  30ms    max-rtt: 240ms
clients: 13000    95per-rtt: 210ms    min-rtt:   6ms    median-rtt:  29ms    max-rtt: 357ms
clients: 14000    95per-rtt: 229ms    min-rtt:   7ms    median-rtt:  35ms    max-rtt: 262ms
clients: 15000    95per-rtt: 204ms    min-rtt:   8ms    median-rtt:  54ms    max-rtt: 223ms
clients: 16000    95per-rtt: 258ms    min-rtt:   8ms    median-rtt:  34ms    max-rtt: 282ms
clients: 17000    95per-rtt: 269ms    min-rtt:   9ms    median-rtt:  59ms    max-rtt: 410ms
clients: 18000    95per-rtt: 306ms    min-rtt:   8ms    median-rtt:  42ms    max-rtt: 314ms
clients: 19000    95per-rtt: 224ms    min-rtt:   8ms    median-rtt:  60ms    max-rtt: 230ms
clients: 20000    95per-rtt: 211ms    min-rtt:   8ms    median-rtt:  77ms    max-rtt: 224ms
clients: 21000    95per-rtt: 288ms    min-rtt:   8ms    median-rtt:  49ms    max-rtt: 360ms
clients: 22000    95per-rtt: 346ms    min-rtt:   9ms    median-rtt:  65ms    max-rtt: 580ms
clients: 23000    95per-rtt: 299ms    min-rtt:   9ms    median-rtt:  67ms    max-rtt: 406ms
clients: 24000    95per-rtt: 371ms    min-rtt:   9ms    median-rtt:  46ms    max-rtt: 817ms
clients: 25000    95per-rtt: 231ms    min-rtt:   9ms    median-rtt:  68ms    max-rtt: 440ms
clients: 26000    95per-rtt: 281ms    min-rtt:  10ms    median-rtt:  65ms    max-rtt: 415ms
clients: 27000    95per-rtt: 283ms    min-rtt:  12ms    median-rtt:  90ms    max-rtt: 377ms
clients: 28000    95per-rtt: 219ms    min-rtt:  10ms    median-rtt: 110ms    max-rtt: 320ms
clients: 29000    95per-rtt: 314ms    min-rtt:  10ms    median-rtt:  71ms    max-rtt: 341ms
clients: 30000    95per-rtt: 261ms    min-rtt:  11ms    median-rtt:  84ms    max-rtt: 394ms
clients: 31000    95per-rtt: 300ms    min-rtt:  11ms    median-rtt: 101ms    max-rtt: 452ms
clients: 32000    95per-rtt: 292ms    min-rtt:  11ms    median-rtt: 123ms    max-rtt: 317ms
clients: 33000    95per-rtt: 268ms    min-rtt:  12ms    median-rtt: 116ms    max-rtt: 272ms
clients: 34000    95per-rtt: 456ms    min-rtt:  11ms    median-rtt: 135ms    max-rtt: 543ms
2016/09/21 08:41:00 Extra received broadcasts: expected 25200000, got 28350000

I ran it several times and repeatedly get extra results. Also, it's not always an even number received, but it usually is (sometimes I get something like: 2016/09/21 08:49:53 Extra received broadcasts: expected 15120000, got 17003916).

moshen commented 8 years ago

@jackc : oops on the README. Took a couple of passes at this.

I think I know what's happening in the broadcast test.

(17003916 - 15120000)/15120000 = 0.12459761904762
(28350000 - 25200000)/25200000 = 0.125

The Core i7-4790K has 4 cores × 2 with Hyper-Threading, so the cluster runs 8 worker processes, and the overage is almost exactly 1/8.

The guard if block probably isn't doing what I think it should for clustering: we're still sending an extra broadcast to each of the clients connected to one of the processes (the one that originated the broadcast).

I'll push a fix later today.

jackc commented 8 years ago

@moshen You were right about the extra broadcast. The worker that received the broadcast request would broadcast to its connected clients and also send the message on to the master's message handler. But the original worker would then receive its own message back from the master and broadcast a second time. The solution was to broadcast only in response to the relayed message.

Fixed in: be69ac6749e67264834debd28dfc867e3e245c8b
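A simplified sketch of that pattern, using ws-style names for illustration (the actual servers here differ in detail): the worker that receives a broadcast request from one of its clients only forwards it to the master, and every worker, the originator included, broadcasts when the relayed copy arrives.

// In each worker: forward broadcast requests instead of acting on them here.
ws.on('message', (data) => {
  const msg = JSON.parse(data);
  if (msg.type === 'broadcast') {
    process.send(msg); // goes to the master, which relays it to all workers
  }
});

// Broadcast only when the master relays the message back; the originating
// worker receives the relayed copy too, so it never broadcasts directly.
process.on('message', (msg) => {
  const payload = JSON.stringify(msg);
  wss.clients.forEach((client) => client.send(payload));
});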

The performance change is interesting. Cluster mode is slightly faster for uws, but slightly slower for ws.

moshen commented 8 years ago

@jackc, I fixed the other issues at the tip of this branch, but since you merged from the older revision, I can push further tweaks in another PR.

WRT the performance differences, I'm not exactly sure what you mean by faster/slower (response time?). ws looks like it recommends an inefficient way of broadcasting compared to uws (which iterates over clients in C++ land and sends the same bytes from memory to each). ws checks and converts the "data" to a Buffer on each iteration; at first glance this looks pretty bad, and we should probably pass in a Buffer to start with.
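For example (a sketch against the ws API, not the code in this branch), the per-client conversion can be avoided by encoding the payload once and sending the resulting Buffer while keeping the frame as text:

const WebSocket = require('ws');

// Given a WebSocket.Server instance `wss` and a message object `msg`:
// encode once instead of letting ws convert the same string for every client.
const data = Buffer.from(JSON.stringify(msg));

wss.clients.forEach((client) => {
  if (client.readyState === WebSocket.OPEN) {
    client.send(data, { binary: false }); // still delivered as a text frame
  }
});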

The run-cluster.js implementation could probably also benefit from a way of specifying exactly how many workers to start; that would allow multiple passes with differing numbers of workers.
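Something along these lines would do it (purely illustrative; run-cluster.js doesn't take such an argument today): read an optional worker count from the command line and fall back to one worker per CPU.

const cluster = require('cluster');
const os = require('os');

// e.g. `node run-cluster.js uws 4` -> four workers; default is one per CPU.
const numWorkers = parseInt(process.argv[3], 10) || os.cpus().length;

for (let i = 0; i < numWorkers; i++) {
  cluster.fork();
}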

jackc commented 8 years ago

What I meant by faster/slower was the number of clients reached at a 500 ms 95th-percentile RTT. ws did worse with clustering, while uws did better.