Just realized that although the broadcast benchmark runs successfully, I don't think it's doing the right thing. Going to double check this.
Don't we need sticky sessions with WebSockets in Node?
something like https://github.com/uqee/sticky-cluster
No, not for persistent connections (like WebSockets). However, the "broadcast" test wouldn't be accurate, because each server process isn't broadcasting to the clients connected to the other processes. Making this work requires a little bit of IPC: we have to tell every other worker process to broadcast the message to all of its connected clients.
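Roughly, the master just needs to relay a broadcast request from any one worker out to every worker, and each worker then sends to its own connected clients. A minimal sketch of that idea (illustrative only, not the actual benchmark code):

// Rough sketch (illustrative, not the actual benchmark code): the master
// fans a broadcast request from any one worker out to every worker.
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // Node 6+ signature: (worker, message). Relay the message to all workers,
  // each of which then broadcasts to the clients connected to it.
  cluster.on('message', (worker, message) => {
    for (const id in cluster.workers) {
      cluster.workers[id].send(message);
    }
  });
}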
Another note: this kind of implementation issue could easily be detected by the client, as noted in #16.
This should now broadcast to all clients connected to all processes.
I'm having some trouble getting the cluster version to work.
dev@earth:~/hashrocket/websocket-shootout/js(moshen-add-nodejs-clustering*)% node -v {1}
v5.12.0
dev@earth:~/hashrocket/websocket-shootout/js(moshen-add-nodejs-clustering*)% node run-cluster.js uws
/home/dev/hashrocket/websocket-shootout/js/run-cluster.js:26
let onExit = () => {
^^^
SyntaxError: Block-scoped declarations (let, const, function, class) not yet supported outside strict mode
at exports.runInThisContext (vm.js:53:16)
at Module._compile (module.js:387:25)
at Object.Module._extensions..js (module.js:422:10)
at Module.load (module.js:357:32)
at Function.Module._load (module.js:314:12)
at Function.Module.runMain (module.js:447:10)
at startup (node.js:148:18)
at node.js:405:3
If I add --use-strict, I get a different error when hitting the server.
dev@earth:~/hashrocket/websocket-shootout/js(moshen-add-nodejs-clustering*)% node --use-strict run-cluster.js uws {1}
internal/child_process.js:536
throw new TypeError('message cannot be undefined');
^
TypeError: message cannot be undefined
at ChildProcess.target._send (internal/child_process.js:536:13)
at ChildProcess.target.send (internal/child_process.js:521:19)
at Worker.__dirname.Worker.options.process.on.Worker.send (cluster.js:51:21)
at /home/dev/hashrocket/websocket-shootout/js/run-cluster.js:45:29
at Array.forEach (native)
at EventEmitter.<anonymous> (/home/dev/hashrocket/websocket-shootout/js/run-cluster.js:43:34)
at emitTwo (events.js:100:13)
at EventEmitter.emit (events.js:185:7)
at Worker.<anonymous> (cluster.js:344:15)
at emitTwo (events.js:100:13)
Node 6 should work.
@jackc, @Haofei is right. This was written for Node 6+ (it uses some ES6 syntax). The ES6 isn't really necessary, and it could probably be rewritten to run on Node 5 if uWebSockets supports it (it looks like it does).
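For what it's worth, the pieces Node 5 chokes on are just let/const and arrow functions; an ES5 rewrite would look something like this (illustrative sketch, body elided):

// ES6 as currently written (needs Node 6, or strict mode on Node 5):
let onExit = () => {
  // ... shut down the workers ...
};

// ES5 equivalent that Node 5 accepts without --use-strict:
var onExit = function () {
  // ... shut down the workers ...
};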
Tried it using Node 6 and got it to run. You probably want to change the README.md where it says Node.js 4, 5, or 6+ is required.
The good news is it is really fast. The bad news is it seems to be doing something incorrect. The latest version of the benchmark counts the received broadcasts and checks against the expected amount. This implementation seems to send more broadcasts than it should:
dev@mercury:~/hashrocket/websocket-shootout(master+*)% bin/websocket-bench broadcast ws://192.168.50.12:3334/ws -l 192.168.50.5 -l 192.168.50.246 -l 192.168.50.247 -c 4 -s 40 --step-size 1000 -w neptune:4000 -w uranus:4000 {1}
clients: 1000 95per-rtt: 12ms min-rtt: 2ms median-rtt: 5ms max-rtt: 13ms
clients: 2000 95per-rtt: 153ms min-rtt: 2ms median-rtt: 7ms max-rtt: 154ms
clients: 3000 95per-rtt: 78ms min-rtt: 4ms median-rtt: 8ms max-rtt: 193ms
clients: 4000 95per-rtt: 159ms min-rtt: 2ms median-rtt: 10ms max-rtt: 212ms
clients: 5000 95per-rtt: 159ms min-rtt: 3ms median-rtt: 17ms max-rtt: 201ms
clients: 6000 95per-rtt: 170ms min-rtt: 4ms median-rtt: 14ms max-rtt: 202ms
clients: 7000 95per-rtt: 216ms min-rtt: 4ms median-rtt: 16ms max-rtt: 413ms
clients: 8000 95per-rtt: 166ms min-rtt: 4ms median-rtt: 27ms max-rtt: 219ms
clients: 9000 95per-rtt: 204ms min-rtt: 6ms median-rtt: 21ms max-rtt: 204ms
clients: 10000 95per-rtt: 269ms min-rtt: 5ms median-rtt: 27ms max-rtt: 640ms
clients: 11000 95per-rtt: 183ms min-rtt: 6ms median-rtt: 32ms max-rtt: 212ms
clients: 12000 95per-rtt: 230ms min-rtt: 6ms median-rtt: 30ms max-rtt: 240ms
clients: 13000 95per-rtt: 210ms min-rtt: 6ms median-rtt: 29ms max-rtt: 357ms
clients: 14000 95per-rtt: 229ms min-rtt: 7ms median-rtt: 35ms max-rtt: 262ms
clients: 15000 95per-rtt: 204ms min-rtt: 8ms median-rtt: 54ms max-rtt: 223ms
clients: 16000 95per-rtt: 258ms min-rtt: 8ms median-rtt: 34ms max-rtt: 282ms
clients: 17000 95per-rtt: 269ms min-rtt: 9ms median-rtt: 59ms max-rtt: 410ms
clients: 18000 95per-rtt: 306ms min-rtt: 8ms median-rtt: 42ms max-rtt: 314ms
clients: 19000 95per-rtt: 224ms min-rtt: 8ms median-rtt: 60ms max-rtt: 230ms
clients: 20000 95per-rtt: 211ms min-rtt: 8ms median-rtt: 77ms max-rtt: 224ms
clients: 21000 95per-rtt: 288ms min-rtt: 8ms median-rtt: 49ms max-rtt: 360ms
clients: 22000 95per-rtt: 346ms min-rtt: 9ms median-rtt: 65ms max-rtt: 580ms
clients: 23000 95per-rtt: 299ms min-rtt: 9ms median-rtt: 67ms max-rtt: 406ms
clients: 24000 95per-rtt: 371ms min-rtt: 9ms median-rtt: 46ms max-rtt: 817ms
clients: 25000 95per-rtt: 231ms min-rtt: 9ms median-rtt: 68ms max-rtt: 440ms
clients: 26000 95per-rtt: 281ms min-rtt: 10ms median-rtt: 65ms max-rtt: 415ms
clients: 27000 95per-rtt: 283ms min-rtt: 12ms median-rtt: 90ms max-rtt: 377ms
clients: 28000 95per-rtt: 219ms min-rtt: 10ms median-rtt: 110ms max-rtt: 320ms
clients: 29000 95per-rtt: 314ms min-rtt: 10ms median-rtt: 71ms max-rtt: 341ms
clients: 30000 95per-rtt: 261ms min-rtt: 11ms median-rtt: 84ms max-rtt: 394ms
clients: 31000 95per-rtt: 300ms min-rtt: 11ms median-rtt: 101ms max-rtt: 452ms
clients: 32000 95per-rtt: 292ms min-rtt: 11ms median-rtt: 123ms max-rtt: 317ms
clients: 33000 95per-rtt: 268ms min-rtt: 12ms median-rtt: 116ms max-rtt: 272ms
clients: 34000 95per-rtt: 456ms min-rtt: 11ms median-rtt: 135ms max-rtt: 543ms
2016/09/21 08:41:00 Extra received broadcasts: expected 25200000, got 28350000
I ran it several times and can repeatedly get extra broadcasts. Also, the extra count isn't always a round number, but it usually is (sometimes I get something like: 2016/09/21 08:49:53 Extra received broadcasts: expected 15120000, got 17003916).
@jackc: oops on the README. Took a couple of passes at this.
I think I know what's happening in the broadcast test.
(17003916 - 15120000)/15120000 = 0.12459761904762
(28350000 - 25200000)/25200000 = 0.125
The Core i7 4790K has 4 cores * 2 with hyper-threading, so the cluster runs 8 worker processes, and the overage is right around 1/8.
The guard if block probably isn't doing what I think it should for clustering: we're still sending an extra broadcast to each of the clients connected to one of the processes.
I'll push a fix later today.
@moshen You were right about the extra broadcast. The worker that received the broadcast request would broadcast to its connected clients and also send the message to the process handler, but the original worker would then receive its own message back from the process messaging system and broadcast again. The solution was to only broadcast in response to the process handler.
Fixed in: be69ac6749e67264834debd28dfc867e3e245c8b
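In other words, the originating worker only forwards the request to the master and waits for the relayed message before sending anything to its own clients. Roughly (a simplified sketch with ws-style names, not the exact run-cluster.js code):

// Simplified sketch (illustrative names, ws-style API).
// On a broadcast request, only forward it to the master; the actual send
// happens when the relayed message comes back, so the originating worker
// doesn't broadcast the same message twice.
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 3334 });

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    const msg = JSON.parse(data);
    if (msg.type === 'broadcast') {
      process.send(msg.payload); // forward to the master only
    }
  });
});

process.on('message', (payload) => {
  // Relayed by the master: broadcast to every client on THIS worker.
  wss.clients.forEach((client) => {
    client.send(JSON.stringify({ type: 'broadcast', payload: payload }));
  });
});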
The performance change is interesting. Cluster mode is slightly faster for uws, but slightly slower for ws.
@jackc, I fixed the other issues at the tip of this branch, but since you merged from the older revision I can push further tweaks in another PR.
WRT the performance differences, I'm not exactly sure what you mean by faster/slower (response time?). ws looks like it recommends an inefficient way of broadcasting compared to uws (which iterates over clients in C++-land and sends the same bytes from memory to each). ws checks and converts the "data" to a Buffer on every iteration; at first glance that looks pretty bad, and we should probably pass in a Buffer to start with.
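I.e. encode the message once and hand the same Buffer to every client; a sketch (not the current code):

// Sketch: convert the message to a Buffer once, outside the loop, so ws
// doesn't re-check and re-convert a string for every connected client.
function broadcast(wss, msg) {
  const payload = Buffer.from(JSON.stringify({ type: 'broadcast', payload: msg }));
  wss.clients.forEach((client) => {
    // binary: false keeps the frame as text even though we pass a Buffer
    client.send(payload, { binary: false });
  });
}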
The run-cluster.js implementation could probably also benefit from a way of specifying exactly how many workers to start; that would allow multiple passes with differing numbers of workers.
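For example (hypothetical, taking the count from an extra CLI argument; run-cluster.js doesn't currently do this):

// Hypothetical: read the worker count from a second CLI argument
// (after the server type), falling back to one worker per CPU.
const cluster = require('cluster');
const os = require('os');

const numWorkers = parseInt(process.argv[3], 10) || os.cpus().length;
if (cluster.isMaster) {
  for (let i = 0; i < numWorkers; i++) {
    cluster.fork();
  }
}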
What I meant by faster/slower was the number of clients at a 500ms 95th-percentile RTT. ws did worse with clustering, while uws did better.
Also cleans up the Node project a little by combining dependencies in the root folder. Makes it a little easier to set up and run the benchmark.