Open mcollina opened 8 months ago
Have you tried cpu profiling or other diagnostics?
Not really, a bisect would be interesting anyway, but I suspect it would be a compound effect.
A simple --cpu-prof:
node --cpu-prof test-server.js & autocannon -c 100 -d 5 -p 10 -W [ -c 10 -d 2 -p 2 ] localhost:3000
507k requests in 5.01s, 94.6 MB read
400k requests in 5.01s, 74.5 MB read
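To put the two totals above in proportion, here is a small sketch that computes the relative difference between them. The helper name `relativeDrop` is made up for illustration. It only compares the two reported numbers and makes no assumption about which Node.js version produced which.

```javascript
// Relative difference between two benchmark totals measured over the same duration.
// `relativeDrop` is a hypothetical helper, not part of autocannon.
function relativeDrop(higher, lower) {
  return (higher - lower) / higher;
}

// Totals taken from the two autocannon runs quoted above.
const requestsDrop = relativeDrop(507_000, 400_000);
const bytesDrop = relativeDrop(94.6, 74.5);

console.log(`requests: ${(requestsDrop * 100).toFixed(1)}% lower`);
console.log(`data read: ${(bytesDrop * 100).toFixed(1)}% lower`);
```

Both figures come out at roughly 21%, so the request count and the bytes read shrink together.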
I did a little exploration on writev_generic and found some interesting things:
The method is called with the following data:
I also tested a little app in Fastify, the result was literally the same.
In Express, the result was different:
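For anyone who wants to look at similar payloads without patching Node.js internals, here is a rough sketch using the public `stream.Writable` API. It is only an analogy for what an internal writev handler receives, not the internal function itself. The `seen` array and the sample strings are made up for illustration. `decodeStrings: false` is set so that strings reach `writev` with their encoding instead of being converted to Buffers first.

```javascript
const { Writable } = require('node:stream');

// Collected summaries of every batch handed to writev (hypothetical name).
const seen = [];

const sink = new Writable({
  decodeStrings: false, // keep strings as strings so the encoding is visible
  writev(chunks, callback) {
    // Each entry is { chunk, encoding }; record only its shape.
    seen.push(chunks.map(({ chunk, encoding }) => ({ type: typeof chunk, encoding })));
    callback();
  },
});

// cork() buffers the writes so they are flushed as one writev batch on uncork().
sink.cork();
sink.write('HTTP/1.1 200 OK\r\n\r\n', 'latin1');
sink.write('hello world', 'utf8');
sink.uncork();

console.log(JSON.stringify(seen));
```

Batches that mix encodings versus batches with a single encoding are the kind of difference a fast path would need to account for.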
Based on these payloads, I tried the following patch:
And testing again using test-server.js, I was able to get up to 420k~450k req/s (from 400k req/s).
Although the benchmark works with the patch, the tests didn't pass, so I didn't even try to open a PR. I just want to draw attention to these functions, because maybe we can explore a fast path or find other optimizations by having more people look at them.
Side note: @anonrig Maybe FastAPI can be used for WriteString, but I'm not sure (I'm not a C++ guy), so take a look and see if you find something interesting.
@H4ad can you paste your full benchmarks? From what you said, it seems there is no regression between Node.js v16 and v20, which is surprising. What OS and CPU architecture are you on?
@mcollina Sorry about that, I inverted the req/ops between the versions, I updated my comment.
FastAPI can be used for WriteString but I'm not sure (I'm not a C++ guy), so take a look and see if you find something interesting.
In this function, the lines at https://github.com/nodejs/node/blob/eed33c9deaa336e76c5385a077490107c6110f08/src/stream_base.cc#L394 and https://github.com/nodejs/node/blob/eed33c9deaa336e76c5385a077490107c6110f08/src/stream_base.cc#L416 contain CHECKs, so if we were to implement the fast API, I guess we would see side effects (the data being written twice, I guess) in case of an error. Maybe there could be a way to do these checks at the beginning of the method 🤔
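To make the ordering concern concrete, here is a toy model in plain JavaScript. It assumes a fast path that can bail out and then be retried on a slower path. It does not use or model the real V8 fast API or `stream_base.cc`, and every function name in it is hypothetical. It only shows why a check that runs after some data has been written can duplicate that data, while a check at the top cannot.

```javascript
// Slow path: accepts strings and Buffers and always succeeds in this model.
function slowWrite(sink, chunks) {
  for (const chunk of chunks) sink.push(String(chunk));
}

// Fast path A: checks each chunk while writing, so it can bail out midway.
function fastWriteLateCheck(sink, chunks) {
  for (const chunk of chunks) {
    if (typeof chunk !== 'string') return false; // earlier chunks are already written
    sink.push(chunk);
  }
  return true;
}

// Fast path B: validates every chunk before touching the sink.
function fastWriteEarlyCheck(sink, chunks) {
  if (!chunks.every((chunk) => typeof chunk === 'string')) return false;
  for (const chunk of chunks) sink.push(chunk);
  return true;
}

// Try the fast path first and fall back to the slow path if it bails out.
function write(sink, chunks, fastPath) {
  if (!fastPath(sink, chunks)) slowWrite(sink, chunks);
  return sink;
}

const chunks = ['a', Buffer.from('b')];
const lateResult = write([], chunks, fastWriteLateCheck);   // 'a' is written twice
const earlyResult = write([], chunks, fastWriteEarlyCheck); // each chunk written once
console.log(lateResult, earlyResult);
```

With the late check the sink ends up as `['a', 'a', 'b']`; with the early check it is `['a', 'b']`.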
This server:
gives me around 55k req/s on v21, v20, v18, while it gives me 65k req/s in v16 and v14.
I use the following to test, but you should be able to get similar results with wrk too:
This might be due to #79.