nodejs / performance

Node.js team focusing on performance
MIT License

Node v16 vs v20/v21 regression on http perf #127

Open mcollina opened 8 months ago

mcollina commented 8 months ago

This server:

```js
'use strict'

const server = require('http').createServer(function (req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8')
  res.end(JSON.stringify({ hello: 'world' }))
})

server.listen(3000)
```

gives me around 55k req/s on v21, v20, v18, while it gives me 65k req/s in v16 and v14.
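One detail worth noting when benchmarking this server: the body is re-serialized on every request. Precomputing it (as the test-server.js used later in this thread does) takes `JSON.stringify` off the hot path. A minimal sketch of that variant, using only Node core:

```javascript
'use strict'

// Precompute the response body once instead of serializing per request.
// This mirrors the test-server.js variant used later in this thread.
const body = JSON.stringify({ hello: 'world' })

// Hypothetical handler for illustration; plug into http.createServer as usual.
function handler (req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8')
  res.end(body)
}

// The precomputed body is byte-identical to per-request serialization.
console.log(body === JSON.stringify({ hello: 'world' })) // true
console.log(Buffer.byteLength(body)) // 17, matching the Content-Length seen in the traces below
```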

I use the following to test but you should be able to get similar results with wrk too:

autocannon -c 100 -d 5 -p 10 -W [ -c 10 -d 2 -p 2 ] localhost:3000

This might be due to #79.

targos commented 8 months ago

Have you tried cpu profiling or other diagnostics?

mcollina commented 8 months ago

Not really. A bisect would be interesting anyway, but I suspect it's a compound effect.

H4ad commented 8 months ago

A simple --cpu-prof:

node --cpu-prof test-server.js & autocannon -c 100 -d 5 -p 10 -W [ -c 10 -d 2 -p 2 ] localhost:3000
test-server.js

```js
'use strict'

const data = JSON.stringify({ hello: 'world' })

const server = require('http').createServer(function (req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8')
  res.end(data)
})

server.listen(3000)

setTimeout(() => {
  server.close()
}, 8000)
```

16.20.1

*[screenshot: CPU profile on 16.20.1]*

507k requests in 5.01s, 94.6 MB read

20.8.1

*[screenshot: CPU profile on 20.8.1]*

400k requests in 5.01s, 74.5 MB read

server-on-16-20-1.cpuprofile server-on-20-8-1.cpuprofile
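For comparing the two attached `.cpuprofile` files without opening a GUI, a small helper can rank functions by self time. This is a sketch under the assumption of the standard V8 `.cpuprofile` shape (`nodes[]` entries carrying `callFrame` and `hitCount`); the function name `topFunctions` and the inline sample profile are made up for illustration:

```javascript
'use strict'

// Rank a V8 .cpuprofile's functions by self samples (hitCount).
// Assumes the standard shape: { nodes: [{ callFrame, hitCount }, ...] }.
function topFunctions (profile, n) {
  return profile.nodes
    .filter((node) => node.hitCount > 0)          // keep only frames with self time
    .sort((a, b) => b.hitCount - a.hitCount)      // hottest first
    .slice(0, n)
    .map((node) => ({
      functionName: node.callFrame.functionName || '(anonymous)',
      hitCount: node.hitCount
    }))
}

// Tiny inline profile for illustration; real usage would be e.g.:
//   topFunctions(JSON.parse(require('fs').readFileSync('server-on-20-8-1.cpuprofile', 'utf8')), 10)
const sample = {
  nodes: [
    { callFrame: { functionName: 'writevGeneric' }, hitCount: 40 },
    { callFrame: { functionName: '' }, hitCount: 5 },
    { callFrame: { functionName: 'parserOnHeadersComplete' }, hitCount: 12 }
  ]
}

console.log(topFunctions(sample, 2))
// → [ { functionName: 'writevGeneric', hitCount: 40 },
//     { functionName: 'parserOnHeadersComplete', hitCount: 12 } ]
```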

H4ad commented 8 months ago

I did a little exploration on writev_generic and I found some interesting things:

The method is called with the following data:

console.log(data)

```js
[
  {
    chunk: 'HTTP/1.1 200 OK\r\n' +
      'content-type: application/json; charset=utf-8\r\n' +
      'Date: Fri, 20 Oct 2023 01:04:24 GMT\r\n' +
      'Connection: keep-alive\r\n' +
      'Keep-Alive: timeout=5\r\n' +
      'Content-Length: 17\r\n' +
      '\r\n' +
      '{"hello":"world"}',
    encoding: 'utf8',
    callback: [Function: nop]
  },
  { chunk: '', encoding: 'latin1', callback: [Function: bound onFinish] },
  allBuffers: false
]
```

> Later, the `callback` property is not discarded before calling into C++.

I also tested a little app in Fastify, the result was literally the same.

In Express, the result was different:

console.log(data)

```js
[
  {
    chunk: 'HTTP/1.1 200 OK\r\n' +
      'X-Powered-By: Express\r\n' +
      'Content-Type: application/json; charset=utf-8\r\n' +
      'Content-Length: 17\r\n' +
      'ETag: W/"11-IkjuL6CqqtmReFMfkkvwC0sKj04"\r\n' +
      'Date: Fri, 20 Oct 2023 01:56:39 GMT\r\n' +
      'Connection: keep-alive\r\n' +
      'Keep-Alive: timeout=5\r\n' +
      '\r\n',
    encoding: 'latin1',
    callback: [Function: nop]
  },
  { chunk: <Buffer ...>, encoding: 'buffer', callback: [Function: nop] },
  { chunk: '', encoding: 'latin1', callback: [Function: bound onFinish] },
  allBuffers: false
]
```
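For context on what happens to these payloads: before calling into C++, `lib/internal/stream_base_commons.js` flattens the `[{ chunk, encoding, callback }, ...]` entries into a flat `[chunk0, encoding0, chunk1, encoding1, ...]` array when not all entries are Buffers. A simplified standalone sketch of that flattening step (the function name `flattenChunks` is mine; the real code does this inline in `writevGeneric`):

```javascript
'use strict'

// Simplified sketch of the chunk flattening done in writevGeneric
// (lib/internal/stream_base_commons.js) before StreamBase::Writev is called.
function flattenChunks (data) {
  // When every entry is already a Buffer, the array is passed through as-is.
  if (data.allBuffers) return data
  // Otherwise, build a flat [chunk, encoding, chunk, encoding, ...] array.
  const chunks = new Array(data.length * 2)
  for (let i = 0; i < data.length; i++) {
    const entry = data[i]
    chunks[i * 2] = entry.chunk
    chunks[i * 2 + 1] = entry.encoding
  }
  return chunks
}

// Shape of the plain-http payload shown above (abbreviated).
const data = [
  { chunk: 'HTTP/1.1 200 OK\r\n\r\n{"hello":"world"}', encoding: 'utf8' },
  { chunk: '', encoding: 'latin1' }
]
data.allBuffers = false

console.log(flattenChunks(data))
// → [ 'HTTP/1.1 200 OK\r\n\r\n{"hello":"world"}', 'utf8', '', 'latin1' ]
```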

Based on these payloads, I tried the following patch:

fast-path-utf8.patch

```diff
commit 1cbbb5ce2cdfa53caa8ff2771150598fa8fd0099
Author: Vinícius Lourenço
Date:   Thu Oct 19 23:12:19 2023 -0300

    lib: fast path for single utf8 chunk

diff --git a/lib/internal/stream_base_commons.js b/lib/internal/stream_base_commons.js
index 1555956aff..1bbf45313b 100644
--- a/lib/internal/stream_base_commons.js
+++ b/lib/internal/stream_base_commons.js
@@ -120,8 +120,21 @@ function createWriteWrap(handle, callback) {
 }

 function writevGeneric(self, data, cb) {
+  if (data.length === 2 && data[0].encoding === 'utf8' &&
+      data[1].encoding === 'latin1' && data[1].chunk === '') {
+    return writevGenericFastSingleUtf8(self, data, cb);
+  }
   const req = createWriteWrap(self[kHandle], cb);
-  const allBuffers = data.allBuffers;
+  const chunks = buildChunks(data, data.allBuffers);
+  const err = req.handle.writev(req, chunks, data.allBuffers);
+
+  // Retain chunks
+  if (err === 0) req._chunks = chunks;
+
+  afterWriteDispatched(req, err, cb);
+  return req;
+}
+
+function buildChunks(data, allBuffers) {
   let chunks;
   if (allBuffers) {
     chunks = data;
@@ -135,10 +148,16 @@
       chunks[i * 2 + 1] = entry.encoding;
     }
   }
-  const err = req.handle.writev(req, chunks, allBuffers);
+  return chunks;
+}
+
+function writevGenericFastSingleUtf8(self, data, cb) {
+  const req = createWriteWrap(self[kHandle], cb);
+
+  const err = req.handle.writeUtf8String(req, data[0].chunk);

   // Retain chunks
-  if (err === 0) req._chunks = chunks;
+  if (err === 0) req._chunks = buildChunks(data, data.allBuffers);

   afterWriteDispatched(req, err, cb);
   return req;
```

And testing again with test-server.js, I was able to get up to 420k~450k requests (up from 400k) in the same 5-second run.


Although the benchmark improves with the patch, the test suite didn't pass, so I didn't even try to open a PR. I just want to draw attention to these functions: maybe we can find a viable fast path or other optimizations by having more people look at them.
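For reference, the condition the patch uses to select the fast path, extracted as a standalone predicate (the function name `isSingleUtf8Write` is mine; the payload shapes are taken from the traces earlier in this thread):

```javascript
'use strict'

// The patch's fast-path condition: a write qualifies when it is exactly
// one utf8 chunk followed by the empty latin1 trailer that http appends.
function isSingleUtf8Write (data) {
  return data.length === 2 &&
    data[0].encoding === 'utf8' &&
    data[1].encoding === 'latin1' &&
    data[1].chunk === ''
}

// The plain-http payload shape from the traces above qualifies...
console.log(isSingleUtf8Write([
  { chunk: 'HTTP/1.1 200 OK\r\n\r\n{"hello":"world"}', encoding: 'utf8' },
  { chunk: '', encoding: 'latin1' }
])) // true

// ...while the Express shape (latin1 header + buffer body) does not.
console.log(isSingleUtf8Write([
  { chunk: 'HTTP/1.1 200 OK\r\n\r\n', encoding: 'latin1' },
  { chunk: Buffer.from('{"hello":"world"}'), encoding: 'buffer' },
  { chunk: '', encoding: 'latin1' }
])) // false
```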

Side note: @anonrig maybe the V8 Fast API can be used for WriteString, but I'm not sure (I'm not a C++ guy), so take a look and see if you find something interesting.

mcollina commented 8 months ago

@H4ad can you paste your full benchmarks? From what you said, it seems there is no regression between Node.js v16 and v20, which is surprising. What OS and CPU architecture are you on?

H4ad commented 8 months ago

@mcollina Sorry about that, I inverted the req/s numbers between the versions; I've updated my comment.

debadree25 commented 8 months ago

> FastAPI can be used for WriteString but I'm not sure (I'm not a C++ guy), so take a look and see if you find something interesting.

That function includes CHECKs at https://github.com/nodejs/node/blob/eed33c9deaa336e76c5385a077490107c6110f08/src/stream_base.cc#L394 and https://github.com/nodejs/node/blob/eed33c9deaa336e76c5385a077490107c6110f08/src/stream_base.cc#L416. So if we were to implement a Fast API version, I guess we would see side effects in case of an error (the data being written twice, I suppose). Maybe there could be a way to move these checks to the beginning of the method 🤔