siimon / prom-client

Prometheus client for node.js
Apache License 2.0

Performance issue with open file descriptors #615

Open constantind opened 9 months ago

constantind commented 9 months ago

High CPU usage from blocking file I/O

Steps to reproduce

1. Add the default Node.js metrics.
2. Create 100 connections.
3. Collect a CPU profile and check that this appears at the top: https://github.com/siimon/prom-client/blob/master/lib/metrics/processOpenFileDescriptors.js#L25

- Version: 15.1.0
- OS: RedHat Linux 8
- Node: 18.18.2
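To illustrate the kind of call being reported, here is a minimal sketch (not prom-client's actual code) that contrasts a blocking and a non-blocking way of counting open file descriptors. It assumes the descriptors are listed under `/proc/self/fd` on Linux; `FD_DIR`, `countFdsSync`, and `countFdsAsync` are hypothetical names used only for this example.

```javascript
'use strict';
// Sketch only: compares a blocking and a non-blocking count of open file
// descriptors. FD_DIR and both helper names are assumptions for illustration.
const fs = require('fs');

const FD_DIR = '/proc/self/fd'; // Linux-only; assumed location of fd entries

// Blocking variant: the event loop cannot serve requests while this runs.
// The count may include the descriptor used to read the directory itself.
function countFdsSync(dir = FD_DIR) {
  return fs.readdirSync(dir).length;
}

// Non-blocking variant: the directory read runs on the libuv thread pool,
// so other requests can be handled while it is in flight.
async function countFdsAsync(dir = FD_DIR) {
  const entries = await fs.promises.readdir(dir);
  return entries.length;
}

module.exports = { countFdsSync, countFdsAsync };
```

Both helpers take the directory as a parameter, so they can be pointed at any directory for a quick comparison on systems without `/proc`.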

SimenB commented 9 months ago

> create 100 connections

What does this mean? 100 concurrent HTTP requests? Could you put together a reproduction with a script that does this?

constantind commented 9 months ago

Yes, but it could also be more than 50. By connection I mean a socket connection (onConnectionInternal), which is a file descriptor on Linux as far as Node.js is concerned. What arrives after onParseHeadersComplete in Node.js is an actual request in Express terms.

It looks to me as if something is wrong with the Node.js CJS loader, and the readdirSync call is blocking file I/O that stalls the main application while /metrics is running. It is extremely slow inside a container (in another scenario we had to offload it to a worker thread). Perhaps provide an option to disable the file handle metric that was added in 15.1.0?

Here is a reproduction, which is pretty standard use:

server.js

```js
'use strict';

const express = require('express'); // latest 4.x
const server = express();
const { promMetrics } = require('./metrics');

promMetrics(server);

server.get('/status', (req, res) => res.send('{"status":"ok"}'));

const port = process.env.PORT || 3000;
console.log(
  `Server listening to ${port}, metrics exposed on /metrics endpoint`,
);
server.listen(port);
```


metrics.js

```js
'use strict';
const Prometheus = require('prom-client');
const collectDefaultMetrics = Prometheus.collectDefaultMetrics;
collectDefaultMetrics();

const promMetrics = (router) => {
  router.get('/metrics', async (req, res) => {
    res.set('Content-Type', Prometheus.register.contentType);
    res.end(await Prometheus.register.metrics());
  });
};
exports.promMetrics = promMetrics;
```

Now create a CPU profile or flame graph. For example, start with `node --prof --inspect server.js`. You can attach chrome://inspect and use the Performance tab, or run `node --prof-process isolatexxxx > output.txt` and see what takes the most time.

Send the load with wrk, autocannon, or similar. For example, using artillery.io: run `npm install -g artillery` and create test.yml. The user in the case below is a socket connection.

```yaml
config:
  target: "http://localhost:3000"
  http:
    pool: 50
    timeout: 60
  phases:
```

Run `artillery run test.yml`, and at the same time call /metrics and observe.
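If installing a load tool is inconvenient, the same situation can be approximated with Node's built-in modules. The following is a rough sketch, assuming that idle TCP connections are enough to raise the server's open file descriptor count. `openConnections`, `timeGet`, and the connection count of 100 are illustrative choices, not part of the original reproduction.

```javascript
'use strict';
// Sketch only: holds idle TCP connections open, then times one GET request.
// openConnections, timeGet, and main are illustrative names.
const net = require('net');
const http = require('http');

// Open `count` TCP connections and resolve once all are established.
function openConnections(port, count, host = '127.0.0.1') {
  const pending = [];
  for (let i = 0; i < count; i++) {
    pending.push(new Promise((resolve, reject) => {
      const socket = net.connect(port, host);
      socket.once('connect', () => resolve(socket));
      socket.once('error', reject);
    }));
  }
  return Promise.all(pending);
}

// Issue a GET request and resolve with status code and elapsed milliseconds.
function timeGet(url) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    http.get(url, (res) => {
      res.resume(); // drain the body so that 'end' fires
      res.on('end', () => {
        const ms = Number(process.hrtime.bigint() - start) / 1e6;
        resolve({ status: res.statusCode, ms });
      });
    }).on('error', reject);
  });
}

// Hold 100 connections open, time /metrics once, then clean up.
async function main(port) {
  const sockets = await openConnections(port, 100);
  const result = await timeGet(`http://127.0.0.1:${port}/metrics`);
  console.log(`/metrics returned ${result.status} in ${result.ms.toFixed(1)} ms`);
  sockets.forEach((s) => s.destroy());
}

// Runs only when a port is passed on the command line.
if (process.argv[2]) {
  main(Number(process.argv[2])).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Saved under a name of your choice (for example `hold-connections.js`), it can be run as `node hold-connections.js 3000` while server.js is running. Comparing the reported time with and without the held connections gives a rough indication of whether the descriptor count affects /metrics.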