siimon / prom-client

Prometheus client for node.js
Apache License 2.0

Impossible to reset the metrics in cluster mode #575

Open oliviermarlec opened 1 year ago

oliviermarlec commented 1 year ago

I implemented a counter metric in a Node.js HTTP server. I have one master and eight workers, so I use prom-client with the cluster module via the AggregatorRegistry class:

const { AggregatorRegistry } = require('prom-client');
const aggregatorRegistry = new AggregatorRegistry();

Collecting all the metrics works properly with the following function I implemented:

async function sendMetrics(request, response) {
    try {
        // Aggregate the current metrics from all cluster workers.
        const metrics = await aggregatorRegistry.clusterMetrics();
        response.writeHead(200, { 'Content-Type': aggregatorRegistry.contentType });
        response.end(metrics);
    } catch (ex) {
        response.writeHead(500, { 'Content-Type': aggregatorRegistry.contentType });
        response.end(ex.message);
    }
}
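
For completeness, here is roughly how everything is wired up (the counter name and port numbers below are simplified placeholders, not our actual code). The workers just require prom-client and register their metrics on the default registry, and the master forks them and serves the aggregated endpoint through sendMetrics():

const cluster = require('cluster');
const http = require('http');
const client = require('prom-client');

if (cluster.isPrimary) { // cluster.isMaster on older Node versions
    // Master: fork the eight workers and expose the aggregated metrics.
    for (let i = 0; i < 8; i++) {
        cluster.fork();
    }
    http.createServer((request, response) => {
        if (request.url === '/metrics') {
            sendMetrics(request, response);
        } else {
            response.writeHead(404);
            response.end();
        }
    }).listen(9100);
} else {
    // Worker: metrics registered on the default registry are collected
    // by the master's AggregatorRegistry on each scrape.
    const requestsTotal = new client.Counter({
        name: 'http_requests_total',
        help: 'Total HTTP requests handled by this worker',
    });
    http.createServer((request, response) => {
        requestsTotal.inc();
        response.end('hello\n');
    }).listen(8080);
}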

But resetting the metrics doesn't work; here is the function I defined for that:

async function flushMetrics(request, response) {
    try {
        // Intended to clear all metrics across the cluster.
        await aggregatorRegistry.resetMetrics();
        response.writeHead(200, { 'Content-Type': aggregatorRegistry.contentType });
        response.end('OK\n');
    } catch (ex) {
        response.writeHead(500, { 'Content-Type': aggregatorRegistry.contentType });
        response.end(ex.message);
    }
}

After calling flushMetrics(), the metrics are unchanged. Any idea?

zbjornson commented 9 months ago

That's indeed not supported. The Prometheus maintainers have said that there are very few scenarios in which users should use reset(), though; see #179, #236, #402, ...
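
For what it's worth, my reading of why the call appears to do nothing (this is how I understand the aggregation flow, not documented behaviour): resetMetrics() is inherited from the plain Registry class and only clears metrics registered in the process that calls it, while clusterMetrics() rebuilds its output from the workers' own registries on every request, so clearing the master-side registry never shows up in the scraped result.

// In the master process:
const { AggregatorRegistry } = require('prom-client');
const aggregatorRegistry = new AggregatorRegistry();

// Clears only metrics registered on this registry in this process.
// The workers' registries are untouched, and clusterMetrics()
// re-collects from the workers on every call, so the aggregated
// output does not change.
aggregatorRegistry.resetMetrics();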

Given that info, do you actually need support for this?

jedlikowski commented 1 month ago

Hey, I've encountered a similar case where I do need to reset a couple of metrics periodically. Our metrics have very high cardinality and unfortunately need to stay that way. This causes memory usage to grow steadily over the application's lifespan, hence the idea to reset those few metrics after every scrape and handle the consequences on the visualisation side.

For now, we created a custom MetricsAggregator class, which is basically a copy-paste of AggregatorRegistry but with the ability to pass an array of metric names to the clusterMetrics() method. Those metrics are then reset in the workers right after they are collected and prepared to send back to the master process, roughly as sketched below.
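
A rough sketch of the worker-side part of the idea (the metric name below is just an example, and resetAfterScrape() is our own helper, not prom-client API): right after a worker has serialized its metrics in response to the master's request, the listed metrics are cleared on the default registry.

const { register } = require('prom-client');

// Names of the high-cardinality metrics to clear after every scrape.
// In our MetricsAggregator these names come from the master's
// clusterMetrics(metricNames) call; the one below is just an example.
const METRICS_TO_RESET = ['http_request_duration_seconds'];

function resetAfterScrape(metricNames) {
    for (const name of metricNames) {
        const metric = register.getSingleMetric(name);
        if (metric) {
            metric.reset();
        }
    }
}

// Called by the worker-side message handler once the metrics
// have been collected and sent back to the master process.
resetAfterScrape(METRICS_TO_RESET);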

Would you be open to a PR which would upstream those changes?