jochen-schweizer / express-prom-bundle

express middleware with standard prometheus metrics in one bundle
MIT License
310 stars 68 forks source link

Missing support for node cluster #16

Closed swimmadude66 closed 6 years ago

swimmadude66 commented 6 years ago

When running a node API with cluster, the bundle will only report metrics for the worker which served the specific /metrics request.

prom-client addressed this in prom-client#82. It seems this middleware should have another config option which indicates usage in a clustered environment, and reports aggregated metrics instead.

disjunction commented 6 years ago

(let me say upfront, I've not dealt with cluster together with express)

I don't see the issue addressed in siimon/prom-client#147. What they suggest is using an additional aggregator, see https://github.com/siimon/prom-client/blob/master/example/cluster.js

You can use the same approach:

const promBundle = express('express-prom-bundle');
const AggregatorRegistry = promBundle.promClient.AggregatorRegistry;
const aggregatorRegistry = new AggregatorRegistry();
// ...

You need 2 different kinds of express running in parallel: one usual running in each of the forked processes, and one additional running on a different port with only purpose - to serve the metrics.

What might work is adding a new function to express-prom-bundle like clusterMetrics(), which would create the aggregator and return a middleware bound to that aggregator. In your code you'd have something like:

const pb = require('express-prom-bundle')({
  autoregister: false
});

// ...

if (cluster.isMaster) {
  // ... 
  const metricsServer = express();
  metricsServer.get('/metrics', pb.clusterMetrics());
  metricsServer.listen(3001);
}

does this sound like a plan?

swimmadude66 commented 6 years ago

That's pretty close to what I was thinking. I've opened a PR which uses a flag to expose cluster metrics on master.

I am not terribly familiar with prom-client, so big unknown on both of these approaches is whether or not the registries in each worker will communicate correctly.

disjunction commented 6 years ago

@swimmadude66 could you test it? My guess, if it were so easy the prom-client guys wouldn't have made such a twisted example ;)

Regarding PR - before I accept it, i'll have to make sure: 0) it works 1) linting is fine 2) the new code is 100% covered in unit tests 3) console.error is used to report errors ;) 4) readme has the explanation of the new feature

I'll get to it earliest tomorrow evening. Feel free to proceed with those points

swimmadude66 commented 6 years ago

I am running it currently and it appears to work, but will look in to adding unit tests tomorrow morning. I tried to match your code style, so let me know if you see anything funky. My PR also has the example given for using it in the readme

disjunction commented 6 years ago

@swimmadude66 I've adjusted the docs, extended the unit test, but essentially it's the same as in your PR.

It's now published under version 4.2.0

Thank you for raising and solving this issue!