slanatech / swagger-stats

API Observability. Trace API calls and Monitor API performance, health and usage statistics in Node.js Microservices.
https://swaggerstats.io/
MIT License
891 stars 136 forks

Issue with fetching request metrics #136

Open shubhendumadhukar opened 3 years ago

shubhendumadhukar commented 3 years ago

I am using swagger-stats with one of my Express applications, and the results are amazing (most of the time). Great utility!

I am, however, facing an intermittent issue where the stats return 0 for all request-related metrics (example attached). This causes all graphs in the UI to render without any data.

I am running my Node process in a 1 master and n workers configuration using Node's cluster module, with the following swagger-stats configuration:

app.use(
  swStats.getMiddleware({
    name: "express-app",
    uriPath: "/monitoring",
  })
);
app.get("/stats", function (req, res) {
  res.setHeader("Content-Type", "application/json");
  res.send(swStats.getCoreStats());
});

I have attached a sample response that I get when I see blank graphs: swagger-stats.txt

Is there something I can do to resolve the intermittent issue I am facing? I can provide additional logs/data points if required.

sv2 commented 3 years ago

Could you elaborate on your Node cluster configuration? The issue could be that requests are handled in workers and are thus not visible to swagger-stats in the master. There are ways to aggregate Prometheus metrics when used with PM2, but not stats (https://swaggerstats.io/guide/prometheus.html#scraping-with-multiple-pm2-processes).
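
To illustrate why per-worker counters need explicit aggregation, here is a minimal sketch that merges snapshots each worker might report to the master (for example over `process.send`). The snapshot shape (`pid`, `requests`, `errors`) and the `mergeWorkerSnapshots` helper are hypothetical and are not part of swagger-stats.

```javascript
// Hypothetical helper: sum numeric counters reported by each cluster worker.
// The snapshot fields are illustrative assumptions, not the swagger-stats
// data model.
function mergeWorkerSnapshots(snapshots) {
  const totals = { workers: 0, requests: 0, errors: 0 };
  for (const snap of snapshots) {
    totals.workers += 1;
    totals.requests += snap.requests || 0;
    totals.errors += snap.errors || 0;
  }
  return totals;
}

// The master sees only its own (empty) counters unless the workers'
// snapshots are collected and merged like this.
const merged = mergeWorkerSnapshots([
  { pid: 101, requests: 40, errors: 1 },
  { pid: 102, requests: 60, errors: 0 },
]);
console.log(merged);
```

Without a step like this, a request to the stats endpoint is answered by whichever single process happens to handle it.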

shubhendumadhukar commented 3 years ago

That is correct, my requests are being handled in workers. Here's the configuration,

That was my first thought: that the master might not have access to those metrics. But I do get correct aggregated metrics at times, even when the app is running with multiple workers (I haven't used an aggregator registry yet, as prom-client suggests).

shubhendumadhukar commented 3 years ago

Update:

I ran another test with a higher load and an increased number of workers, and my earlier assumption was wrong. The metrics aren't being reported as "zero" intermittently. Rather, the dashboard shows the metrics for one worker at a time, which is understandable.

Having said that, I would like to modify my question to: Is there a way to add a custom tag to the generated stats that would help me identify which worker's data is displayed (maybe a process ID filter)? I know I have access to the promClient instance, but can I modify it to add custom labels?
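
One possible workaround, sketched under assumptions: wrap whatever the `/stats` handler returns in an envelope that carries the worker's process ID, so the consumer can tell which worker answered. `tagWithWorker` is a hypothetical helper, and the sample object only stands in for the value returned by `swStats.getCoreStats()`. Whether swagger-stats can attach custom labels to its own Prometheus metrics is not confirmed in this thread.

```javascript
const cluster = require("cluster");

// Hypothetical helper: attach the identity of the current process to a
// stats payload without modifying the payload itself.
function tagWithWorker(stats) {
  return {
    pid: process.pid,
    // cluster.worker is undefined in the master process, so fall back to null.
    workerId: cluster.worker ? cluster.worker.id : null,
    stats,
  };
}

// Stand-in for swStats.getCoreStats(); the real payload has its own shape.
const sample = { requests: 0 };
const tagged = tagWithWorker(sample);
console.log(tagged.pid, tagged.workerId);
```

In the `/stats` route from the original post, this would amount to `res.send(tagWithWorker(swStats.getCoreStats()))`.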