jochen-schweizer / express-prom-bundle

express middleware with standard prometheus metrics in one bundle
MIT License
303 stars 67 forks source link

Option to erase series without requests for a given time #116

Open rilpires opened 1 year ago

rilpires commented 1 year ago

I think it would be useful if labels wouldn't persist indefinitely. If I have a set of {method,path,status_code} that has not been active for a given time, I would like to erase it from the prom-client register, therefore, not exposing it to prometheus...

disjunction commented 1 year ago

wouldn't it be a feature of prom-client then? You may want to remove unused labels from other metrics too, not only http_request_duration_seconds

rilpires commented 1 year ago

From Gauge/Counter/Histogram/Summary classes, I can't easily know which label set has been active or not, how many entries I had, etc. I would need some kind of "labelset-to-timestamp" map to register it right before I "observe" a new value, and this is made inside this module, not prom-client ...

Optional parameters could be how long should it persist and a interval to evaluate this "garbage collector" callback

My case is that I have a huge API made with express, it is hard to keep track of all possible combinations of , there will be always some combinations that happens every once in a while and they will persist forever... on every per-client instance...

disjunction commented 1 year ago

if you check the code, you'd see it's a very dumb proxy to prom-client. This library has almost zero functionality and of course it doesn't keep any information about your endpoints whatsoever. So this kind of functionality would be in any case out of place here.

On the other hand believe me you're not the only one having a huge API that needs to be monitored. My own solution to this is "path normalization", i.e. before a specific path gets counted it is first transformed into a generic form. E.g. /persons/234234/profile becomes /persons/#val/profile. And thus you will have a limited number of combination.

See:

normalizePath: function(req) or Array - if function is provided, then it should generate path value from express req

If the default behavior doesn't work, you may write your own normalizePath in such a way that it's not too much data but also enough for you to identify problematic areas. E.g. instead of conversion above you may turn all /person/* paths into simply /person and this way you will know how many requests are about persons in total.

Hope it helps!

rilpires commented 1 year ago

Well, thanks for the tip about the path normalization. I'm already using it as it is critically necessary.

About the expiration of unused <path,method,status_code>, I convinced myself that this could be resolved from prom-client. I've changed type from histogram to summary in order to try the "maxAgeSeconds" parameter, but sadly after expiring the observations for a given label set, it is shown as 0 instead of pruning itself.

There is already an issue for it: siimon/prom-client#492

timm-stelzer-e-farm commented 1 year ago

With the prom-client fix, and this patch:

diff --git a/src/index.js b/src/index.js
index 0c96dfb9ac0c9e3559b69e83ff60d09f4dcbbd23..8a0aa77516baf65efb8c7f2d3a2b757ad6abcf82 100644
--- a/src/index.js
+++ b/src/index.js
@@ -112,7 +112,8 @@ function main(opts) {
         percentiles: opts.percentiles || [0.5, 0.75, 0.95, 0.98, 0.99, 0.999],
         maxAgeSeconds:  opts.maxAgeSeconds,
         ageBuckets: opts.ageBuckets,
-        registers: [opts.promRegistry]
+        registers: [opts.promRegistry],
+        pruneAgedBuckets: true,
       });
     } else if (opts.metricType === 'histogram' || !opts.metricType) {
       return new promClient.Histogram({%      

Seems to work as expected.

timm-stelzer-e-farm commented 1 year ago

FWIW: https://github.com/jochen-schweizer/express-prom-bundle/pull/119