Closed rjshrjndrn closed 4 years ago
Hi @rjshrjndrn,
Calculating the number of metrics is quite easy. Here is a piece of code I use for this purpose:
```lua
-- Creates the nginx_prometheus_metric_count gauge. The caller should invoke
-- :update() on the returned metric before collecting, to pull up-to-date
-- values from the prometheus module's internals.
function metric_count(prometheus, app_name)
  local metric_count = prometheus:gauge("nginx_prometheus_metric_count",
    "Number of time series served by the prometheus module",
    {"app"})
  metric_count.update = function(self)
    self:set(prometheus.key_index.last - prometheus.key_index.deleted, {app_name})
  end
  metric_count:set(0, {app_name})
  return metric_count
end
```
This returns an object that behaves just like a regular Gauge, but has an `update` method which reads the number of time series currently tracked by the prometheus module.
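A minimal usage sketch of the helper above, assuming the usual nginx-lua-prometheus setup (the dict name `prometheus_metrics` and the app label `my_app` are placeholders, not from the original comment):

```lua
-- init_worker_by_lua_block: create the prometheus object and the gauge.
prometheus = require("prometheus").init("prometheus_metrics")
count_gauge = metric_count(prometheus, "my_app")

-- content_by_lua_block in the /metrics location: refresh the gauge from
-- the module's internals, then collect as usual.
count_gauge:update()
prometheus:collect()
```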
Calculating how much of the dictionary is actually used is a bit more difficult. If you're using `resty.core`, then there is a method for this, called `free_space`. If you're using vanilla nginx, then you're probably out of luck.
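For reference, a sketch of checking free space with lua-resty-core (`free_space()` requires `resty.core` and nginx >= 1.11.7; the dict name `prometheus_metrics` is an assumption, substitute whatever `lua_shared_dict` you configured):

```lua
-- content_by_lua_block: report unallocated bytes in the shared dict.
-- Note: free_space() counts free pages in the dict's slab allocator,
-- so it can report 0 while small writes into partially used pages
-- still succeed.
local dict = ngx.shared.prometheus_metrics
local free_bytes = dict:free_space()
ngx.say("prometheus dict free bytes: ", free_bytes)
```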
In most cases, you would notice unwanted behavior through the metric_count gauge. Just set up an alert that fires when it exceeds a reasonable number of time series and you'll soon know if you are using a label with unexpected cardinality.
Thanks, I'll try this out, @dolik-rce.
@dolik-rce, thanks for responding here!
@rjshrjndrn, I am curious if you have seen the error counter (nginx_metric_errors_total) incremented when the shared memory dict got full. While there is no easy way to determine utilization of the shared dict, that metric is designed to help users detect situations when dictionary writes start failing.
I didn't check the memory usage, but with an old version of the library my counter was increasing and I was seeing inconsistencies in the metric data. Then I updated the library and increased the shared memory to 100 MB.
Sorry, @rjshrjndrn, I probably have not made my question very clear.
Can you run a query like increase(nginx_metric_errors_total[1h]) in your Prometheus server and see if there are any non-zero values during the period of time when the nginx shared memory was full?
@knyar I see values > 1, but the problem is I don't know whether they indicate the shared memory was full. For that time period, though, I did have metric discrepancies.
Thanks for confirming!
I'd recommend configuring an alert on that metric being > 0. When it fires, you should usually be able to understand what's wrong by looking at the nginx error logs.
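As a sketch, such an alert could look like this in a Prometheus rules file (the group and alert names here are made up for illustration):

```yaml
groups:
  - name: nginx-lua-prometheus
    rules:
      - alert: NginxMetricErrors
        expr: increase(nginx_metric_errors_total[1h]) > 0
        for: 5m
        annotations:
          summary: "nginx-lua-prometheus writes to the shared dict are failing; check nginx error logs"
```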
Hi,
Thank you for this awesome plugin. Recently I added the nginx caching status (HIT or MISS) as a label on nginx_http_requests_total and saw inconsistencies in the metrics. Upon further investigation, I found that the nginx shared memory was full. I have now increased it to 100M and it seems to be working fine. So, is there any way to expose the current usage, or any way to calculate how many metrics can be stored per MB? Regards.