knyar / nginx-lua-prometheus

Prometheus metric library for Nginx written in Lua
MIT License

feature request: new expiration option to recycle the shared dict #164

Open Sn0rt opened 9 months ago

Sn0rt commented 9 months ago

I found that the prometheus library requires manual recycling of a metric if that metric is no longer being updated.

I think nginx-lua-prometheus should support a new expiration option, meaning that a metric is recycled if it is not updated within the specified time.

init_worker_by_lua_block {
   prometheus = require("prometheus").init("prometheus_metrics", {exptime = 360})
}

and exptime would default to 0

The optional exptime argument specifies expiration time (in seconds) for the inserted key-value pair. The time resolution is 0.001 seconds. If the exptime takes the value 0 (which is the default), then the item will never expire.

I think it would be enough to implement this directly on top of the shared dict.
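
For context, the shared dict primitive this would build on is the optional exptime argument of set; a minimal illustration of that primitive (the key shown here is illustrative, not necessarily how the library stores values):

local dict = ngx.shared.prometheus_metrics
-- The third argument is the exptime in seconds; 0 (the default) means never expire.
local ok, err = dict:set('nginx_http_requests_total{host="example.com"}', 42, 360)
if not ok then
  ngx.log(ngx.ERR, "failed to set key: ", err)
end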

knyar commented 9 months ago

I found that the prometheus library requires manual recycling of a metric if that metric is no longer being updated.

I believe that's pretty standard for the Prometheus ecosystem. Once created, metrics exist until the process ends.

It's totally reasonable for a metric to not get updated for a long time. For example, a counter that tracks the number of errors is hopefully not incremented very often. But in most cases it would be unhelpful to delete it after it has been incremented.

I think nginx-lua-prometheus should support a new expiration option, meaning that a metric is recycled if it is not updated within the specified time.

Does it need to be part of the nginx-lua-prometheus library? We provide del and reset methods for counters and gauges that I believe should make it possible to implement time-based deletion of metrics if it is necessary in your environment.
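
For reference, those existing calls look roughly like this (the label values are illustrative):

metric_requests:del({"example.com", "200"})  -- remove one label combination
metric_requests:reset()                      -- remove all data for the metric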

Sn0rt commented 9 months ago

In the OpenResty ecosystem, metrics are usually recorded in the log phase. However, there is no appropriate phase in which to manually destroy metric records.

Destroying metrics via a TTL is more in line with OpenResty practice than using more complex external logic to maintain the shared dict that prometheus requires.

I can implement this feature if you agree with this requirement.

knyar commented 9 months ago

There are a lot of moving pieces here, mostly added for performance reasons (per-worker counters that get regularly flushed to the main dict, separate keys in the dict for metric names), and integrating time-based expiry correctly will not be trivial and will likely further increase the complexity of the library.

Also, I believe this use case is sufficiently unusual and niche that I would be hesitant to support it natively. I am not sure many other Prometheus metric libraries have functionality like this, and a quick search for "TTL" on the Prometheus mailing list suggests that every time this has been discussed, there was a more idiomatic way to achieve whatever was needed. I am not trying to tell you that what you are trying to do is wrong, but I would suggest trying to find a solution without making changes to the library.

I've not thought too much about it, but perhaps you might be able to do it by keeping track of metric usage in another dict and calling del or reset from a timer?
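
A rough sketch of that approach (the metric_last_seen dict, the 360-second expiry and the label set are illustrative assumptions, not part of the library):

init_worker_by_lua_block {
  prometheus = require("prometheus").init("prometheus_metrics")
  metric_requests = prometheus:counter(
    "nginx_http_requests_total", "Number of HTTP requests", {"host", "status"})

  -- Separate dict for last-update timestamps, declared elsewhere as:
  --   lua_shared_dict metric_last_seen 1m;
  local EXPIRY_SECONDS = 360

  local function expire_stale_metrics(premature)
    if premature then return end
    local last_seen = ngx.shared.metric_last_seen
    local now = ngx.now()
    for _, key in ipairs(last_seen:get_keys(0)) do
      local ts = last_seen:get(key)
      if ts and now - ts > EXPIRY_SECONDS then
        local host, status = key:match("^(.-)|(.*)$")
        metric_requests:del({host, status})
        last_seen:delete(key)
      end
    end
  end

  -- Run the cleanup on a single worker to avoid duplicate work.
  if ngx.worker.id() == 0 then
    ngx.timer.every(60, expire_stale_metrics)
  end
}

log_by_lua_block {
  metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
  ngx.shared.metric_last_seen:set(
    ngx.var.server_name .. "|" .. ngx.var.status, ngx.now())
}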

Sn0rt commented 8 months ago

Since this library lives in the nginx ecosystem, I think it is more practical for it to adapt to how that ecosystem works.

A timer-based implementation may well be more complicated than implementing a TTL in the library itself.

  1. In the nginx ecosystem, there is indeed no appropriate time to call del and reset.
  2. Moreover, the TTL could be a configuration option that is disabled by default. The workload is very small; I wrote a POC: https://github.com/Sn0rt/nginx-lua-prometheus/pull/1

knyar commented 8 months ago

In the nginx ecosystem, there is indeed no appropriate time to call del and reset.

I am not sure I understand how it is different from any other environment?

Moreover, the TTL could be a configuration option that is disabled by default. The workload is very small; I wrote a POC

This approach will result in only metric values being removed, with all other data staying behind in an inconsistent state.

There is a reason why del and reset are more complex than just removing a single key from the shared dict. Over time, lots of optimizations have been added to the library to support high-throughput use cases: separate keyspace in the dict for metric names (prometheus_keys.lua), per-worker eventually consistent counters, per-worker cache of full metric names. All of that increased the complexity of the code substantially, and will make it difficult to correctly implement time-based metric expiry, especially if it's based on exptime of the shared dict. Also, as I mentioned before, adding time based expiry will further increase complexity, which is something I would like to avoid for a use case which is so uncommon.

Sn0rt commented 8 months ago

I am not sure I understand how it is different from any other environment?

In OpenResty, this library has to be called in a specific phase, in most cases the access phase. But there is no appropriate phase in which to delete metrics after the request ends.

There is a reason why del and reset are more complex than just removing a single key from the shared dict. Over time, lots of optimizations have been added to the library to support high-throughput use cases: separate keyspace in the dict for metric names (prometheus_keys.lua), per-worker eventually consistent counters, per-worker cache of full metric names. All of that increased the complexity of the code substantially, and will make it difficult to correctly implement time-based metric expiry, especially if it's based on exptime of the shared dict. Also, as I mentioned before, adding time based expiry will further increase complexity, which is something I would like to avoid for a use case which is so uncommon.

It may indeed be possible; it probably depends on the specific data type. Let me evaluate it first.

Since the shared dict is used for storage, I think that whatever the Prometheus data type, all of its keys would be recycled together, so the chance of ending up with inconsistent data is very small.

Sn0rt commented 8 months ago

I think I figured it out, and it is acceptable to give up on explicit deletion. The incr and set methods of the nginx shared dict return a value indicating whether the insert had to forcibly evict other items. If the shared dict is not large enough, LRU eviction follows up automatically. But from another perspective: if automatic eviction is imposed, will the integrity of the monitoring data be affected?
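
For what it's worth, a sketch of checking that return value (the key and dict name are illustrative):

-- set returns a third value, forcible, which is true when other (least
-- recently used) items had to be evicted to make room for this one.
local ok, err, forcible = ngx.shared.prometheus_metrics:set("some_key", 1)
if forcible then
  ngx.log(ngx.WARN, "prometheus_metrics dict is full; LRU eviction happened")
end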

monkeyDluffy6017 commented 8 months ago

Hi @knyar, many of our users have encountered this problem, for example running out of memory and high CPU usage (see the attached image).

What you suggest is to use a timer at a higher level to remove metrics, but IMHO there are several problems:

  1. The histogram metric type does not have a del interface
  2. The reset interface will clear all metrics, which will inevitably affect the metrics we are still watching
  3. Even if we add a del method for the histogram type, it is very complicated to maintain the last-access status of each metric across multiple workers

So the easiest way is to clean up stale data in the library itself; it only needs a few minor modifications. I would be glad to use this feature in our open source project, and I am sure it will make the library even more powerful.

dolik-rce commented 8 months ago

Having 150k timeseries per application (as mentioned in the linked issue) is definitely well outside of best practices for Prometheus. See for example the warning given here. Using URLs, IDs or other unbounded high-cardinality data as labels is always a bad idea, and there is not much that the client library can help with. There will always be a limit where things just get overloaded.

Also, even if the client library deletes unused metrics, you might just be pushing the problem further downstream, to Thanos, Mimir or whatever is used to store the metrics data.

monkeyDluffy6017 commented 8 months ago

@dolik-rce Yeah, I agree with you, but the main dict will eventually fill up with metrics, which will cause the problems I mentioned above. We need to provide a simple way to prevent this from happening. I think the TTL feature is the best way, and I'm ready to work on it.

knyar commented 8 months ago

the main dict will eventually fill up with metrics, which will cause the problems I mentioned above

That's not something that is expected to happen, and it never happened in practice in any of the deployments where I've used this library. As @dolik-rce mentioned above, Prometheus is designed to work well with a relatively small and static set of unique combinations of metric label values.

I noticed that metric libraries for other languages (Ruby, Java) explicitly document this, even though it's not something that's specific to a particular metric library. Perhaps we should add a similar note to the README file as well.
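
If such a note is added, a small illustration of keeping label cardinality bounded could accompany it (this is a sketch, not current README content):

log_by_lua_block {
  -- Prefer bounded label values such as the server name and status code over
  -- unbounded ones such as the full request URI or a client ID, each unique
  -- value of which creates a new time series.
  metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
}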