noxdafox / rabbitmq-cloudwatch-exporter

RabbitMQ Plugin for publishing cluster metrics to AWS CloudWatch
Mozilla Public License 2.0

Metrics not reset (accumulating) #27

Closed. webmozart closed this issue 3 years ago

webmozart commented 3 years ago

Apologies in advance if this is a usage error.

I've been using this great plugin for two weeks now and recently enabled the export of "Publish", "Deliver" and "Ack". My expectation was that the plugin would export the number of such events within the last minute, but it seems to be exporting the total since server start instead:

image

This diagram should contain spikes, but it contains steps instead, i.e. it is accumulating the metrics, which is incorrect. I double checked against the graphs in the RabbitMQ Management plugin to validate. I also double checked the configuration in CloudWatch, where everything seems correct. I did not override the storage_resolution parameter in my plugin configuration.

Is this the expected behavior?

webmozart commented 3 years ago

I think I found the culprit: in https://github.com/noxdafox/rabbitmq-cloudwatch-exporter/blob/274cef6ed6d09ea0781a63abd09eb00ea3c6c46d/lib/rabbitmq_cloudwatch_exporter/overview_metrics.ex#L27 and other similar pieces of code, Common.no_range is passed as the range parameter, so the call returns the metric accumulated since startup. Does anybody find that useful?

It would be great if we could either pass a range, or if the range was even preconfigured to cloudwatch_exporter.export_period. The latter would make most sense IMO, as any aggregation can then be done in CloudWatch on top of that (e.g. total messages published in the last two weeks etc.).

I might try my hand at a PR, but unfortunately I have no Erlang skills.

webmozart commented 3 years ago

Alright, I found the answer. In CloudWatch metric math, there's a RATE() function which returns exactly the desired values.
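For anyone landing here later, a minimal sketch of what such a query could look like when built for the CloudWatch GetMetricData API. The namespace, metric name and dimension values below are placeholders, not necessarily what the plugin publishes, so check the metric in your CloudWatch console first. RATE() returns a per-second rate of change; multiplying by PERIOD() turns it into events per period.

```python
def build_rate_queries(namespace, metric_name, dimensions, period=60):
    """Build a MetricDataQueries list that converts a cumulative counter
    into events per period using CloudWatch metric math."""
    return [
        {
            # Raw cumulative counter as exported; hidden from the result.
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": "Maximum",
            },
            "ReturnData": False,
        },
        {
            # RATE(m1) is per second; PERIOD(m1) is the period in seconds.
            "Id": "e1",
            "Expression": "RATE(m1) * PERIOD(m1)",
            "Label": f"{metric_name} per period",
            "ReturnData": True,
        },
    ]


# Hypothetical identifiers, for illustration only.
queries = build_rate_queries(
    namespace="RabbitMQ",
    metric_name="Publish",
    dimensions=[{"Name": "Cluster", "Value": "rabbit@example"}],
)
```

The resulting list can be passed as the `MetricDataQueries` argument of boto3's `get_metric_data`, together with `StartTime` and `EndTime`. The same expression also works directly in the console graph editor.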

noxdafox commented 3 years ago

The plugin collects the raw metrics from the broker's internal metrics storage and uploads them as they are.

It does not apply any transformation. The Management plugin collects the same metrics but transforms them for display.

It is the user's responsibility to decide how to represent the metrics. The reason is that applying transformations to metrics often reduces their resolution or hides some information.

Most dashboard technologies (Grafana, Chronograf, CW, ...) let you choose how to represent the information. In your case, what you need is the derivative over time of the Publish value.
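To illustrate what "derivative over time" means for a cumulative counter, here is a rough sketch in Python. The sample values are made up, and treating a drop in the counter as a broker restart is an assumption of this example, not something the plugin does.

```python
def per_interval_increase(samples):
    """Turn cumulative counter samples into per-interval increases.

    samples: list of (timestamp_seconds, cumulative_count), sorted by time.
    Returns a list of (timestamp, increase, rate_per_second).
    """
    result = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # Assumption: a lower value means the counter restarted from zero,
        # so the new value itself is the increase since the reset.
        increase = v1 - v0 if v1 >= v0 else v1
        result.append((t1, increase, increase / (t1 - t0)))
    return result


# Made-up samples, one per minute: the raw series plots as steps.
samples = [(0, 100), (60, 100), (120, 250), (180, 250), (240, 40)]
deltas = per_interval_increase(samples)
increases = [increase for _, increase, _ in deltas]
```

Plotting `increases` instead of the raw values gives spikes where messages were published and zero elsewhere.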

Glad you sorted it out.

webmozart commented 3 years ago

Thanks for your answer and for the plugin!