Closed: optimistic5 closed this issue 4 years ago.
This plugin emits metrics for all queues unconditionally. @gerhard this sounds like something other tools should handle.
You can download the official RabbitMQ-Overview Grafana dashboard to see how we aggregate metrics across multiple objects (queues, connections, channels, etc.). This is the query that will work in your case:
sum(rabbitmq_queue_messages_ready{queue =~ "^guest_.+"} * on(instance) group_left(rabbitmq_cluster, rabbitmq_node) rabbitmq_identity_info{rabbitmq_cluster="$rabbitmq_cluster"}) by(rabbitmq_node)
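To make the query's moving parts concrete, here is a minimal Python sketch that evaluates the same logic on a few hand-written samples. It assumes per-object metrics are enabled (only then do series carry a `queue` label). The instance addresses, node names, queue names and the `ready_by_node` helper are all hypothetical. `$rabbitmq_cluster` is a Grafana dashboard variable, modelled here as a plain `cluster` argument. The sketch only illustrates the regex filter, the `on(instance)` join and the `sum ... by(rabbitmq_node)` step; it is not how Prometheus evaluates PromQL internally.

```python
import re
from collections import defaultdict

# Hypothetical samples as (labels, value); real ones come from Prometheus.
# The queue label only exists when per-object metrics are enabled.
messages_ready = [
    ({"instance": "node-a:15692", "queue": "guest_orders"}, 120.0),
    ({"instance": "node-a:15692", "queue": "guest_billing"}, 30.0),
    ({"instance": "node-b:15692", "queue": "guest_orders_2"}, 7.0),
    ({"instance": "node-b:15692", "queue": "internal_jobs"}, 500.0),
]
identity_info = [
    ({"instance": "node-a:15692", "rabbitmq_cluster": "prod", "rabbitmq_node": "rabbit@a"}, 1.0),
    ({"instance": "node-b:15692", "rabbitmq_cluster": "prod", "rabbitmq_node": "rabbit@b"}, 1.0),
]


def ready_by_node(pattern: str, cluster: str) -> dict[str, float]:
    """Evaluate sum(ready{queue=~pattern} * on(instance) group_left(...)
    identity_info{rabbitmq_cluster=cluster}) by(rabbitmq_node) on the samples above."""
    # "One" side of the join: at most one identity series per instance, filtered by cluster.
    identity_by_instance = {
        labels["instance"]: (labels, value)
        for labels, value in identity_info
        if labels["rabbitmq_cluster"] == cluster
    }
    totals: dict[str, float] = defaultdict(float)
    for labels, value in messages_ready:
        # PromQL regex matchers are fully anchored, which re.fullmatch reproduces.
        if re.fullmatch(pattern, labels.get("queue", "")) is None:
            continue
        partner = identity_by_instance.get(labels["instance"])
        if partner is None:
            continue  # no matching identity series: the sample drops out of the result
        id_labels, id_value = partner
        # group_left copies rabbitmq_node onto the sample; multiplying by 1 keeps the value.
        totals[id_labels["rabbitmq_node"]] += value * id_value
    return dict(totals)


print(ready_by_node(r"^guest_.+", "prod"))  # {'rabbit@a': 150.0, 'rabbit@b': 7.0}
```

Note that `internal_jobs` is excluded by the regex, and every sample from an instance whose identity series belongs to another cluster is dropped by the join.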
From v3.8.3 onwards, all metrics are aggregated by default. To enable per-object metrics, add the following line to your rabbitmq.conf file:
prometheus.return_per_object_metrics = true
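One way to confirm the setting took effect is to look for a `queue` label on `rabbitmq_queue_messages_ready` in the plugin's text output. The Python sketch below does that. It assumes the plugin listens on its default port 15692 on localhost, and that in aggregated mode the queue metrics carry no `queue` label. The helper names are hypothetical.

```python
import re
import urllib.request

# Matches a labelled sample line whose label set includes queue="..."
# (either as the first label or after a comma).
PER_OBJECT_LINE = re.compile(r'^rabbitmq_queue_messages_ready\{(?:[^}]*,)?queue="')


def has_per_object_queue_metrics(exposition: str) -> bool:
    """True if any rabbitmq_queue_messages_ready sample carries a queue label."""
    # Comment lines start with '#', so the anchored pattern skips them.
    return any(PER_OBJECT_LINE.match(line) for line in exposition.splitlines())


def fetch_metrics(url: str = "http://localhost:15692/metrics") -> str:
    """Download the plugin's text output (15692 assumed to be the listener port)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `has_per_object_queue_metrics(fetch_metrics())` should return `True` once the node has been restarted with the new setting, since rabbitmq.conf is read at boot.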
If the above helps, please close the issue.
This query should work for Prometheus alerting, correct?
This expression will work for Prometheus alerting if you enable per-object metrics (see the config in the previous comment):
rabbitmq_queue_messages_ready{queue =~ "^guest_.+"} > 1000
If you need to know the cluster name (because you are running more than one RabbitMQ deployment that you want alerting for), your alerting query becomes:
(rabbitmq_queue_messages_ready{queue =~ "^guest_.+"} * on(instance) group_left(rabbitmq_cluster) rabbitmq_identity_info) > 1000
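As a rough illustration of what these two alerting expressions return, the Python sketch below filters hypothetical per-object samples by the queue regex and the `> 1000` threshold, then attaches `rabbitmq_cluster` from the identity series of the same instance. All names and values are made up. The point is that a comparison without `bool` drops series below the threshold, so only offending queues remain, each now labelled with its cluster.

```python
import re

# Hypothetical per-object samples (needs prometheus.return_per_object_metrics = true).
messages_ready = [
    ({"instance": "node-a:15692", "queue": "guest_orders"}, 1500.0),
    ({"instance": "node-a:15692", "queue": "guest_billing"}, 20.0),
    ({"instance": "node-b:15692", "queue": "internal_jobs"}, 5000.0),
]
# instance -> rabbitmq_cluster, as it would be read from rabbitmq_identity_info.
cluster_by_instance = {
    "node-a:15692": "prod-eu",
    "node-b:15692": "prod-eu",
}


def firing_alerts(pattern: str, threshold: float) -> list[dict[str, object]]:
    """Mimic (ready{queue=~pattern} * on(instance) group_left(rabbitmq_cluster)
    rabbitmq_identity_info) > threshold."""
    alerts = []
    for labels, value in messages_ready:
        # Fully anchored regex match, as in PromQL.
        if re.fullmatch(pattern, labels["queue"]) is None:
            continue
        cluster = cluster_by_instance.get(labels["instance"])
        if cluster is None:
            continue  # no identity series for this instance: dropped by the join
        if value > threshold:  # comparison without `bool` filters out the rest
            alerts.append({**labels, "rabbitmq_cluster": cluster, "value": value})
    return alerts


for alert in firing_alerts(r"^guest_.+", 1000):
    print(alert)
```

Only `guest_orders` fires here: `guest_billing` is below the threshold and `internal_jobs` does not match the regex.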
FTR, per-object metric collection now has a brief documentation section.
I see that it's not recommended to use "metrics per object". The thing is, most of our alerts and use cases are queue-based. We usually want to get all metrics and detect when a specific queue has too many ready messages; we use this specifically for HPA (scaling out pods based on queue size).
Until now we've used prometheus_rabbitmq_exporter, but I saw that this exporter comes built in, so I thought it would be better, especially since the Grafana dashboard relies on it. But how do you monitor the RabbitMQ cluster as a whole? Is viewing per-queue metrics only our use case? Is there anything else we should do besides enabling the "not for production" flag? Thanks!
@lechen26 FYI, this is not a support forum, and the RabbitMQ core team would greatly appreciate it if you moved all questions, about this plugin or otherwise, to the mailing list.
Metric aggregation is a practical necessity in environments with a lot of objects. See some of the payload sizes and generation times mentioned in #24. It is not realistic to alert on "metric X in queue Q" when you have 200K queues, each with 35-40 metrics. The scrape response would simply be too large to generate and transfer in a practical amount of time. In that case you alert on the overall state of your system, and then humans narrow it down using other available tools.
Again, unfortunately, N objects with M metrics each and 2-3 lines (including metadata/comments) per metric can produce a very large response. It's an output format issue, not a plugin implementation one, so any Prometheus exporter would face it at some point and either do what we did or end up in the scenario outlined in #24.
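A back-of-envelope calculation shows how quickly this grows. The Python sketch below assumes the Prometheus text format emits one sample line per (object, metric) pair plus one `# HELP` and one `# TYPE` line per metric family, and it uses an assumed average of 100 bytes per line (a hypothetical figure, not a measurement). Plugging in the 200K queues and 40 metrics mentioned above:

```python
def estimate_scrape_size(objects: int, metrics_per_object: int,
                         bytes_per_line: int = 100) -> tuple[int, int]:
    """Rough line count and byte size of a per-object text-format scrape.

    One sample line per (object, metric) pair, plus one # HELP and one # TYPE
    line per metric family. bytes_per_line is an assumed average, not a measurement.
    """
    sample_lines = objects * metrics_per_object
    comment_lines = 2 * metrics_per_object
    total_lines = sample_lines + comment_lines
    return total_lines, total_lines * bytes_per_line


lines, size = estimate_scrape_size(objects=200_000, metrics_per_object=40)
print(f"{lines:,} lines, ~{size / 1e6:.0f} MB")  # 8,000,080 lines, ~800 MB
```

Even if the real average line is several times shorter, a single scrape stays in the hundreds of megabytes, which is why aggregation is the default.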
I'm not sure what this "not for production" flag is. This plugin is recommended for production. It even has two modes of operation now: one for those who want an efficient and compact overview (aggregated metrics) and another for those who want the best per-object fidelity. The original plugin does not give you much choice.
OK, I think I understand what the "not for production" comment was about. We will try to edit the docs to explain the options and recommendations.
@lechen26 let me know if the updated doc section makes more sense to you.
Now I have this alert for Prometheus:
Notice that I have:
{queue="my-queue"}
To monitor all queues I can use this line, right?
rabbitmq_queue_messages_ready
But I want to monitor queues using a wildcard. For example:
{queue="guest_*"}
Right now I have to add each queue separately, but there are a lot of them and new queues are created often. This feature would help a lot. Thank you.
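The reason `{queue="guest_*"}` does not work as a wildcard is that PromQL's `=` matcher is plain string equality, while `=~` takes a fully anchored regular expression, not a glob. The Python sketch below mimics both matchers (Prometheus uses RE2, but Python's `re` behaves the same for these simple patterns). The queue names and the `matches` helper are hypothetical.

```python
import re


def matches(matcher: str, pattern: str, value: str) -> bool:
    """Evaluate a PromQL-style label matcher: '=' is exact equality, '=~' a fully anchored regex."""
    if matcher == "=":
        return value == pattern
    if matcher == "=~":
        return re.fullmatch(pattern, value) is not None
    raise ValueError(f"unsupported matcher: {matcher}")


queues = ["guest_orders", "guest_billing", "guest", "guest_", "internal_jobs"]

# With '=', '*' is a literal character, so nothing matches.
print([q for q in queues if matches("=", "guest_*", q)])    # []
# With '=~', '*' repeats the preceding '_' zero or more times.
print([q for q in queues if matches("=~", "guest_*", q)])   # ['guest', 'guest_']
# '.+' means "one or more of any character", which is the intended wildcard.
print([q for q in queues if matches("=~", "guest_.+", q)])  # ['guest_orders', 'guest_billing']
```

This is why the answers in this thread use `queue =~ "^guest_.+"` rather than a glob.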