rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

Khepri metrics are showing up without any tags (v4 beta 5) #12142

Open luos opened 2 weeks ago

luos commented 2 weeks ago

Hi,

We're testing out Khepri and reviewing how we could monitor its behaviour and performance.

Today, we monitor Mnesia transaction counters to see whether there is a lot of churn in the system, mostly because Mnesia can cause issues when the number of transactions or transaction restarts is very high.

We've noticed that the metrics in the ra_metrics family show up without any tags. I think they should either be in a different family, e.g. khepri or metadata, or they should have proper tags, e.g. for rabbit_metadata|quorum_queues, etc.

Test setup:

  1. Deploy v4 beta 5
  2. Enable the khepri_db feature flag
  3. Create a quorum queue
  4. Call the metrics API: curl localhost:15692/metrics/detailed?family=ra_metrics
  5. Call rabbitmqctl eval 'rabbit_khepri:status().'

Excerpt from the output:

$ curl localhost:15692/metrics/detailed?family=ra_metrics
rabbitmq_detailed_raft_log_last_written_index{vhost="/",queue="qq1"} 2
rabbitmq_detailed_raft_log_last_written_index 49
$ rabbitmqctl eval 'rabbit_khepri:status().'
...  {<<"Last Written">>,49}, ...
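The untagged sample in the excerpt above can be spotted mechanically. Below is a minimal sketch, assuming only the two output lines quoted in this issue as input; the helper names (`parse_samples`, `untagged`) are hypothetical and the parser is deliberately simplified (it does not handle `}` inside label values), so it is not a full Prometheus exposition-format parser.

```python
import re

# The two sample lines quoted in the issue excerpt.
SAMPLE = """\
rabbitmq_detailed_raft_log_last_written_index{vhost="/",queue="qq1"} 2
rabbitmq_detailed_raft_log_last_written_index 49
"""

# Simplified patterns: metric name, optional {labels}, then the value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels_dict, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def untagged(samples):
    """Return (name, value) for every sample that carries no labels at all."""
    return [(name, value) for name, labels, value in samples if not labels]


samples = parse_samples(SAMPLE)
print(untagged(samples))
```

With the excerpt as input, the only untagged sample is the one whose value (49) matches the "Last Written" figure reported by rabbit_khepri:status(), which is what suggests it belongs to the Khepri metadata store.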

Describe the solution you'd like

Describe alternatives you've considered

I tried to look at the metric collection code, but from my cursory review I could not figure out how to add the tag, and I'm not even sure that would be the preferred way to go about it. 😄

Additional context

Due to Khepri's consistency behaviour with the projections, it would be good to know if a node is falling behind.
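As a rough illustration of what "falling behind" could mean once the metadata store's metric is identifiable, here is a minimal sketch that compares a last-written index scraped from each node. The node names, index values, threshold, and the `index_lag` helper are all made up for illustration, and comparing last-written indexes across nodes is only an approximation of lag, not something this issue confirms as the recommended signal.

```python
def index_lag(last_written_by_node):
    """Return how many log entries each node trails the most advanced node by."""
    if not last_written_by_node:
        return {}
    head = max(last_written_by_node.values())
    return {node: head - index for node, index in last_written_by_node.items()}


# Hypothetical per-node values of the metadata store's last written index.
observed = {"rabbit@node1": 49, "rabbit@node2": 49, "rabbit@node3": 41}

# Hypothetical alerting threshold, in log entries.
MAX_LAG = 5

lag = index_lag(observed)
behind = sorted(node for node, delta in lag.items() if delta > MAX_LAG)
print(lag)
print(behind)
```

A check like this only works if the metadata store's sample can be told apart from the per-queue ones, which is the reason for asking for a tag or a separate family.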

michaelklishin commented 2 weeks ago

We can both use a tag (if we do so for other Ra machine process types) and provide a new endpoint or a metric family.

luos commented 2 weeks ago

Thanks, yeah, I was thinking of something similar to what the-mikedavis proposed in the PR.

I think it is worth considering a different family, i.e. we may have many thousands of queues but only one or a few metadata processes, and I expect we will care more about Khepri than QQ indexes, though I can't say for sure at this point in time. :-)

At the same time, I do not expect the khepri process to take a lot of traffic - but I am sure it will happen.