yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters
Apache License 2.0

Any examples of using CMAK with Prometheus + Grafana to monitor consumer lag? #894

Open · rja1 opened this issue 1 year ago

rja1 commented 1 year ago

I'd like to feed the consumer lag data returned by this CMAK endpoint into Grafana: https://some_server.com/api/status/some_cluster/consumersSummary

Does anyone have experience with, or examples of, doing this with CMAK?

Thanks

one-two-my-gad commented 1 year ago

You could try estimating it from the message backlog.
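Estimating lag from the backlog comes down to one subtraction per partition: the log-end offset minus the group's committed offset, summed over a topic's partitions. The sketch below assumes both offset maps are already available, for example from the CURRENT-OFFSET and LOG-END-OFFSET columns printed by kafka-consumer-groups.sh --describe; the function name and input shape are hypothetical.

```python
from typing import Dict, Mapping, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def lag_per_topic(end_offsets: Mapping[TopicPartition, int],
                  committed: Mapping[TopicPartition, int]) -> Dict[str, int]:
    """Sum (log-end offset - committed offset) over each topic's partitions."""
    totals: Dict[str, int] = {}
    for (topic, partition), end in end_offsets.items():
        if (topic, partition) not in committed:
            # Assumption: partitions the group never committed are skipped,
            # because their lag is undefined rather than zero.
            continue
        backlog = max(end - committed[(topic, partition)], 0)  # never negative
        totals[topic] = totals.get(topic, 0) + backlog
    return totals
```

For example, end offsets of 100 and 50 on topic t's two partitions with commits at 90 and 50 give a topic lag of 10.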

janengelmohr commented 1 year ago

Hey @rja1, maybe I'm a little late here, but why don't you just export JMX metrics directly from Kafka with the JMX Exporter and feed them into Prometheus/Grafana?

rja1 commented 1 year ago

Good question, @engelmohr, but jmx-exporter doesn't provide consumer offset data. I would have just used Kafka-exporter, but it doesn't support SASL_PLAINTEXT, which is what our clusters use for authentication. I could have used the kafka-ui API, but that isn't ideal because you have to make an API call for each consumer. It also can't connect to one of our legacy 0.10.2.0 clusters (unsupported version).

In the end, I wrote a Python script that hits the CMAK API in a single call and stores the offset data in a MySQL database, which Grafana can then use as a data source. It works great, but it feels a little like a hack.
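The approach described above (poll CMAK once, persist lag, chart it in Grafana) might look roughly like the sketch below. It is not the actual script: sqlite3 stands in for MySQL so the example is self-contained, the table and column names are hypothetical, and the (group, topic, lag) rows are assumed to have already been extracted from the CMAK response.

```python
import sqlite3
import time

# Hypothetical schema; the column is named total_lag because LAG is a
# reserved word in MySQL 8.
SCHEMA = """
CREATE TABLE IF NOT EXISTS consumer_lag (
    ts        INTEGER NOT NULL,  -- Unix epoch seconds of the poll
    grp       TEXT    NOT NULL,  -- consumer group name
    topic     TEXT    NOT NULL,
    total_lag INTEGER NOT NULL
)
"""


def store_lag(conn, rows, ts=None):
    """Insert one poll's worth of (group, topic, lag) rows; returns the row count."""
    ts = int(time.time()) if ts is None else ts
    conn.execute(SCHEMA)
    with conn:  # commits the transaction on success, rolls back on error
        conn.executemany(
            "INSERT INTO consumer_lag (ts, grp, topic, total_lag) VALUES (?, ?, ?, ?)",
            [(ts, group, topic, int(lag)) for group, topic, lag in rows],
        )
    return len(rows)
```

With a MySQL driver the placeholders would typically be %s instead of ?. On the Grafana side, a time-series query along the lines of SELECT ts AS time, grp AS metric, total_lag FROM consumer_lag ORDER BY ts fits the data source's time/metric/value convention.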

OneCricketeer commented 1 year ago

> Kafka-exporter, but it doesn't support SASL_PLAINTEXT

For lag, what you need is the JMX Exporter running on the consumer clients. That is independent of any authentication settings.

You could also use Burrow to monitor lag for non-JVM clients.

rja1 commented 1 year ago

Thanks, @OneCricketeer. As I recall, JMX doesn't expose lag data, and Burrow doesn't support SASL_PLAINTEXT.

OneCricketeer commented 1 year ago

The consumer's JMX metrics do include lag (e.g. records-lag-max in consumer-fetch-manager-metrics); the broker and producer JMX metrics don't.

I don't use Burrow, but I'd be very surprised if it didn't support it. It uses Sarama, which does support SASL_PLAINTEXT, to read directly from the offsets topic.
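To illustrate what "reading directly from the offsets topic" involves: records in __consumer_offsets carry a binary key and value. The Python sketch below is not Burrow's (Go/Sarama) code. It decodes only offset-commit records with key versions 0 and 1 (int16 version, then group string, topic string, int32 partition) and reads just the int16 version and int64 offset at the start of the value. Kafka's internal format can change between releases, so treat this as an assumption-laden illustration.

```python
import struct
from typing import Optional, Tuple


def _read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    # Strings are an int16 big-endian length followed by UTF-8 bytes.
    (length,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length


def decode_offset_commit(key: bytes, value: Optional[bytes]):
    """Return (group, topic, partition, offset) for an offset-commit record,
    or None for other record types or tombstones."""
    (key_version,) = struct.unpack_from(">h", key, 0)
    if key_version not in (0, 1):
        return None  # e.g. version 2 keys hold group metadata, not offsets
    group, pos = _read_string(key, 2)
    topic, pos = _read_string(key, pos)
    (partition,) = struct.unpack_from(">i", key, pos)
    if value is None:
        return None  # tombstone: the committed offset was deleted
    # Value versions 0-3 start with an int16 version followed by the int64 offset.
    _value_version, offset = struct.unpack_from(">hq", value, 0)
    return group, topic, partition, offset
```

A lag monitor would keep the latest decoded offset per (group, topic, partition) and compare it with each partition's log-end offset.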

rja1 commented 1 year ago

Gotcha. I actually ended up writing a Python hack that calls the CMAK API, pulls the lag data by group, and persists it to a MySQL backend, which I then tied into Grafana. It works great, but it's a little kludgy. Anyway, thanks for your replies. I'll check out Burrow again for fun.

OneCricketeer commented 1 year ago

I took a look at Burrow myself, and it seems there is an open PR for SASL_PLAINTEXT, so you were right.