strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Question] Missing kafka_cluster_* metrics #3584

Closed ndajr closed 4 years ago

ndajr commented 4 years ago

I tried to look at the documentation but I couldn't find anything about how to export the at-min-ISR metrics. My Prometheus doesn't have the kafka_cluster_* metrics (kafka_cluster_partition_atminisr and kafka_cluster_partition_underminisr) and some panels on the kafka-exporter dashboard are not working. Am I missing something in the Strimzi kafkaExporter definition?

scholzj commented 4 years ago

I assume you deployed the Kafka Exporter from your Kafka CR, right? In that case you should check whether the metrics from the Kafka Exporter are scraped by Prometheus (check the Prometheus targets to see whether KE is listed there). You could also just curl the KE metrics endpoint to see if the metrics are there.

ndajr commented 4 years ago

Yes, I have all the metrics in Prometheus (and on the Kafka Exporter /metrics endpoint) except kafka_cluster_partition_atminisr and kafka_cluster_partition_underminisr. My Kafka deployment looks like this:

apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 3
    resources:
      ...
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.4"
    storage:
      ...
  zookeeper:
    ...
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    groupRegex: ".*"
    topicRegex: ".*"
    template:
      pod:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "9404"
            prometheus.io/path: /metrics

scholzj commented 4 years ago

So other metrics from Kafka Exporter are scraped correctly? Do you see these metrics in the metrics endpoint if you query it directly?

ndajr commented 4 years ago

Yes I see the other metrics

curl -v http://localhost:9404/metrics
# TYPE go_gc_duration_seconds summary
...
# TYPE kafka_brokers gauge
...
# TYPE kafka_consumergroup_current_offset gauge
...
# TYPE kafka_consumergroup_lag gauge
...
# TYPE kafka_exporter_build_info gauge
...
# TYPE kafka_topic_partition_current_offset gauge
...
# TYPE kafka_topic_partition_in_sync_replica gauge
...
# TYPE kafka_topic_partition_leader gauge
...
# TYPE kafka_topic_partition_leader_is_preferred gauge
...
# TYPE kafka_topic_partition_oldest_offset gauge
...
# TYPE kafka_topic_partition_replicas gauge
...
# TYPE kafka_topic_partition_under_replicated_partition gauge
...
# TYPE kafka_topic_partitions gauge
...
# TYPE process_cpu_seconds_total counter
...
# TYPE process_max_fds gauge
...
# TYPE process_open_fds gauge
...
# TYPE process_resident_memory_bytes gauge
...
# TYPE process_start_time_seconds gauge
...
# TYPE process_virtual_memory_bytes gauge
...
ndajr commented 4 years ago

I see that those metrics were implemented in KIP-427, but I don't see anything documented in Kafka Exporter or Strimzi about them.

ndajr commented 4 years ago

To me it looks like those metrics aren't coming from Kafka Exporter; we would need to add them there:

https://github.com/danielqsj/kafka_exporter/blob/master/kafka_exporter.go#L33

Like this:

clusterPartitionAtMinIsr               *prometheus.Desc
clusterPartitionUnderMinIsr            *prometheus.Desc

I'm just asking here because I see that you implemented the dashboards and I might be missing something.

scholzj commented 4 years ago

So if they were implemented by a KIP, they will be in Kafka and need to be scraped from there. So you need to do the same, but for Kafka.

I assumed they were coming from Kafka Exporter since you mentioned it in the first post, sorry.
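
For reference, here is a minimal sketch of what the broker-side metrics configuration could look like, assuming the JMX Prometheus exporter rule format used by the Strimzi metrics examples; the exact pattern below is illustrative rather than copied from this thread:

spec:
  kafka:
    # JMX Prometheus exporter configuration; Strimzi serves these metrics on port 9404 of the broker pods.
    metrics:
      lowercaseOutputName: true
      rules:
        # Maps kafka.cluster MBeans (including AtMinIsr/UnderMinIsr from KIP-427) to
        # kafka_cluster_partition_atminisr and kafka_cluster_partition_underminisr.
        - pattern: "kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.*)><>Value"
          name: "kafka_cluster_$1_$2"
          type: GAUGE
          labels:
            topic: "$3"
            partition: "$4"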

ndajr commented 4 years ago

Prometheus scrapes the Kafka Exporter /metrics endpoint, right? Since I don't have kafka_cluster_partition_atminisr and kafka_cluster_partition_underminisr on the Kafka Exporter endpoint, Prometheus won't have them either.

This makes me think the problem is that Kafka Exporter is not scraping those metrics from Kafka. Maybe it's a problem with my kafkaExporter definition; that's what I meant in my first comment. But then I read the Kafka Exporter docs and code, and I didn't find either kafka_cluster_partition_atminisr or kafka_cluster_partition_underminisr being implemented. Are they custom metrics coming from the Strimzi operator?

ndajr commented 4 years ago

I'm curious and am asking here first because I see that you are already using them here and here.

scholzj commented 4 years ago

> Prometheus scrapes the Kafka Exporter /metrics endpoint, right? Since I don't have kafka_cluster_partition_atminisr and kafka_cluster_partition_underminisr on the Kafka Exporter endpoint, Prometheus won't have them either.

No. Your Prometheus has to scrape metrics from the Kafka Exporter as well as from the Kafka brokers directly. I guess the name can be confusing, but Kafka Exporter does not expose the Kafka broker metrics. It is used to create additional metrics which Kafka itself doesn't provide (the main focus being consumer lag monitoring). So you need to have Prometheus scrape the Kafka metrics from Kafka plus the additional metrics from Kafka Exporter.
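
As a sketch of how both sources end up in the same Prometheus, assuming an annotation-driven setup like the prometheus.io/* pod annotations used above (the job name and relabel rules below are illustrative, not taken from the Strimzi examples):

scrape_configs:
  # A single pod-discovery job keeps every pod annotated with prometheus.io/scrape: "true",
  # so both the Kafka broker pods (kafka_cluster_*, kafka_server_*, ...) and the
  # Kafka Exporter pod (kafka_consumergroup_*, kafka_topic_*, ...) get scraped.
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__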

ndajr commented 4 years ago

I think I need to add a jmxTrans definition ("When the property is present a JmxTrans deployment is created for gathering JMX metrics from each Kafka broker"). One suggestion would be to add this information to this article: https://strimzi.io/blog/2019/10/14/improving-prometheus-metrics/. I've never used JmxTrans before, but it seems to be based around a push model; how does that fit into Prometheus? It's not yet clear to me how to export metrics from the Kafka brokers using Strimzi.

ndajr commented 4 years ago

NVM I found what I was looking for, I'm gonna try this: https://github.com/strimzi/strimzi-kafka-operator/blob/master/documentation/assemblies/assembly-prometheus-metrics.adoc

scholzj commented 4 years ago

Right, JmxTrans should not be needed for Prometheus; the exporters are built directly into Strimzi.

ndajr commented 4 years ago

Nice, I fixed it here and I can see all the metrics in Prometheus. The problem was that I didn't have the metrics defined for Kafka, so I copied everything from the example. Prometheus was also not scraping the Kafka brokers, so I added the Prometheus annotations to the Kafka pod template like this:

spec:
  kafka:
    ...
    metrics:
      lowercaseOutputName: true
      rules:
        ...
    template:
      pod:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "9404"
            prometheus.io/path: /metrics

I'm closing this issue, thank you very much for the support @scholzj!

bfarayev commented 2 years ago

I can confirm that the same method works for ZooKeeper and Cruise Control as well 👍 Thanks for the fix @neemiasjnr