strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Additional scrape configs for Prometheus metrics cannot be parsed. #5873

Closed (NewmanJ1987 closed this issue 2 years ago)

NewmanJ1987 commented 2 years ago

Describe the bug
I am trying to get some Kafka metrics running locally with Prometheus and Grafana. I'm getting stuck at this point when trying to apply the prometheus-additional.yaml file. This is the error:

level=error ts=2021-11-10T16:37:41.185907627Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"monitoring/prometheus\" failed: creating config failed: generating config failed: generate additional scrape configs: unmarshalling additional scrape configs failed: yaml: unmarshal errors:\n  line 1: cannot unmarshal !!map into []yaml.MapSlice"

To Reproduce
Steps to reproduce the behavior:

  1. Create the namespaces:

     k create ns kafka
     k create ns monitoring

  2. Create the Kafka cluster. I modified the examples/kafka-metrics.yaml file slightly (attached at the bottom):

     k apply -f kafka-metrics.yaml -n kafka

  3. Set up the Prometheus Operator. I downloaded the bundle with curl -s https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml > bundle.yaml, modified bundle.yaml to change the namespace to monitoring, and applied it:

     kubectl apply -f bundle.yaml -n monitoring

  4. Create a secret for the additional scrape configs:

     kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring

  5. Modified prometheus.yaml to update the namespace to monitoring and then applied the following files as well:

     kubectl apply -f strimzi-pod-monitor.yaml -n monitoring
     kubectl apply -f prometheus-rules.yaml -n monitoring
     kubectl apply -f prometheus.yaml -n monitoring

Expected behavior
The secret additional-scrape-configs should be parsed correctly.

YAML files and logs
This is the only file that I modified; everything else is the generic file from the examples directory where I just changed the namespace.

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: nodeport
        tls: false
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    config:
      auto.create.topics.enable: "true"
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 2Gi
        deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 1
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"
    logging: debug
    enableSaramaLogging: true
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # Special cases and very specific rules
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
       clientId: "$3"
       topic: "$4"
       partition: "$5"
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
       clientId: "$3"
       broker: "$4:$5"
    - pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_tls_info
      type: GAUGE
      labels:
        cipher: "$2"
        protocol: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_software
      type: GAUGE
      labels:
        clientSoftwareName: "$2"
        clientSoftwareVersion: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: "kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):"
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
       listener: "$2"
       networkProcessor: "$3"
    - pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
       listener: "$2"
       networkProcessor: "$3"
    # Some percent metrics use MeanRate attribute
    # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    # Generic gauges for percents
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
      labels:
        "$4": "$5"
    # Generic per-second counters with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
    # Generic gauges with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
    # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
    # Note that these are missing the '_sum' metric!
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
        quantile: "0.$8"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        quantile: "0.$6"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        quantile: "0.$4"
  zookeeper-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # replicated Zookeeper
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
      name: "zookeeper_$2"
      type: GAUGE
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
      name: "zookeeper_$3"
      type: GAUGE
      labels:
        replicaId: "$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
      name: "zookeeper_$4"
      type: COUNTER
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
      name: "zookeeper_$4"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
      name: "zookeeper_$4_$5"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"

To easily collect all YAMLs and logs, you can use our report script, which will automatically collect all files and prepare a ZIP archive that can be easily attached to this issue. The usage of this script is: ./report.sh [--namespace <string>] [--cluster <string>]

Attached report archive: report-10-11-2021_12-23-09.zip

scholzj commented 2 years ago

So, did you actually do kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring? Or just kubectl apply -f prometheus-additional.yaml? Because, assuming you used 0.26, the file itself is a Secret: https://github.com/strimzi/strimzi-kafka-operator/blob/main/examples/metrics/prometheus-additional-properties/prometheus-additional.yaml ... so you do not create a secret from it.
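
For reference, since you apply that file directly, it is itself a Secret manifest whose stringData already holds the additional scrape configs. A rough, abridged sketch of its shape (the comment stands in for the real job definitions):

apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
type: Opaque
stringData:
  prometheus-additional.yaml: |
    # additional scrape_config entries go here (e.g. kubernetes-cadvisor, kubernetes-nodes-kubelet)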

NewmanJ1987 commented 2 years ago

I did this, sir: kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring

NewmanJ1987 commented 2 years ago

I was following this guide. https://snourian.com/kafka-kubernetes-strimzi-part-3-monitoring-strimzi-kafka-with-prometheus-grafana/

scholzj commented 2 years ago

Ok, so can you try kubectl apply -f prometheus-additional.yaml -n monitoring to see if it helps?

kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring would create a secret inside a secret and Prometheus would not understand it. I guess this might have changed since the blog post.
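
To illustrate what goes wrong (a rough sketch, not pulled from your cluster): the Prometheus Operator expects the value stored under the secret's prometheus-additional.yaml key to be a plain YAML list of scrape configs, for example:

- job_name: example-additional-job
  static_configs:
  - targets: ["example-host:9404"]

Wrapping the example Secret manifest in another secret puts a whole Kubernetes object (a YAML map starting with apiVersion/kind) under that key instead, which is exactly the "cannot unmarshal !!map into []yaml.MapSlice" failure above.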

NewmanJ1987 commented 2 years ago

Ok, so this error goes away, but I don't know if it's working correctly. I can't see any information once I installed the Grafana dashboards.

[screenshot]

scholzj commented 2 years ago

Well, I'm not an expert on Prometheus ... but you can check:

NewmanJ1987 commented 2 years ago

[screenshot]

NewmanJ1987 commented 2 years ago

The Kafka my-cluster pod is missing the prometheus.io/scrape: "true" annotation.

scholzj commented 2 years ago

We don't set any annotations, since every user uses a different Prometheus configuration and hardcoded annotations are a cause of issues. If you installed the PodMonitors we provide, the Prometheus Operator should configure your Prometheus to scrape the pods without any special annotations.

But if you need the annotation, you can set it via the Kafka custom resource: https://strimzi.io/docs/operators/latest/full/using.html#assembly-customizing-kubernetes-resources-str
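
For example, something along these lines in the Kafka resource should add the annotation to the broker pods (a rough sketch based on the pod template mechanism described in that doc, not something I have tested against your cluster):

spec:
  kafka:
    template:
      pod:
        metadata:
          annotations:
            prometheus.io/scrape: "true"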

NewmanJ1987 commented 2 years ago

I think I made some progress. I applied the PodMonitor and Prometheus rules like so:

kubectl apply -f strimzi-pod-monitor.yaml -n monitoring
kubectl apply -f prometheus-rules.yaml -n monitoring

I am able to see some metrics now on one Grafana dashboard (Strimzi Operators), but I still see no data in another dashboard (Strimzi Kafka Exporter). Any idea why this may be the case?

[screenshot]

[screenshot]

scholzj commented 2 years ago

I guess that suggests that it now scrapes some pods, but not all of them.

NewmanJ1987 commented 2 years ago

Should I restart the pods?

NewmanJ1987 commented 2 years ago

Yeah, it's strange that some of the dashboards show data and the others do not. I am basically just applying the YAML from your examples/metrics. Do you know a way of checking the errors that can be thrown by the PodMonitors?

scholzj commented 2 years ago

That would be somewhere in the Prometheus Operator, but I have no experience with debugging it, I'm afraid.

sknot-rh commented 2 years ago

Note the Cluster name and Namespace in the Kafka Exporter dashboard are not set. Not having these values results in no data in the rest of the graphs. You should try to reload the page to see if it gets fetched by Grafana properly. If not, you should check the kafka_exporter_build_info metric in the Prometheus UI. This metric should be present and have the labels set. If not, there is some issue with scraping the metrics from Kafka Exporter.

Edit: Also, I remember there is some issue with Kafka Exporter: it does not emit some metrics when there is no traffic in the Kafka cluster. Is that the case?

NewmanJ1987 commented 2 years ago

Hi, as per your instruction I created a topic and sent some traffic, but unfortunately I still see no data.
The kafka_exporter_build_info metric is missing, and when I look at the logs for the Kafka Exporter I see this:

[sarama] 2021/11/11 17:03:28 Closed connection to broker my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc:9091
[sarama] 2021/11/11 17:03:37 client/metadata fetching metadata for all topics from broker my-cluster-kafka-bootstrap:9091
I1111 17:03:37.989779      11 kafka_exporter.go:366] Refreshing client metadata
[sarama] 2021/11/11 17:03:37 Connected to broker at my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc:9091 (registered as #0)
I1111 17:03:38.069357      11 kafka_exporter.go:637] Fetching consumer group metrics
[sarama] 2021/11/11 17:03:38 Closed connection to broker my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc:9091
[sarama] 2021/11/11 17:03:43 client/metadata fetching metadata for all topics from broker my-cluster-kafka-bootstrap:9091
[sarama] 2021/11/11 17:03:47 Connected to broker at my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc:9091 (registered as #0)
I1111 17:03:48.075577      11 kafka_exporter.go:637] Fetching consumer group metrics
[sarama] 2021/11/11 17:03:48 Closed connection to broker my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc:9091

I don't see any obvious errors. Can you point me in the right direction?

sknot-rh commented 2 years ago

The missing kafka_exporter_build_info metric could be caused by incorrectly configured Prometheus scraping. Can you share the config from the Prometheus UI? You should be able to find it under Status/Configuration.

NewmanJ1987 commented 2 years ago

Sure. It is the default one from the examples.

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: monitoring/prometheus
    prometheus_replica: prometheus-prometheus-0
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - follow_redirects: true
    scheme: http
    path_prefix: /
    timeout: 10s
    api_version: v2
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: alertmanager
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: alertmanager
      replacement: $1
      action: keep
    kubernetes_sd_configs:
    - role: endpoints
      kubeconfig_file: ""
      follow_redirects: true
      namespaces:
        names:
        - monitoring
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-rulefiles-0/*.yaml
scrape_configs:
- job_name: podMonitor/monitoring/bridge-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind, __meta_kubernetes_pod_labelpresent_strimzi_io_kind]
    separator: ;
    regex: KafkaBridge;true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: rest-api
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: monitoring/bridge-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: rest-api
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: podMonitor/monitoring/cluster-operator-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind, __meta_kubernetes_pod_labelpresent_strimzi_io_kind]
    separator: ;
    regex: cluster-operator;true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: monitoring/cluster-operator-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: podMonitor/monitoring/entity-operator-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name, __meta_kubernetes_pod_labelpresent_app_kubernetes_io_name]
    separator: ;
    regex: entity-operator;true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: healthcheck
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: monitoring/entity-operator-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: healthcheck
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: podMonitor/monitoring/kafka-resources-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind, __meta_kubernetes_pod_labelpresent_strimzi_io_kind]
    separator: ;
    regex: Kafka|KafkaConnect|KafkaMirrorMaker|KafkaMirrorMaker2;true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: tcp-prometheus
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: monitoring/kafka-resources-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: tcp-prometheus
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(strimzi_io_.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    separator: ;
    regex: (.*)
    target_label: node_ip
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
- job_name: kubernetes-cadvisor
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    separator: ;
    regex: (.*)
    target_label: node_ip
    replacement: $1
    action: replace
  metric_relabel_configs:
  - source_labels: [container, __name__]
    separator: ;
    regex: POD;container_(network).*
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [container]
    separator: ;
    regex: POD
    replacement: $1
    action: drop
  - source_labels: [container]
    separator: ;
    regex: ^$
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: container_(network_tcp_usage_total|tasks_state|memory_failures_total|network_udp_usage_total)
    replacement: $1
    action: drop
  kubernetes_sd_configs:
  - role: node
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
      - monitoring
- job_name: kubernetes-nodes-kubelet
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
  kubernetes_sd_configs:
  - role: node
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
      - monitoring

sknot-rh commented 2 years ago

IIUC you have the kafka namespace with the running Kafka cluster and the monitoring namespace for the monitoring stack. Could you double-check that you have the namespaces configured correctly? What about other metrics (not related to KafkaExporter), can you get them?

NewmanJ1987 commented 2 years ago

Yes. I can get the metrics from the Strimzi Operator, and if I create a Service and another Prometheus job I can get the stats for the Kafka Exporter as well.

apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter-service
spec:
  selector:
    app.kubernetes.io/name: "kafka-exporter"
  ports:
    - protocol: TCP
      port: 9404
      targetPort: 9404

I added this job in prometheus-additional.yaml:

    - job_name: kafka-exporter
      scrape_interval: 10s
      scrape_timeout: 10s
      static_configs:
       - targets: ["kafka-exporter-service.kafka:9404"]       

sknot-rh commented 2 years ago

I re-created your topology and I was able to get the KafkaExporter metrics without any issue. This is the configuration generated by Prometheus:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: metrics/prometheus
    prometheus_replica: prometheus-prometheus-0
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - follow_redirects: true
    scheme: http
    path_prefix: /
    timeout: 10s
    api_version: v2
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: alertmanager
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: alertmanager
      replacement: $1
      action: keep
    kubernetes_sd_configs:
    - role: endpoints
      kubeconfig_file: ""
      follow_redirects: true
      namespaces:
        names:
        - metrics
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-rulefiles-0/*.yaml
scrape_configs:
- job_name: podMonitor/metrics/bridge-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind]
    separator: ;
    regex: KafkaBridge
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: rest-api
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: metrics/bridge-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: rest-api
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: podMonitor/metrics/cluster-operator-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind]
    separator: ;
    regex: cluster-operator
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: metrics/cluster-operator-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: podMonitor/metrics/entity-operator-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
    separator: ;
    regex: entity-operator
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: healthcheck
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: metrics/entity-operator-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: healthcheck
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: podMonitor/metrics/kafka-resources-metrics/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind]
    separator: ;
    regex: Kafka|KafkaConnect|KafkaMirrorMaker|KafkaMirrorMaker2
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: tcp-prometheus
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: metrics/kafka-resources-metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: tcp-prometheus
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(strimzi_io_.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    separator: ;
    regex: (.*)
    target_label: node_ip
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kafka
- job_name: kubernetes-cadvisor
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    separator: ;
    regex: (.*)
    target_label: node_ip
    replacement: $1
    action: replace
  metric_relabel_configs:
  - source_labels: [container, __name__]
    separator: ;
    regex: POD;container_(network).*
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [container]
    separator: ;
    regex: POD
    replacement: $1
    action: drop
  - source_labels: [container]
    separator: ;
    regex: ^$
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: container_(network_tcp_usage_total|tasks_state|memory_failures_total|network_udp_usage_total)
    replacement: $1
    action: drop
  kubernetes_sd_configs:
  - role: node
    kubeconfig_file: ""
    follow_redirects: true
- job_name: kubernetes-nodes-kubelet
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
  kubernetes_sd_configs:
  - role: node
    kubeconfig_file: ""
    follow_redirects: true

sknot-rh commented 2 years ago

Note there are some differences between your config and mine. Please do compare them. I think your job_name: podMonitor/monitoring/kafka-resources-metrics/0 is missing the namespace selector. That could be the issue.
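
If that is the cause, adding a namespace selector to that PodMonitor in strimzi-pod-monitor.yaml should fix it. A sketch of what I mean (the name, label selector, and port are taken from your generated config above; keep the rest of your existing PodMonitor as is):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  labels:
    app: strimzi
spec:
  selector:
    matchExpressions:
    - key: "strimzi.io/kind"
      operator: In
      values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
    - kafka            # the namespace where the Kafka cluster runs
  podMetricsEndpoints:
  - path: /metrics
    port: tcp-prometheus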

Brandt1930 commented 2 years ago

Hi @NewmanJ1987, were you able to resolve this issue? I have the same problems with the same symptoms.

JitenPalaparthi commented 2 years ago

Ok, so can you try kubectl apply -f prometheus-additional.yaml -n monitoring to see if it helps?

kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring would create a secret inside a secret and Prometheus would not understand it. I guess this might have changed since the blog post.

It worked. You saved my day.