strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Bug]: Creating a new Kafka instance in the same namespace fails. Would you help me check? #9084

Closed peter-hst closed 1 year ago

peter-hst commented 1 year ago

Bug Description

I deleted the kafka namespace, redeployed the Strimzi 0.32.0 Cluster Operator, and then created a new Kafka cluster instance. Afterwards I saw the following in the logs of the strimzi-cluster-operator-f696c85f7-blmlj pod.

Log details:

2023-09-07 01:37:37 INFO  ClusterOperator:125 - Triggering periodic reconciliation for namespace kafka
2023-09-07 01:37:39 INFO  StrimziPodSetController:292 - Reconciliation #25(watch) StrimziPodSet(kafka/qa-broker-zookeeper): StrimziPodSet will be reconciled
2023-09-07 01:37:39 INFO  AbstractOperator:237 - Reconciliation #24(timer) KafkaRebalance(kafka/qa-rebalance): KafkaRebalance qa-rebalance will be checked for creation or modification
2023-09-07 01:37:39 INFO  StrimziPodSetController:328 - Reconciliation #25(watch) StrimziPodSet(kafka/qa-broker-zookeeper): reconciled
2023-09-07 01:37:39 INFO  StrimziPodSetController:292 - Reconciliation #26(watch) StrimziPodSet(kafka/qa-broker-zookeeper): StrimziPodSet will be reconciled
2023-09-07 01:37:39 INFO  StrimziPodSetController:328 - Reconciliation #26(watch) StrimziPodSet(kafka/qa-broker-zookeeper): reconciled
2023-09-07 01:37:40 ERROR AbstractOperator:264 - Reconciliation #24(timer) KafkaRebalance(kafka/qa-rebalance): createOrUpdate failed
java.lang.RuntimeException: Secret kafka/qa-broker-cruise-control-certs does not exist
    at io.strimzi.operator.common.Util.missingSecretException(Util.java:218) ~[io.strimzi.operator-common-0.32.0.jar:0.32.0]
    at io.strimzi.operator.cluster.operator.assembly.KafkaRebalanceAssemblyOperator.lambda$reconcileRebalance$31(KafkaRebalanceAssemblyOperator.java:1190) ~[io.strimzi.cluster-operator-0.32.0.jar:0.32.0]
    at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.CompositeFutureImpl.complete(CompositeFutureImpl.java:172) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.CompositeFutureImpl.lambda$join$3(CompositeFutureImpl.java:109) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[io.netty.netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-09-07 01:37:40 WARN  AbstractOperator:516 - Reconciliation #24(timer) KafkaRebalance(kafka/qa-rebalance): Failed to reconcile
java.lang.RuntimeException: Secret kafka/qa-broker-cruise-control-certs does not exist
    at io.strimzi.operator.common.Util.missingSecretException(Util.java:218) ~[io.strimzi.operator-common-0.32.0.jar:0.32.0]
    at io.strimzi.operator.cluster.operator.assembly.KafkaRebalanceAssemblyOperator.lambda$reconcileRebalance$31(KafkaRebalanceAssemblyOperator.java:1190) ~[io.strimzi.cluster-operator-0.32.0.jar:0.32.0]
    at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.CompositeFutureImpl.complete(CompositeFutureImpl.java:172) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.CompositeFutureImpl.lambda$join$3(CompositeFutureImpl.java:109) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[io.netty.netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-09-07 01:37:45 INFO  AbstractOperator:378 - Reconciliation #1(watch) Kafka(kafka/qa-broker): Reconciliation is in progress
2023-09-07 01:38:45 INFO  AbstractOperator:378 - Reconciliation #1(watch) Kafka(kafka/qa-broker): Reconciliation is in progress
2023-09-07 01:39:37 INFO  ClusterOperator:125 - Triggering periodic reconciliation for namespace kafka
2023-09-07 01:39:39 INFO  AbstractOperator:237 - Reconciliation #28(timer) KafkaRebalance(kafka/qa-rebalance): KafkaRebalance qa-rebalance will be checked for creation or modification
2023-09-07 01:39:40 ERROR AbstractOperator:264 - Reconciliation #28(timer) KafkaRebalance(kafka/qa-rebalance): createOrUpdate failed
java.lang.RuntimeException: Secret kafka/qa-broker-cruise-control-certs does not exist
    at io.strimzi.operator.common.Util.missingSecretException(Util.java:218) ~[io.strimzi.operator-common-0.32.0.jar:0.32.0]
    at io.strimzi.operator.cluster.operator.assembly.KafkaRebalanceAssemblyOperator.lambda$reconcileRebalance$31(KafkaRebalanceAssemblyOperator.java:1190) ~[io.strimzi.cluster-operator-0.32.0.jar:0.32.0]
    at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
    at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.3.4.jar:4.3.4]
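The failing reconciliation names the exact Secret it could not find. As a small illustration of how to spot it, the error line can be grepped out of a saved copy of the log (the excerpt below is reproduced from the log above; on a live cluster the input would come from `kubectl logs` on the operator pod instead):

```shell
# Illustration only: extract the missing Secret's name from the log line above.
cat > operator.log <<'EOF'
java.lang.RuntimeException: Secret kafka/qa-broker-cruise-control-certs does not exist
EOF
# -o prints just the matching fragment: the word "Secret" plus the namespaced name.
grep -o 'Secret [^ ]*' operator.log
# prints: Secret kafka/qa-broker-cruise-control-certs
```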

Steps to reproduce

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: qa-broker # kafka cluster instance name
  namespace: kafka
spec:
  kafka:
    rack:
      topologyKey: "app"
    template:
      pod:
        metadata:
          labels:
            app: kafka
        # affinity:
        #   nodeAffinity:
        #     requiredDuringSchedulingIgnoredDuringExecution:
        #       nodeSelectorTerms:
        #       - matchExpressions:
        #         - key: app
        #           operator: In
        #           values:
        #           - kafka
        tolerations:
          - key: app
            operator: "Equal"
            value: kafka
    version: 3.3.1
    replicas: 3
    resources:
      requests:
        memory: 2Gi # recommend 4G in prod, prod work node config: 4 core, 8G RAM
        cpu: 500m # recommend 500m in prod, prod work node config: 4 core, 8G RAM
      limits:
        memory: 2Gi
        cpu: 1000m # recommend 2600m in prod, prod work node config: 4 core, 8G RAM
    jvmOptions:
      -Xms: 1536m
      -Xmx: 1536m
      gcLoggingEnabled: false
    jmxOptions:
      authentication:
        type: "password"
    readinessProbe:
      initialDelaySeconds: 25
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 25
      timeoutSeconds: 5
    listeners:
      - name: plain
        type: internal
        port: 9092
        tls: false
      - name: external
        port: 9094
        type: ingress
        tls: true
        authentication:
          type: scram-sha-512        
        configuration:
          bootstrap:
            host: kafka-bootstrap-qa.abc.com
            annotations:
              external-dns.alpha.kubernetes.io/hostname: kafka-bootstrap-qa.abc.com.
              external-dns.alpha.kubernetes.io/ttl: "60"
          brokers:
          - broker: 0
            host: kafka-broker-qa-0.abc.com
            annotations:
              external-dns.alpha.kubernetes.io/hostname: kafka-broker-qa-0.abc.com.
              external-dns.alpha.kubernetes.io/ttl: "60"
          - broker: 1
            host: kafka-broker-qa-1.abc.com
            annotations:
              external-dns.alpha.kubernetes.io/hostname: kafka-broker-qa-1.abc.com.
              external-dns.alpha.kubernetes.io/ttl: "60"
          - broker: 2
            host: kafka-broker-qa-2.abc.com
            annotations:
              external-dns.alpha.kubernetes.io/hostname: kafka-broker-qa-2.abc.com.
              external-dns.alpha.kubernetes.io/ttl: "60"
    authorization:
      type: simple   
      superUsers:
        - admin  
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 1
      default.replication.factor: 3
      min.insync.replicas: 2
      log.retention.hours: 168
      offsets.retention.minutes: 43800
      num.partitions: 6
      auto.create.topics.enable: false
      unclean.leader.election.enable: false
      auto.leader.rebalance.enable: false
      inter.broker.protocol.version: "3.3"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 16Gi
        class: default
        deleteClaim: false
      - id: 1
        type: persistent-claim
        size: 16Gi
        class: default
        deleteClaim: false        
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    template:
      pod:
        topologySpreadConstraints:
            - labelSelector:
                matchLabels:
                  app: kafka
              maxSkew: 1
              topologyKey: app
              whenUnsatisfiable: ScheduleAnyway      
        metadata:
          labels:
            app: kafka
        # affinity:
        #   nodeAffinity:
        #     requiredDuringSchedulingIgnoredDuringExecution:
        #       nodeSelectorTerms:
        #       - matchExpressions:
        #         - key: app
        #           operator: In
        #           values:
        #           - kafka
        tolerations:
          - key: app
            operator: "Equal"
            value: kafka
    replicas: 3
    storage:
      type: persistent-claim
      class: default
      size: 10Gi
      deleteClaim: false
    jmxOptions:
      authentication:
        type: "password"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"
  cruiseControl:
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: cruise-control-metrics
          key: metrics-config.yml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: cruise-control-metrics
  labels:
    app: strimzi
data:
  metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    - pattern: kafka.cruisecontrol<name=(.+)><>(\w+)
      name: kafka_cruisecontrol_$1_$2
      type: GAUGE
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # Special cases and very specific rules
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
       clientId: "$3"
       topic: "$4"
       partition: "$5"
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
       clientId: "$3"
       broker: "$4:$5"
    - pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_tls_info
      type: GAUGE
      labels:
        cipher: "$2"
        protocol: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_software
      type: GAUGE
      labels:
        clientSoftwareName: "$2"
        clientSoftwareVersion: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: "kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):"
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
       listener: "$2"
       networkProcessor: "$3"
    - pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
       listener: "$2"
       networkProcessor: "$3"
    # Some percent metrics use MeanRate attribute
    # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    # Generic gauges for percents
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
      labels:
        "$4": "$5"
    # Generic per-second counters with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
    # Generic gauges with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
    # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
    # Note that these are missing the '_sum' metric!
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
        quantile: "0.$8"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        quantile: "0.$6"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        quantile: "0.$4"
  zookeeper-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # replicated Zookeeper
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
      name: "zookeeper_$2"
      type: GAUGE
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
      name: "zookeeper_$3"
      type: GAUGE
      labels:
        replicaId: "$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
      name: "zookeeper_$4"
      type: COUNTER
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
      name: "zookeeper_$4"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
      name: "zookeeper_$4_$5"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: qa-rebalance
  labels:
    strimzi.io/cluster: qa-broker # must match the Kafka cluster name
spec:
  goals:
    - CpuCapacityGoal
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
    - RackAwareGoal
    - MinTopicLeadersPerBrokerGoal
    - NetworkOutboundCapacityGoal
    - ReplicaCapacityGoal

Expected behavior

No response

Strimzi version

0.32.0

Kubernetes version

Kubernetes 1.24.14

Installation method

yaml

Infrastructure

SAP-Gardener

Configuration files and logs

The ssl-passthrough option is enabled on the Ingress controller (v1.5.1).

kubectl create namespace kafka

wget https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.32.0/strimzi-0.32.0.zip
unzip strimzi-0.32.0.zip && cd strimzi-0.32.0

sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

kubectl apply -f install/cluster-operator -n kafka
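The sed line above rewrites the namespace in every RoleBinding before install. A minimal sketch of what that substitution does, using a hypothetical rb.yaml fragment rather than a real Strimzi file:

```shell
# Hypothetical RoleBinding fragment to demonstrate the substitution.
cat > rb.yaml <<'EOF'
subjects:
  - kind: ServiceAccount
    name: strimzi-cluster-operator
    namespace: myproject
EOF
# Rewrite any "namespace: ..." value to the target namespace, in place.
sed -i 's/namespace: .*/namespace: kafka/' rb.yaml
grep 'namespace:' rb.yaml
# prints:     namespace: kafka
```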

Additional context

Same Kafka, ConfigMap, and KafkaRebalance manifests as in the Steps to reproduce section above.
peter-hst commented 1 year ago

This same configuration has worked fine in the past; when I redeployed it today, it no longer works.

peter-hst commented 1 year ago

I tried installing the latest version of Strimzi, and it outputs the following:

io.strimzi.operator.common.operator.resource.TimeoutException: Exceeded timeout of 300000ms while waiting for Ingress resource qa-broker-kafka-bootstrap in namespace kafka to be addressable
    at io.strimzi.operator.common.VertxUtil$1.lambda$handle$1(VertxUtil.java:154) ~[io.strimzi.operator-common-0.37.0.jar:0.37.0]
    at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[io.vertx.vertx-core-4.4.4.jar:4.4.4]
    at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:69) ~[io.vertx.vertx-core-4.4.4.jar:4.4.4]
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[io.netty.netty-common-4.1.94.Final.jar:4.1.94.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[io.netty.netty-common-4.1.94.Final.jar:4.1.94.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[io.netty.netty-common-4.1.94.Final.jar:4.1.94.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[io.netty.netty-transport-4.1.94.Final.jar:4.1.94.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[io.netty.netty-common-4.1.94.Final.jar:4.1.94.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.94.Final.jar:4.1.94.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.94.Final.jar:4.1.94.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
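As far as I understand it, "addressable" in this timeout means the operator is waiting for the Ingress to receive an address under .status.loadBalancer.ingress. A small sketch with a hypothetical saved status fragment (on a live cluster you would inspect the real qa-broker-kafka-bootstrap Ingress with kubectl instead):

```shell
# Hypothetical Ingress status fragment; the operator times out while the
# loadBalancer ingress list stays empty (no controller assigned an address).
cat > bootstrap-ingress-status.yaml <<'EOF'
status:
  loadBalancer:
    ingress:
    - ip: 10.0.0.5
EOF
# One or more ip/hostname entries here means the Ingress is addressable.
grep -c 'ip:' bootstrap-ingress-status.yaml
# prints: 1
```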