strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Bug]: KafkaMirrorMaker2 not translating consumer group offsets #9609

Closed cytar closed 7 months ago

cytar commented 7 months ago

Bug Description

Despite setting "sync.group.offsets.enabled: true" in checkpointConnector.config of the KafkaMirrorMaker2 object, consumer group offsets are not translated correctly.

Steps to reproduce (I changed server and cluster names for security purposes)

In Kafka 3.5.x:

1) on source and target, create:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: "testphenixcorpsscy"
  namespace: test
  labels:
    strimzi.io/cluster: test
spec:
  topicName: "testphenixcorpsscy"
  partitions: 6
  config:
    flush.ms: 1000
    flush.messages: 10000
    retention.bytes: -1

2) on target, create:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: mm2-gcp-to-test-scydebug
  namespace: test
spec:
  version: 3.6.0                                                
  replicas: 6                                                
  connectCluster: "test"                           
  clusters:                                                     
  - alias: "gcp"                                  
    bootstrapServers: "testbootstrap:9092"
  - alias: "test"                                  
    bootstrapServers: test-kafka-brokers.test.svc.cluster.local:9092
    config:                                                     
      config.storage.replication.factor: -1
      offset.storage.replication.factor: -1
      status.storage.replication.factor: -1
      config.storage.topic: mirrormaker2-cluster-config-debugscy
      offset.storage.topic: mirrormaker2-cluster-offset-debugscy
      status.storage.topic: mirrormaker2-cluster-status-debugscy
      group.id: mm2-gcp-to-test-scydebug-group
      producer.ack: 1
      producer.batch.size: 50000
      producer.buffer.memory: 225000000
      producer.compression.type: gzip
      producer.linger.ms: 1500
      producer.max.request.size: 157286400
      producer.request.timeout.ms: 60000
      offset_flush_timeout: 250000
      task_shutdown_graceful_timeout_ms: 10000
  mirrors:                                                       
  - sourceCluster: "gcp"                           
    targetCluster: "test"                           
    sourceConnector:                                             
      tasksMax: 6                                               
      config:    
        replication.factor: 3
        offset-syncs.topic.replication.factor: 3                                                
        sync.topic.acls.enabled: "false"                         
        sync.topic.configs.enabled: "false"                      
        refresh.topics.enabled: "false"                          
        topic.creation.enable: "false"                          
        replication.policy.class: "org.apache.kafka.connect.mirror.IdentityReplicationPolicy"      
        offset-syncs.topic.location: target                      
        topic.creation.default.replication.factor: 3
        topic.creation.default.partitions: 6
    heartbeatConnector:                                          
      config:
        heartbeats.topic.replication.factor: 3
    checkpointConnector:                                         
      tasksMax: 6                                                
      config:                                                   
        sync.group.offsets.enabled: "true"                         
        sync.group.offsets.interval.seconds: 10                  
        checkpoints.topic.replication.factor: 3                  
        replication.policy.class: "org.apache.kafka.connect.mirror.IdentityReplicationPolicy"
        offset-syncs.topic.location: target                      
    topicsPattern: "testphenixcorpsscy" 

3) check source and target lag:

[testbootstrap] ./bin/kafka-consumer-groups.sh --bootstrap-server testbootstrap:9092 --group scydebug --describe;date

Consumer group 'scydebug' has no active members.

GROUP           TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
scydebug        testphenixcorpsscy 3          5               5               0               -               -               -
scydebug        testphenixcorpsscy 2          10              10              0               -               -               -
scydebug        testphenixcorpsscy 4          6               6               0               -               -               -
scydebug        testphenixcorpsscy 1          25              25              0               -               -               -
scydebug        testphenixcorpsscy 0          0               0               0               -               -               -
scydebug        testphenixcorpsscy 5          11              11              0               -               -               -
Mon Jan 29 07:52:06 UTC 2024

[kafka@test-kafka-0 kafka]$ ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group scydebug --describe;date

Consumer group 'scydebug' has no active members.

GROUP           TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
scydebug        testphenixcorpsscy 3          5               5               0               -               -               -
scydebug        testphenixcorpsscy 2          10              10              0               -               -               -
scydebug        testphenixcorpsscy 4          6               6               0               -               -               -
scydebug        testphenixcorpsscy 1          25              25              0               -               -               -
scydebug        testphenixcorpsscy 0          0               0               0               -               -               -
scydebug        testphenixcorpsscy 5          11              11              0               -               -               -
Mon Jan 29 07:52:30 UTC 2024

4) produce, consume and check lag on source

[testbootstrap] ./bin/kafka-console-producer.sh --bootstrap-server testbootstrap:9092 --topic testphenixcorpsscy
>a
>b
>c
>[testbootstrap] ./bin/kafka-console-consumer.sh --bootstrap-server testbootstrap:9092 --topic testphenixcorpsscy --group scydebug
a
b
c
^CProcessed a total of 3 messages
[testbootstrap] ./bin/kafka-consumer-groups.sh --bootstrap-server testbootstrap:9092 --group scydebug --describe;date

Consumer group 'scydebug' has no active members.

GROUP           TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
scydebug        testphenixcorpsscy 3          5               5               0               -               -               -
scydebug        testphenixcorpsscy 2          10              10              0               -               -               -
scydebug        testphenixcorpsscy 4          6               6               0               -               -               -
scydebug        testphenixcorpsscy 1          25              25              0               -               -               -
scydebug        testphenixcorpsscy 0          3               3               0               -               -               -
scydebug        testphenixcorpsscy 5          11              11              0               -               -               -
Mon Jan 29 08:09:05 UTC 2024

5) check lag on target

[kafka@test-kafka-0 kafka]$ ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group scydebug --describe;date

Consumer group 'scydebug' has no active members.

GROUP           TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
scydebug        testphenixcorpsscy 3          6               10              4               -               -               -
scydebug        testphenixcorpsscy 2          11              18              7               -               -               -
scydebug        testphenixcorpsscy 4          7               12              5               -               -               -
scydebug        testphenixcorpsscy 1          26              47              21              -               -               -
scydebug        testphenixcorpsscy 0          1               3               2               -               -               -
scydebug        testphenixcorpsscy 5          12              21              9               -               -               -
Mon Jan 29 08:15:28 UTC 2024
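
To make the mismatch between steps 4 and 5 easier to read, the two `--describe` outputs can be compared partition by partition. The following is only a minimal Python sketch, assuming the nine-column layout shown above; `parse_describe` and `compare` are hypothetical helper names, not part of any Kafka or Strimzi tooling, and the rows are copied from the outputs in this report.

```python
# Sketch: compare consumer group offsets between source and target,
# using rows pasted from `kafka-consumer-groups.sh --describe`.

SOURCE_STEP4 = """\
scydebug testphenixcorpsscy 3 5 5 0 - - -
scydebug testphenixcorpsscy 2 10 10 0 - - -
scydebug testphenixcorpsscy 4 6 6 0 - - -
scydebug testphenixcorpsscy 1 25 25 0 - - -
scydebug testphenixcorpsscy 0 3 3 0 - - -
scydebug testphenixcorpsscy 5 11 11 0 - - -
"""

TARGET_STEP5 = """\
scydebug testphenixcorpsscy 3 6 10 4 - - -
scydebug testphenixcorpsscy 2 11 18 7 - - -
scydebug testphenixcorpsscy 4 7 12 5 - - -
scydebug testphenixcorpsscy 1 26 47 21 - - -
scydebug testphenixcorpsscy 0 1 3 2 - - -
scydebug testphenixcorpsscy 5 12 21 9 - - -
"""


def parse_describe(text):
    """Return {partition: (current_offset, log_end_offset, lag)}.

    Header, banner and date lines are skipped: only rows with nine
    columns and numeric PARTITION..LAG fields are kept.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 9 and all(f.isdigit() for f in fields[2:6]):
            partition, current, log_end, lag = (int(f) for f in fields[2:6])
            rows[partition] = (current, log_end, lag)
    return rows


def compare(source, target):
    """Return {partition: (committed_delta, log_end_delta, target_lag)},
    where each delta is target minus source."""
    report = {}
    for partition in sorted(source.keys() & target.keys()):
        s_cur, s_end, _ = source[partition]
        t_cur, t_end, t_lag = target[partition]
        report[partition] = (t_cur - s_cur, t_end - s_end, t_lag)
    return report


report = compare(parse_describe(SOURCE_STEP4), parse_describe(TARGET_STEP5))
for partition, (cur_delta, end_delta, lag) in report.items():
    print(f"partition {partition}: committed {cur_delta:+d}, "
          f"log-end {end_delta:+d}, target lag {lag}")
```

On the outputs above, this shows the target's committed offset is one ahead of the source on partitions 1 to 5 and two behind on partition 0, while the target's log-end offset is between 5 and 22 ahead of the source on partitions 1 to 5.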

Expected behavior

Lag for the same consumer group on the target should be 0, but it is not.

Strimzi version

0.38.0

Kubernetes version

1.27.7-gke.1121000

Installation method

YAML files => kustomize

Infrastructure

GKE

Configuration files and logs

MM2 object is ready:
NAME                                                          DESIRED REPLICAS   READY
kafkamirrormaker2.kafka.strimzi.io/mm2-gcp-to-test-scydebug   6                  True

All connector tasks are RUNNING.

Additional context

What am I missing?

cytar commented 7 months ago

I will send a bottle of my best French champagne, anywhere in the world, to anyone who solves this problem in the next 4 days ;)