strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Question] Kafka data auto deletes every one day (using default strimzi configurations) #3679

Closed: alok87 closed this issue 4 years ago

alok87 commented 4 years ago

Recently started using Kafka. I understand Kafka has two retention configurations, by time and by size. I have a Kafka cluster running with the default configuration, with data stored in EBS volumes. I even extended the retention to 168 hours, but Kafka still deletes my data every day.
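For context, the two limits operate independently: a log segment becomes eligible for deletion as soon as either limit is breached. An illustrative server.properties sketch (values shown are the usual defaults, not taken from this cluster):

# Time-based retention: delete segments whose newest record is older than this
log.retention.hours=168
# Size-based retention: per-partition cap; -1 (the default) disables it
log.retention.bytes=-1
# How often the broker checks for segments eligible for deletion
log.retention.check.interval.ms=300000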

Global Config

sh-4.2$ pwd
/opt/kafka/config
sh-4.2$ grep retention *
server.properties:log.retention.hours=168
server.properties:# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
server.properties:# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
server.properties:#log.retention.bytes=1073741824
server.properties:# to the retention policies
server.properties:log.retention.check.interval.ms=300000

Only time-based retention of 10 days is in effect, with no size-based limit, but Kafka still deletes my data every day.

I also saw that there was no topic-level retention set; it was using the global value.

Topic Config

sh-4.2$ ./bin/kafka-configs.sh --bootstrap-server localhost:9094 --entity-type topics --entity-name inventory.inventory.customers --describe --all
All configs for topic inventory.inventory.customers are:
  compression.type=producer sensitive=false synonyms={DEFAULT_CONFIG:compression.type=producer}
  leader.replication.throttled.replicas= sensitive=false synonyms={}
  min.insync.replicas=1 sensitive=false synonyms={DEFAULT_CONFIG:min.insync.replicas=1}
  message.downconversion.enable=true sensitive=false synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
  segment.jitter.ms=0 sensitive=false synonyms={}
  cleanup.policy=delete sensitive=false synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
  flush.ms=9223372036854775807 sensitive=false synonyms={}
  follower.replication.throttled.replicas= sensitive=false synonyms={}
  segment.bytes=1073741824 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.bytes=1073741824}
  retention.ms=864000000 sensitive=false synonyms={}
  flush.messages=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
  message.format.version=2.5-IV0 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.message.format.version=2.5, DEFAULT_CONFIG:log.message.format.version=2.5-IV0}
  max.compaction.lag.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.max.compaction.lag.ms=9223372036854775807}
  file.delete.delay.ms=60000 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
  max.message.bytes=1048588 sensitive=false synonyms={DEFAULT_CONFIG:message.max.bytes=1048588}
  min.compaction.lag.ms=0 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
  message.timestamp.type=CreateTime sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
  preallocate=false sensitive=false synonyms={DEFAULT_CONFIG:log.preallocate=false}
  index.interval.bytes=4096 sensitive=false synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
  min.cleanable.dirty.ratio=0.5 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
  unclean.leader.election.enable=false sensitive=false synonyms={DEFAULT_CONFIG:unclean.leader.election.enable=false}
  retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
  delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
  segment.ms=604800000 sensitive=false synonyms={}
  message.timestamp.difference.max.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
  segment.index.bytes=10485760 sensitive=false synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}
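A note on reading this output: retention.ms=864000000 is 10 days, which does not match the 604800000 ms (168 hours) that the global log.retention.hours=168 would yield, so despite the comment above, the topic does not appear to be inheriting the global value. If the goal is to pin retention explicitly rather than reason about defaults, something along these lines should work (topic name from this thread; value illustrative):

./bin/kafka-configs.sh --bootstrap-server localhost:9094 \
  --entity-type topics --entity-name inventory.inventory.customers \
  --alter --add-config retention.ms=604800000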

Kafka Cluster Config

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: k8s
spec:
  kafka:
    version: 2.5.0
    replicas: 3
    listeners:
      external:
        type: nodeport
        tls: false 
        overrides:
          bootstrap:
            nodePort: 31234
          brokers:
          - broker: 0
            nodePort: 31235
          - broker: 1
            nodePort: 31236
          - broker: 2
            nodePort: 31237
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.5"
    storage:
      class: "gp2"
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 300Gi
        deleteClaim: false
      - id: 1
        type: persistent-claim
        size: 300Gi
        deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      class: "gp2"
      type: persistent-claim
      size: 300Gi
      deleteClaim: false
  entityOperator:
    template:
      pod:
        nodeSelector:
          dedicated: ss
          kubernetes.io/os: linux
        tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: ss
    topicOperator: 
      template:
        pod:
          nodeSelector:
            dedicated: ss
            kubernetes.io/os: linux
          tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: ss
    userOperator:
      template:
        pod:
          nodeSelector:
            dedicated: ss
            kubernetes.io/os: linux
          tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: ss
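If the intent is to change retention cluster-wide, Strimzi passes entries under spec.kafka.config straight through to each broker's configuration, so the defaults can be pinned in the custom resource itself. A minimal sketch (values illustrative, not from this issue):

spec:
  kafka:
    config:
      # forwarded to the brokers' server.properties
      log.retention.hours: 168
      log.retention.bytes: -1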

Why is Kafka eating my data? :( Please help me debug this. Where can I see the cleanup logs and find the reason for the deletions?
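For anyone debugging the same symptom: the brokers log each segment deletion together with the reason. Assuming the Strimzi naming for the cluster above (broker pods k8s-kafka-0..2, container kafka), something like the following should surface them, though the exact log wording varies between Kafka versions:

# Retention-driven deletions are logged with the breach reason, e.g.
# "... due to retention time 604800000ms breach" or a retention-size breach
kubectl logs k8s-kafka-0 -c kafka | grep -i "deletable segments"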

alok87 commented 4 years ago

Is the replication broken?

broker0

sh-4.2$ pwd
/var/lib/kafka/data-0/kafka-log0
sh-4.2$ ls -tlra | grep inventory
drwxrwsr-x  2 kafka root  4096 Sep 20 02:28 inventory.inventory.geom-0
drwxrwsr-x  2 kafka root  4096 Sep 20 02:28 inventory.inventory.addresses-0

broker1

sh-4.2$ pwd
/var/lib/kafka/data-0/kafka-log1
sh-4.2$ ls -tlra | grep inventory
drwxr-sr-x  2 kafka root  4096 Sep 20 02:01 schema-changes.inventory-0

All the directories in broker0 should get replicated to broker1, right?
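Not necessarily: a partition only appears on the brokers that hold one of its replicas, so a topic created with replication factor 1 lives on a single broker. The assignment can be checked directly (topic name from the listing above):

./bin/kafka-topics.sh --bootstrap-server localhost:9094 \
  --describe --topic inventory.inventory.geom
# ReplicationFactor and the per-partition Replicas list show which brokers hold the data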

alok87 commented 4 years ago

Closing this; it's not related to Strimzi for sure: https://www.reddit.com/r/apachekafka/comments/iw7725/help_kafka_eats_all_the_data_24_hours_with/

sathwikreddygv commented 3 years ago

delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}

What does this mean here? It might be the reason for the 1-day deletions.

scholzj commented 3 years ago

That means that the tombstone messages indicating key deletion in compacted topics are deleted after 86400000 ms (24 hours).
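Worth adding for this issue's context: delete.retention.ms only comes into play on topics whose cleanup.policy includes compact. The topic described above uses cleanup.policy=delete, so this setting would not explain data disappearing. A quick way to confirm which policy a topic uses (mirroring the command used earlier in the thread):

./bin/kafka-configs.sh --bootstrap-server localhost:9094 \
  --entity-type topics --entity-name inventory.inventory.customers \
  --describe --all | grep cleanup.policy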