wurstmeister / kafka-docker

Dockerfile for Apache Kafka
http://wurstmeister.github.io/kafka-docker/
Apache License 2.0

[Bug]kafka auto log clean not working #704

Open xiddjp opened 2 years ago

xiddjp commented 2 years ago

Hi folks,

One problem I encountered is that Kafka's log files keep growing and are never cleared automatically.

I set KAFKA_LOG_RETENTION_MS and KAFKA_LOG_RETENTION_BYTES in the docker-compose file.

Are there any problems with these docker configs?

    kafka1:
      restart: always
      image: wurstmeister/kafka:2.13-2.6.0
      ports:
        ......
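
For reference, this image translates KAFKA_-prefixed environment variables into server.properties entries (e.g. KAFKA_LOG_RETENTION_MS becomes log.retention.ms), so the retention options would sit under an environment key of that service. A minimal sketch with placeholder values:

      environment:
        KAFKA_LOG_RETENTION_MS: 60000          # placeholder: drop closed segments older than 60 s
        KAFKA_LOG_RETENTION_BYTES: 1073741824  # placeholder: cap each partition at ~1 GiB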

xiddjp commented 2 years ago

Has nobody else run into this issue?

TBragi commented 2 years ago

Are you referring to Kafka's own log files (server.log, controller.log, log-cleaner.log, etc.) or to the topic logs (requests-N, tb_transport.api.requests-N)?

I ran a test with your configurations, and the messages I produced to the topic "requests" got deleted after 60000ms as expected:

[2022-03-16 13:33:10,738] INFO [ProducerStateManager partition=requests-0] Writing producer snapshot at offset 2 (kafka.log.ProducerStateManager)
[2022-03-16 13:33:10,748] INFO [Log partition=requests-0, dir=/kafka/data] Rolled new log segment at offset 2 in 24 ms. (kafka.log.Log)
[2022-03-16 13:33:10,749] INFO [Log partition=requests-0, dir=/kafka/data] Deleting segment LogSegment(baseOffset=1, size=132, lastModifiedTime=1647437528000, largestRecordTimestamp=Some(1647437529695)) due to retention time 60000ms breach based on the largest record timestamp in the segment (kafka.log.Log)
[2022-03-16 13:33:10,754] INFO [Log partition=requests-0, dir=/kafka/data] Incremented log start offset to 2 due to segment deletion (kafka.log.Log)
[2022-03-16 13:34:10,755] INFO [Log partition=requests-0, dir=/kafka/data] Deleting segment files LogSegment(baseOffset=1, size=132, lastModifiedTime=0, largestRecordTimestamp=Some(1647437529695)) (kafka.log.Log)
[2022-03-16 13:34:10,759] INFO Deleted log /kafka/data/requests-0/00000000000000000001.log.deleted. (kafka.log.LogSegment)
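
For anyone who wants to reproduce a test like this, a sketch using the stock CLI tools inside the broker container (the topic name, retention.ms, and segment.ms values are illustrative; the small segment.ms forces the active segment to roll so retention can actually delete it):

    kafka-topics.sh --bootstrap-server localhost:9092 --create \
      --topic requests --partitions 1 --replication-factor 1 \
      --config retention.ms=60000 --config segment.ms=30000

    echo "hello" | kafka-console-producer.sh \
      --bootstrap-server localhost:9092 --topic requests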
xiddjp commented 2 years ago

I mean the topic logs, that is, the data actually stored. There are a lot of topics in my Kafka cluster, not only the "requests" topic. How can I make sure the data of the other topics is cleared as well?

TBragi commented 2 years ago

In your case you create the requests and tb_transport.api.requests topics with specific configurations regarding retention.ms and retention.bytes.

Any other topics will be created with the cluster default settings unless you specify otherwise. You can use the Kafka CLI tools to check these settings at the broker level or for a specific topic:

https://stackoverflow.com/questions/35997137/how-do-you-get-default-kafka-configs-global-and-per-topic-from-command-line

You could also include a GUI that lets you easily check a topic's settings and adjust them if needed. I have had good experience with either Kafdrop or kafka-ui.
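
Concretely, the checks could look like this, run inside the broker container (the broker id 1 and the topic name are assumptions; --describe --all needs Kafka 2.5+):

    # Effective settings of one broker, including static defaults:
    kafka-configs.sh --bootstrap-server localhost:9092 \
      --describe --all --entity-type brokers --entity-name 1

    # Overrides set on a specific topic:
    kafka-configs.sh --bootstrap-server localhost:9092 \
      --describe --entity-type topics --entity-name requests

    # Adjust retention on an existing topic:
    kafka-configs.sh --bootstrap-server localhost:9092 \
      --alter --entity-type topics --entity-name requests \
      --add-config retention.ms=60000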

xiddjp commented 2 years ago

Got it, thank you.

TBragi commented 2 years ago

@xiddjp can this issue be closed? 😃

Tbeck-91 commented 1 year ago

How do I create the topics with specific configurations without using the command line?
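
One option documented by this image itself is the KAFKA_CREATE_TOPICS environment variable, which pre-creates topics at broker startup in the form name:partitions:replicas, with an optional cleanup policy as a fourth field. Note it does not accept arbitrary per-topic settings such as retention.ms; those still have to be set via the CLI or an AdminClient. A sketch (topic names are placeholders):

      environment:
        KAFKA_CREATE_TOPICS: "requests:1:1,my_compacted_topic:1:1:compact"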

fanluoo commented 6 months ago

We also encountered the same problem: expired data on a topic was not deleted automatically. Our Kafka version is 2.13-2.5.1. We tried manually clearing the data, shortening the topic's retention time, and restarting the broker, but that did not work. We then shortened log.retention.hours from 168 to 48, manually cleared the data, and restarted the broker again; this time it took effect.
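
One detail that often explains this symptom: retention only ever deletes closed log segments, never the active one, so on a low-traffic topic data can outlive its retention time until the active segment rolls (log.segment.bytes defaults to 1 GiB, log.roll.hours to 168). The cleaner also only checks every log.retention.check.interval.ms (5 minutes by default). A hypothetical compose environment that makes small volumes of data expire visibly:

      environment:
        KAFKA_LOG_RETENTION_HOURS: 48                  # broker-wide default retention
        KAFKA_LOG_SEGMENT_BYTES: 104857600             # roll at ~100 MB instead of 1 GiB
        KAFKA_LOG_ROLL_HOURS: 24                       # roll at least daily so segments become eligible
        KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 300000  # cleaner check interval (default 5 min)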

fanluoo commented 6 months ago

@xiddjp Have you solved this problem, or do you have any more clues?