robertluo / waterfall

The Unlicense

Reduce the frequency of commits #27

Closed JackSho closed 2 weeks ago

JackSho commented 9 months ago

Purpose

Waterfall consumes from a topic and commits once after processing each batch of consumed data. As a result, when the topic carries a lot of data, commits become very frequent.

In our usage, the internal topic __consumer_offsets holds many times more data than the normal business topics. This does not seem to affect Kafka yet, but we are not sure whether it will start to cause problems as __consumer_offsets keeps growing.
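
To make the scale of the problem concrete, here is a rough sketch comparing the number of commits under two policies: one commit per processed batch, and at most one commit per fixed interval. The workload (one batch every 10 ms for 60 seconds) and the 5-second interval are assumptions chosen for illustration, not measurements from this issue, and the function names are hypothetical.

```python
def commits_per_batch(poll_times_ms):
    """One commit after every processed batch, so one commit per poll."""
    return len(poll_times_ms)


def commits_per_interval(poll_times_ms, interval_ms):
    """Commit only when at least interval_ms has passed since the last commit."""
    commits = 0
    last_commit = None
    for t in poll_times_ms:
        if last_commit is None or t - last_commit >= interval_ms:
            commits += 1
            last_commit = t
    return commits


# Assumed workload: one batch every 10 ms for 60 seconds (integer milliseconds).
poll_times_ms = list(range(0, 60_000, 10))

per_batch = commits_per_batch(poll_times_ms)               # 6000 commits
per_interval = commits_per_interval(poll_times_ms, 5_000)  # 12 commits
print(per_batch, per_interval)
```

With per-batch commits the commit count grows with traffic, while with interval-based commits it is bounded by the elapsed time divided by the interval, regardless of how many batches arrive.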

[Screenshot 2023-12-05 16:18:45]

chenjianye commented 1 month ago

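The recent situation: in our use case there are many consumers, and the volume consumed from topics is many times the volume produced, so consumers commit very often. Currently, the topic __consumer_offsets receives about 50 times as many messages per second as all other business topics combined, and about 15 times as many bytes.

This topic is written to disk at a very high rate. Part of Kafka's performance design is that it relies on the disk cache to speed up responses to consumers. If too many messages are written to __consumer_offsets, the disk cache can hold correspondingly fewer business-topic messages. Looked at another way, if the write volume of __consumer_offsets could be reduced to the same order of magnitude as the other topics, we could run with a much smaller memory configuration at the same performance, which would cut costs considerably.

For reference: the Kafka clients' auto-commit commits offsets at a fixed time interval. The interval is a configurable parameter and defaults to 5 seconds. Reference link: https://docs.confluent.io/platform/current/clients/consumer.html#offset-management-configuration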
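
A minimal sketch of what interval-based committing could look like on the consumer side, in the spirit of the auto-commit behavior described above. This is not Waterfall's API or the Kafka client's implementation: `IntervalCommitter`, `mark_processed`, `flush`, and the `commit_fn` callback are hypothetical names, and `commit_fn` stands in for whatever call performs the real offset commit.

```python
import time


class IntervalCommitter:
    """Remember the newest offsets and commit them at most once per interval."""

    def __init__(self, commit_fn, interval_s=5.0, clock=time.monotonic):
        self._commit_fn = commit_fn    # hypothetical callback doing the real commit
        self._interval_s = interval_s  # assumed default, mirroring the 5 s above
        self._clock = clock            # injectable so the logic can be tested
        self._pending = {}             # partition -> next offset to commit
        self._last_commit = clock()

    def mark_processed(self, partition, next_offset):
        """Record progress; commit only if the interval has elapsed."""
        self._pending[partition] = next_offset
        if self._clock() - self._last_commit >= self._interval_s:
            self.flush()

    def flush(self):
        """Commit pending offsets now, for example on shutdown or rebalance."""
        if self._pending:
            self._commit_fn(dict(self._pending))
            self._pending.clear()
        self._last_commit = self._clock()


# Usage with a stand-in commit function that only records what it was given.
committed = []
committer = IntervalCommitter(committed.append, interval_s=5.0)
committer.mark_processed(0, 42)  # usually too early to commit
committer.flush()                # force the commit, as on shutdown
```

The trade-off is the usual one for delayed commits: after a crash, messages processed since the last commit (up to one interval's worth) may be delivered again, so processing needs to tolerate duplicates.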

Monitoring metrics for reference:

[Screenshot 2024-08-21 13:03:20] [Screenshot 2024-08-21 13:03:50]