Tiihott opened 2 weeks ago
In this use case, another, more practical way to achieve proper tracking of consumed offsets is to leverage the idempotent consumer implementation, which stores the consumed offset data in the HDFS filenames. The listener could then be implemented in this way, solving the issue:
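As a minimal sketch of that idea: if each HDFS filename encodes the topic, partition, and last stored offset, the highest offset per partition can be recovered on startup and used as the resume point. The `topic.partition.offset` filename format and the class name `OffsetFromFilenames` below are illustrative assumptions, not the project's actual naming scheme.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetFromFilenames {
    // Recovers the highest stored offset per "topic.partition" key from
    // filenames assumed to look like "topic.partition.offset" (e.g. "t0.0.250").
    public static Map<String, Long> lastStoredOffsets(Iterable<String> fileNames) {
        Map<String, Long> offsets = new HashMap<>();
        for (String name : fileNames) {
            int lastDot = name.lastIndexOf('.');
            String key = name.substring(0, lastDot);           // "topic.partition"
            long offset = Long.parseLong(name.substring(lastDot + 1));
            offsets.merge(key, offset, Math::max);             // keep the highest offset seen
        }
        return offsets;
    }
}
```

On a restart or rebalance, the consumer could seek each assigned partition to the recovered offset plus one, making the HDFS contents themselves the source of truth for consumed positions.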
Solved in beta branch PR #41
Description

By registering a ConsumerRebalanceListener to a consumer in the consumer group, the listener can be used to track the record offsets inside a batch. The listener must be initialized after the consumer has been initialized but before the consumer subscribes to a topic. The listener object must also be passed to the BatchDistributionImpl object so that, when it starts processing the record batch, the listener can track the offsets that have been stored to HDFS. When a Kafka rebalance happens, the listener is used to clean up any remaining records from previous batches and store them to HDFS. Offset commits are also updated during this process, so the consumer assigned to the topic after the rebalance knows where to start reading the topic.
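The bookkeeping described above can be sketched without the real Kafka client or HDFS dependencies: offsets are buffered per partition as the batch is processed, and when a partition is revoked (the `onPartitionsRevoked` callback of a ConsumerRebalanceListener) the remainder is flushed and the commit point advanced. The class and method names here (`PendingBatchTracker`, `flushAndCommit`) are illustrative, not the project's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingBatchTracker {
    private final Map<Integer, List<Long>> pending = new HashMap<>();  // partition -> buffered offsets
    private final Map<Integer, Long> committed = new HashMap<>();      // partition -> next offset to commit

    // Called as records are handed to batch processing (e.g. by BatchDistributionImpl).
    public void track(int partition, long offset) {
        pending.computeIfAbsent(partition, p -> new ArrayList<>()).add(offset);
    }

    // Would be invoked from onPartitionsRevoked(): store the remainder of the
    // batch and advance the commit point so the next assignee resumes correctly.
    public void flushAndCommit(int partition) {
        List<Long> offsets = pending.remove(partition);
        if (offsets != null && !offsets.isEmpty()) {
            // Kafka convention: the committed offset is the last processed offset + 1.
            committed.put(partition, offsets.get(offsets.size() - 1) + 1);
        }
    }

    public Long committedOffset(int partition) {
        return committed.get(partition);
    }
}
```

In the real listener, `flushAndCommit` would also write the remaining records to HDFS and call `commitSync` on the consumer before the partition is handed off.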