Open NitinHsharma opened 6 months ago
Hi, any suggested solution for now?
Actually, for now, fetch and commit manually:
go func() {
    ctx := context.Background()
    for {
        msg, err := ki.Reader.FetchMessage(ctx)
        if err != nil {
            vlog.Errorf("Error reading message: %v", err)
            continue
        }

        // TODO: process msg here.

        // Commit the message only after it has been processed.
        if err := ki.Reader.CommitMessages(ctx, msg); err != nil {
            vlog.Errorf("Error committing message: %v", err)
        }
    }
}()
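With this approach the group offset only advances once CommitMessages returns successfully after processing; FetchMessage itself does not auto-commit, unlike ReadMessage, which commits the offset automatically when a GroupID is set.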
I am also experiencing the same issue and I am unaware of its cause
@NitinHsharma do you have a reproducer? Or is there any factor you saw that causes this to happen more often?
@nachogiljaldo No, it is random. One more observation I made today: if I have fewer consumer pods than partitions, a single consumer pod takes on multiple partitions, but it keeps consuming from only one of them continuously (since there is continuous traffic on the Kafka topic). So the lag on the other partition keeps increasing until I forcefully add one more consumer pod. For example, with a topic of 10 partitions and 9 consumer pods, one random consumer pod (say pod number 8) is assigned 2 partitions but reads from only 1 of them. My expectation and understanding is that it should read from both in a round-robin manner to distribute the load.
Just for confirmation, do you think this could potentially be related to rebalances? (i.e. a rebalance happens while an async commit is pending, and that commit then sets the offset to one older than the one you already had?) Something like this: https://github.com/segmentio/kafka-go/issues/1308
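For context, commits can only be pending/asynchronous when the reader is configured with a non-zero CommitInterval; with the default of 0 they are synchronous. A minimal illustration of that knob (broker, topic, and group values below are placeholders, not taken from the issue):

package main

import (
    "time"

    "github.com/segmentio/kafka-go"
)

func main() {
    r := kafka.NewReader(kafka.ReaderConfig{
        Brokers: []string{"localhost:9092"}, // placeholder broker address
        GroupID: "my-consumer-group",        // placeholder consumer group
        Topic:   "my-topic",                 // placeholder topic
        // With CommitInterval > 0, offsets are flushed to the broker
        // periodically in the background (asynchronous commits); with the
        // default of 0, each commit is handled synchronously.
        CommitInterval: time.Second,
    })
    defer r.Close()
}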
Yes, it could be.
We are using the ReadMessage function with a consumer group, which works pretty well. But sometimes one partition's offset gets set ahead of its committed message(s), so the messages in between get stuck in Kafka. No reader is able to get those messages until we restart the pods, basically forcefully rebalancing the consumer group.
Below is the basic code we are using to consume the messages:
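A minimal sketch of what such a ReadMessage-based consumer-group reader typically looks like with kafka-go (the broker address, topic, and group ID are placeholders, not the actual configuration from this setup):

package main

import (
    "context"
    "log"

    "github.com/segmentio/kafka-go"
)

func main() {
    r := kafka.NewReader(kafka.ReaderConfig{
        Brokers: []string{"localhost:9092"}, // placeholder broker address
        GroupID: "my-consumer-group",        // placeholder consumer group
        Topic:   "my-topic",                 // placeholder topic
    })
    defer r.Close()

    for {
        // With a GroupID set, ReadMessage commits the message's offset
        // automatically after the message is returned.
        msg, err := r.ReadMessage(context.Background())
        if err != nil {
            log.Printf("error reading message: %v", err)
            break
        }
        log.Printf("partition=%d offset=%d key=%s", msg.Partition, msg.Offset, string(msg.Key))
    }
}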
Below are the logs for the same
Now if you look at the end of the logs, the library has committed offset 79 on partition 5, but somehow it moved to 80, which is causing a lag of 1 message at Kafka.