Open · nachogiljaldo opened this issue 4 months ago
An alternative solution would be that, if the offset goes back in time (because another consumer committed an older offset), the consumer assigned to that partition receives those events again, so it has the chance to reprocess and acknowledge them. That approach seems better to me because it should be free of race conditions.
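A minimal sketch of that idea (purely illustrative; `resumeOffset` is a hypothetical helper, not kafka-go internals): when a partition is (re)assigned, compare the broker's committed offset with the locally cached one, and rewind if the commit went backwards so the events are delivered again.

```go
// Illustrative sketch only: decide where a partition reader should
// resume after a (re)assignment.
func resumeOffset(committedOffset, cachedOffset int64) int64 {
	// If the committed offset went back in time (a stale commit from a
	// previous owner), resume from it so the events are delivered again
	// and the current owner can reprocess and acknowledge them.
	if committedOffset >= 0 && committedOffset < cachedOffset {
		return committedOffset
	}
	// Otherwise keep the cached position; nothing was skipped.
	return cachedOffset
}
```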
OK, I believe I found the culprit, and the fix could come from either of two places.
The way I see it, there are two problems:

a) The reader does not verify the partitions it is about to commit against the generation's Assignments. This opens the door to a race condition between the consumer that just received a partition and pending commits from the old generation.

b) That would not be a big deal if the offset were not cached at the connection level. But because the connection keeps the last offset it has seen, when the committed offset "goes back in time" due to (a), the connection is not aware of it and does not re-read the messages, which then appear as lagging until new events are produced, processed, and committed.
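A sketch of what fix (a) could look like, assuming access to the current `*kafka.Generation` at commit time (`filterOwnedOffsets` is a hypothetical helper; `Generation.Assignments`, `PartitionAssignment`, and `CommitOffsets` are part of kafka-go's public API):

```go
import kafka "github.com/segmentio/kafka-go"

// filterOwnedOffsets drops commit entries for partitions that are not in
// the current generation's assignments, so a consumer can no longer
// commit offsets for a partition it lost in a rebalance.
func filterOwnedOffsets(gen *kafka.Generation, offsets map[string]map[int]int64) map[string]map[int]int64 {
	owned := make(map[string]map[int]bool, len(gen.Assignments))
	for topic, assignments := range gen.Assignments {
		owned[topic] = make(map[int]bool, len(assignments))
		for _, a := range assignments {
			owned[topic][a.ID] = true
		}
	}
	filtered := make(map[string]map[int]int64, len(offsets))
	for topic, partitions := range offsets {
		for partition, offset := range partitions {
			if !owned[topic][partition] {
				continue // partition was reassigned; skip the stale commit
			}
			if filtered[topic] == nil {
				filtered[topic] = make(map[int]int64)
			}
			filtered[topic][partition] = offset
		}
	}
	return filtered
}
```

The filtered map would then be passed to `Generation.CommitOffsets` instead of the raw pending commits, closing the window described in (a).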
Describe the bug
When commits are not immediate and/or some events have a relatively high processing time, and the topic sustains low traffic, lag can be reported even though the events were processed and committed.
Kafka Version
3.6.x
To Reproduce
This test reproduces the behavior: https://github.com/nachogiljaldo/kafka-go/blob/do_not_commit_offset_of_not_owned_partitions/reader_test.go#L1184
The situation is:
- consumer A fetches messages from a partition but its commit is still pending;
- a rebalance assigns that partition to consumer B, which reads the pending messages and commits a newer offset;
- consumer A's late commit lands without being checked against the new generation's assignments, moving the committed offset backwards;
- consumer B's connection has already cached the newer offset, so it never re-reads the skipped messages (see the sketch below).
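A condensed, illustrative outline of that sequence (not the actual test; broker address, group, and topic names are placeholders, error handling is elided, and it assumes the topic already holds at least two messages on a single partition):

```go
package main

import (
	"context"

	kafka "github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()
	cfg := kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker
		GroupID: "fake-lag-group",           // placeholder group
		Topic:   "fake-lag-topic",           // placeholder topic
	}

	// r1 fetches the first message but holds its commit.
	r1 := kafka.NewReader(cfg)
	stale, _ := r1.FetchMessage(ctx)

	// r2 joins the same group, triggering a rebalance that hands the
	// partition to r2; r2 reads both messages and commits the newer offset.
	r2 := kafka.NewReader(cfg)
	first, _ := r2.FetchMessage(ctx)
	second, _ := r2.FetchMessage(ctx)
	_ = r2.CommitMessages(ctx, first, second)

	// r1's late commit is not checked against the new generation's
	// assignments, so the committed offset moves backwards.
	_ = r1.CommitMessages(ctx, stale)

	// r2's connection still caches the newer offset, so the message
	// between the two commits is never redelivered and the group keeps
	// reporting lag until new traffic arrives.
	_ = r1.Close()
	_ = r2.Close()
}
```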
Expected Behavior
There are 2 things I would expect:
- the reader should not commit offsets for partitions it no longer owns in the current generation;
- if the committed offset does go back in time anyway, the messages should be delivered again so they can be reprocessed and acknowledged, instead of showing up as lag.
Observed Behavior
We do not get the missing message, which leads to fake lag when the topic has little traffic.