twmb / franz-go

franz-go contains a feature complete, pure Go library for interacting with Kafka from 0.8.0 through 3.6+. Producing, consuming, transacting, administrating, etc.
BSD 3-Clause "New" or "Revised" License
1.61k stars 158 forks source link

reverted offset commit in kafka #689

Closed arjunnair1997 closed 3 months ago

arjunnair1997 commented 4 months ago

The setup:

  1. Single producer writing to a kafka topic with 1 partition.
  2. Single consumer consuming from the kafka topic as part of a consumer group.
  3. BlockRebalanceOnPoll config is set.
  4. Disable auto commit is set.

The consumer workload has the consumer polling 1 record at a time, committing the offset associated with the record, and then making modifications to some in memory state in accordance with the polled record. I want a record to be consumed at most once, and I don't want the committed offset to revert.

However, the situation I'm worried about is as follows:

  1. Poll 1 record at offset 0.
  2. Call CommitRecords with the polled record and some timeout. The CommitRecords times out, but the client has already made this request and it is inflight.
  3. Call CommitRecords again with the polled record. The CommitRecords succeeds.
  4. Modify in memory state machine with data from the polled record.
  5. Poll 1 record at offset 1.
  6. Call CommitRecords again with the polled record. The CommitRecords succeeds.
  7. Modify in memory state machine with data from the polled record.
  8. Now the timed out request from step 2, which was inflight and delayed, makes it to the broker and commits offset 0 again. Reverting the committed offset at 1.

Is this behaviour prevented by franz-go or even the overall kafka protocol? cc @twmb

twmb commented 3 months ago

I think the scenario you describe is theoretically possible, even though technically it's extremely unlikely (borderline impossible).

There's nothing in an OffsetCommit request that strictly can enforce advancing commits. Theoretically, Kafka could receive a request, and then the thread processing that request could hang for an hour while the client times out, commits on a different thread successfully, consumes, and commits again. The original thread could eventually unhang and rewind things.

However realistically, I've never even considered this as a worry because it's so implausible.

I think the only way this could be prevented is if OffsetCommit had an epoch field that was required to strictly advance; it does not.

Not sure if this really helps your scenario but it does answer your question. Sorry for the delay. Anything I could do to help here?

twmb commented 3 months ago

Let me know if there's more here, closing for now.