Closed blindspotbounty closed 1 year ago
It's a tricky one. We should definitely avoid the error somehow, but what do we do about the offsets? It would be very unexpected to me if you iterated the messages and then iterated them again next time because we hadn't committed the offset. We should probably wait until all messages have been consumed and then transition to `finished`. cc @felixschlegel
TL;DR It is ok to store message offsets when we are in the state `.finishing`, because we receive no new messages after `triggerGracefulShutdown` and all changes will be committed before closing completely.
`librdkafka` does not fetch any new messages after we invoke `triggerGracefulShutdown` (i.e. the `librdkafka` consumer close) and enqueues `RD_KAFKA_OP_TERMINATE` on its local queue. Once the `RD_KAFKA_OP_TERMINATE` operation is reached and `enable.auto.commit` (aka `isAutoCommitEnabled`) is `true`, `librdkafka` commits all offsets. Therefore it is ok to store message offsets when we are in the state `.finishing`, because we receive no new messages after `triggerGracefulShutdown` and all changes will be committed before closing completely.
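To make the argument above concrete, here is a minimal sketch of a state machine that keeps accepting offset stores while shutting down. The types `ConsumerState`, `StoreOffsetAction` and `ConsumerStateMachine` are hypothetical and simplified for illustration; they are not the client's actual state machine.

```swift
// Hypothetical, simplified model of the consumer's states. The names mirror
// the discussion, but this is not the library's real implementation.
enum ConsumerState {
    case running
    case finishing   // triggerGracefulShutdown was called, close is in progress
    case finished    // the consumer is fully closed
}

enum StoreOffsetAction: Equatable {
    case storeOffset
    case rejectConsumerClosed
}

struct ConsumerStateMachine {
    private(set) var state: ConsumerState = .running

    /// Decides whether an offset may be stored in the current state.
    func storeOffset() -> StoreOffsetAction {
        switch state {
        case .running, .finishing:
            // Assumption from the analysis above: no new messages are fetched
            // after shutdown was triggered, and stored offsets are committed
            // on close when auto commit is enabled.
            return .storeOffset
        case .finished:
            return .rejectConsumerClosed
        }
    }

    mutating func triggerGracefulShutdown() {
        if state == .running { state = .finishing }
    }

    mutating func closeCompleted() {
        state = .finished
    }
}
```

The only design choice shown is that `.finishing` is grouped with `.running` for offset stores, while `.finished` still rejects them.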
Here is my analysis on this topic:

(Stacktrace for when `triggerGracefulShutdown` is invoked)

`triggerGracefulShutdown`
-> `RDKafkaClient.consumerClose`
-> `rd_kafka_consumer_close_queue`
-> ...
-> `rd_kafka_cgrp_terminate`
-> enqueue `RD_KAFKA_OP_TERMINATE` on `librdkafka`'s internal `replyq`

(Backtrace for how final offsets are committed on consumer termination)

`librdkafka`'s `replyq` is processed in `rd_kafka_cgrp_op_serve`

`rd_kafka_cgrp_op_serve`
-> case `RD_KAFKA_OP_TERMINATE`
-> `rd_kafka_cgrp_terminate0`
-> `rd_kafka_assignment_serve`
-> `rd_kafka_assignment_serve_removals`
-> (`rd_kafka_toppar_op_fetch_stop`) + `rd_kafka_cgrp_assigned_offsets_commit` (See implementation)

Addendum:
As it turns out, consuming and storing the offset after `triggerGracefulShutdown` was invoked can still result in a `RD_KAFKA_RESP_ERR__STATE` error being triggered here:
This is due to the partition already being unassigned (caused by the shutdown process).
Conclusion: I would say we allow reading and storing offsets while in the `.finishing` state, though at the risk of throwing.
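On the caller side, "at the risk of throwing" could be handled roughly like this. This is only a sketch: `OffsetStoreError`, `Message` and `storeOffsets(for:isShuttingDown:using:)` are made-up names for illustration, and the real client surfaces `librdkafka` errors differently.

```swift
// Hypothetical error type standing in for a failed offset store.
enum OffsetStoreError: Error {
    case partitionNotAssigned   // models RD_KAFKA_RESP_ERR__STATE
}

// Hypothetical minimal message type.
struct Message {
    let partition: Int
    let offset: Int
}

/// Tries to store the offset of every message. While shutting down, a
/// "partition not assigned" failure is tolerated and the affected message is
/// returned to the caller; any other error, or the same error outside of
/// shutdown, is propagated.
func storeOffsets(
    for messages: [Message],
    isShuttingDown: Bool,
    using store: (Message) throws -> Void
) throws -> [Message] {
    var skipped: [Message] = []
    for message in messages {
        do {
            try store(message)
        } catch OffsetStoreError.partitionNotAssigned where isShuttingDown {
            // Expected during graceful shutdown: the partition was already
            // unassigned, so this offset cannot be stored anymore.
            skipped.append(message)
        }
    }
    return skipped
}
```

Messages returned as skipped have no stored offset, so a caller should assume they may be delivered again later.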
Thank you for the fix!
If the client is still iterating through the async sequence but `triggerGracefulShutdown()` was called, it may lead to a fatal error:
Unlike for `consumptionStopped`, I believe it should be allowed to continue iterating on graceful shutdown until the end of the current sequence.