Closed: shashankmehra closed this 7 years ago
Hi @shashankmehra,

Thanks for pointing this out. I am aware of the issue, and I am going to add an `auto_offset_reset` parameter to the worker class so that users can at least control the offset behaviour themselves. I would rather leave the default at `latest` than `earliest` because of the problems you pointed out. I also plan to make it very clear in the documentation that workers must be started before jobs are queued.
Added this feature in https://github.com/joowani/kq/releases/tag/1.3.0. Closing issue.
If an offset is lost completely due to an out-of-range error, `KafkaConsumer` has an `auto_offset_reset` attribute which controls what to do in that scenario. By default this is set to `latest`, which means that all tasks enqueued before that point (both processed and unprocessed) would be lost.

Usually this should not be a problem, but while testing a completely new topic I started the producer without starting a consumer. After a couple of tasks were queued, the consumer was started. On start, the consumer remained idle and did not process the tasks already queued, even though I could see them in Kafka through kafkacat.

Once I queued another task while the consumer was running, the consumer started processing, but it only processed the task just queued. Thereafter, tasks enqueued while the consumer was down were no longer lost, and normal functioning resumed.
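The behaviour above can be sketched with a toy in-memory model (not real Kafka client code; `FakeLog` and `FakeConsumer` are illustrative names, and real offset management lives in the broker and client library). With no committed offset and `auto_offset_reset='latest'`, the consumer's starting position is the end of the log, so messages produced before it started are invisible:

```python
class FakeLog:
    """A topic partition modelled as an append-only list of messages."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)


class FakeConsumer:
    """A consumer with no committed offset, honouring auto_offset_reset."""
    def __init__(self, log, auto_offset_reset="latest"):
        self.log = log
        # 'latest' starts at the log end; 'earliest' starts at offset 0.
        self.position = len(log.messages) if auto_offset_reset == "latest" else 0

    def poll(self):
        msgs = self.log.messages[self.position:]
        self.position = len(self.log.messages)
        return msgs


log = FakeLog()
log.append("task-1")  # enqueued before any consumer exists
log.append("task-2")

consumer = FakeConsumer(log, auto_offset_reset="latest")
print(consumer.poll())  # [] -- the two earlier tasks are never seen

log.append("task-3")    # enqueued while the consumer is running
print(consumer.poll())  # ['task-3'] -- only the new task is processed
```

This matches the observation: the consumer sat idle until a task was queued after it had started.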
Would it make sense to set `auto_offset_reset` to `earliest`? There are obvious drawbacks. The queue might be too long for the offset to be reset to `earliest`, and the tasks written by the developer might not be properly idempotent, so it might be preferable to lose a few tasks in case of offset errors (I don't know of other cases where they might appear, but there might be some).

In either case, I wanted to document here that the worker must be started before the producer for any new deployment (or any new topic) for the offsets to come in sync. Otherwise, tasks queued before the worker boots up will not be processed, if not lost.