joowani / kq

Kafka-based Job Queue for Python
http://kq.readthedocs.io
MIT License
572 stars 24 forks source link

Messages lost on offset out of range #6

Closed shashankmehra closed 7 years ago

shashankmehra commented 7 years ago

If an offset is lost completely due to an out of range error, KafkaConsumer has an attribute auto_offset_reset which controls what to do in that scenario. By default this is set to latest. This would mean that all enqueued tasks (both processed and unprocessed) before that point would be lost.

Usually this should not be a problem, but while testing a completely new topic, I started the producer but I did not start a consumer. After a couple of tasks were queued, the consumer was started. On start, the consumer remained idle and did not process the tasks already queued. But I could see the tasks queued in kafka through kafkacat.

Once I queued a task again, while consumer was running, the consumer started processing but only processed the task just queued. Thereafter, tasks enqueued while the consumer was down were not lost and 'normal' functioning resumed.

Would it make sense to set auto_offset_reset to earliest? There are obvious drawbacks. The queue length might be too long for offset to be reseted to earliest. The tasks written by the developer might not be properly idempotent. It might be preferable to lose a few tasks in case of offset errors (I dont know any other cases where they might appear but there might be).

In either case I wanted to document here that worker must be started before producer for any new deployment (or any new topic) for the offset to come in sync. Otherwise tasks queued before the worker boots up will not be processed, if not lost.

joowani commented 7 years ago

HI @shashankmehra,

Thanks pointing this out. I am aware of the issue, and I am going to add the auto_offset_reset parameter to the worker class to at least allow the users to control the offset behaviour themselves. I would rather leave the default to latest than earliest because of the problems you pointed out. I also plan to make it very clear in the documentation that workers must be started before jobs are queued.

joowani commented 7 years ago

Added this feature in https://github.com/joowani/kq/releases/tag/1.3.0. Closing issue.