mailgun / kafka-pixy

gRPC/REST proxy for Kafka
Apache License 2.0

Are existing consumers affected by Set Offset operations? #163

Closed by atamon 5 years ago

atamon commented 5 years ago

I've been experimenting with the kafka-pixy API a bit over the past few days. Great work! It reduces Kafka setup effort immensely :)

I have a use-case where I want to be able to rewind consumption of a topic using the same group.

This is the sequence in which I'm calling the API:

GET /topics/${topic}/messages?group=${group} # Until all existing messages are read
POST /topics/${topic}/offsets?group=${group} # request body = {partition: 0, offset: 0, metadata: '' }
GET /topics/${topic}/offsets?group=${group} # Returns offset 0 as expected
GET /topics/${topic}/messages?group=${group} # Gets 408 as long-polling times out, I expected offset 0 to be read again.

A hackish yet simple workaround I found is to wait for the consumer group to time out: I check the output of /topics/${topic}/consumers for the group I'm after and, once it reports no consumers, continue with consumption.
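For illustration, here is a minimal sketch of that polling workaround using only the Python standard library. Several things in it are assumptions rather than facts from this thread: the base URL `http://localhost:19092`, the shape of the `/topics/${topic}/consumers` response (assumed to be a JSON object keyed by group name, mapping to that group's members), and the helper names `fetch_consumers`, `group_is_idle` and `wait_until_idle`, which are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: address of the kafka-pixy HTTP API; adjust to your deployment.
BASE_URL = "http://localhost:19092"


def fetch_consumers(topic, base_url=BASE_URL):
    """GET /topics/<topic>/consumers and return the decoded JSON body."""
    url = f"{base_url}/topics/{urllib.parse.quote(topic)}/consumers"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def group_is_idle(consumers, group):
    """True when the group is absent from the response or has no members.

    Assumption: the response is a JSON object keyed by group name.
    """
    return not consumers.get(group)


def wait_until_idle(topic, group, fetch=fetch_consumers, interval=1.0,
                    timeout=120.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the group reports no consumers; False if the timeout passes."""
    deadline = clock() + timeout
    while True:
        if group_is_idle(fetch(topic), group):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would set the offset, call `wait_until_idle(topic, group)`, and resume consumption only if it returns `True`. The `fetch`, `sleep` and `clock` parameters exist only so the loop can be exercised without a running proxy.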

Another solution is of course to not reuse consumer groups, but that feels like leaving a lot of junk behind.

Is it the expected outcome that existing consumers won't be affected by set offset operations?

horkhe commented 5 years ago

Thank you @atamon, we are glad that you like the project. You are right: existing consumers are not affected by the Set Offset API, and the workaround that you suggested is correct. Internally, the offset is read from Kafka once, on consumer initialization; after that, Kafka-Pixy only writes updated offsets back to Kafka. Changing this behavior would greatly complicate the implementation, so we are not going to do that.
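The read-once, write-afterwards behavior can be illustrated with a toy in-memory model. This is only a sketch of the idea described above, not Kafka-Pixy's actual code; `OffsetStore` and `GroupConsumer` are hypothetical names, and a Python list stands in for a single partition.

```python
class OffsetStore:
    """Stands in for the committed offset Kafka keeps for one group."""

    def __init__(self):
        self.committed = 0


class GroupConsumer:
    """Toy consumer: reads the committed offset once, then only writes it."""

    def __init__(self, store, log):
        self.store = store
        self.log = log
        # The only read of the committed offset happens at initialization.
        self.position = store.committed

    def consume(self):
        if self.position >= len(self.log):
            return None  # nothing new: corresponds to the long-poll timeout
        message = self.log[self.position]
        self.position += 1
        # From here on the consumer only writes its progress back.
        self.store.committed = self.position
        return message


log = ["a", "b", "c"]
store = OffsetStore()
running = GroupConsumer(store, log)
consumed = [running.consume() for _ in log]  # read everything once

store.committed = 0                      # the "Set Offset" operation
after_reset = running.consume()          # running consumer ignores the reset
fresh = GroupConsumer(store, log)        # new consumer, e.g. after the group timed out
replayed = fresh.consume()               # picks up the rewound offset
```

In this model the running consumer keeps its own in-memory position, so it still sees nothing new after the reset, while a freshly initialized consumer starts from the rewound offset. That matches the workaround of waiting until the group has no consumers.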

Speaking about replay in general I can suggest:

atamon commented 5 years ago

@horkhe Thanks for the quick answer!

Our use-case (providing a key-value store for the latest value of each key in the topic) would require us to replay all or most of the topic, so I think your first suggestion is a great idea. We want to be able to have quick service restarts during development, automatic testing, and in case of production crashes. I believe using two dedicated consumer groups per "service" will help us avoid this issue.

Cheers