Hi,
Nice project! I came here from the Hacker News discussion and noticed this comment you made:
"I've also thought about reserving a topic + consumer group specifically for failed jobs and bake the retry logic into KQ itself. But that's an area I must explore more."
At work we've built a less sophisticated version of kq, and we did bake this retry logic in, so I thought I'd describe it, since it has worked out quite well for us.
1) A message is consumed.
2) If an exception happens, send the message to a retry topic named {original-topic-name}.{consumer-group}.retry. We put the consumer group in the retry topic name because there are usually multiple consumer groups consuming from the same topic. We also add a retry_count field to the message.
3) At an interval defined by the operator, read all the messages back off the retry topic. If they succeed, that's great; if they fail, send them back to the retry topic with retry_count incremented by one. (I should note that all of our messages are JSON encoded.)
4) Keep reading messages, but if a message has more than N retries, instead of sending it back to the retry topic, we send it to a dead letter topic {original-topic-name}.{consumer-group}.dlt, which usually gets picked up by an operator. (There's a rough sketch of this flow right after this list.)
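In case it helps, here's a minimal sketch of that flow using kafka-python. The broker addresses, topic name, group name, max retry count, and the process() handler are all placeholders I made up for illustration, not our actual code.

```python
import json

from kafka import KafkaConsumer, KafkaProducer

BROKERS = ["localhost:9092"]              # placeholder broker list
TOPIC = "jobs"                            # placeholder original topic
GROUP = "workers"                         # placeholder consumer group
RETRY_TOPIC = f"{TOPIC}.{GROUP}.retry"    # naming convention from step 2
DLT_TOPIC = f"{TOPIC}.{GROUP}.dlt"        # naming convention from step 4
MAX_RETRIES = 5                           # the "N" from step 4

producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def process(message):
    """Hypothetical job handler -- replace with your own logic."""
    ...


def handle(message):
    """Run the handler; on failure, route to the retry or dead letter topic."""
    try:
        process(message)
    except Exception:
        retries = message.get("retry_count", 0)
        if retries >= MAX_RETRIES:
            # Retries exhausted: park it for an operator to look at (step 4).
            producer.send(DLT_TOPIC, {**message, "retry_count": retries})
        else:
            # Re-enqueue with the counter bumped by one (steps 2 and 3).
            producer.send(RETRY_TOPIC, {**message, "retry_count": retries + 1})


def consume_main():
    """Normal consumer: failed messages go to the retry topic (steps 1-2)."""
    consumer = KafkaConsumer(
        TOPIC,
        group_id=GROUP,
        bootstrap_servers=BROKERS,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for record in consumer:
        handle(record.value)


def consume_retries():
    """Periodic pass over the retry topic (steps 3-4); run this on a schedule."""
    consumer = KafkaConsumer(
        RETRY_TOPIC,
        group_id=f"{GROUP}.retry",
        bootstrap_servers=BROKERS,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        consumer_timeout_ms=5000,  # stop iterating once the topic is drained
    )
    for record in consumer:
        handle(record.value)
```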
This was super simple to implement and has worked out great for us so far (we originally came up with a much more complicated solution, but it sucked).
Thanks,
Ben