Closed tcnksm closed 8 years ago
Only concern is producerErr.Msg.Partition = 0
.
LGTM :)
@tcnksm we need to change one more thing; the PR behaves as intended (see the tests) but unfortunately it relies on the hidden state of the message object. We need to change the PR to recreate the message object during repartitioning.
This part ?
type ProducerMessage struct {
...
retries int
flags flagSet
}
Tested on our internal env and it worked. Merge
Right now the nozzle will try to write to only available partitions (because we're using sarama's round-robin policy, and this policy will only send to available partitions). Unfortunately in the interval between when a partition leader fails to when sarama detects it as unavailable messages for the partition will queue up inside sarama and sarama by itself won't reroute them to a different partition (because sarama guarantees the per-partition ordering of messages).
There's a proposal to fix it upstream but it looks like it's not going anywhere. We can easily workaround the issue by retrying failed messages up to a few times, so that they should end up on different, available partitions. The messages will be out-of-order but they contain a timestamp and source information, so the order can be restored if needed.