rakutentech / kafka-firehose-nozzle

Forward logs from the Cloud Foundry Firehose to Apache Kafka
MIT License
13 stars 8 forks source link

Try sending messages on a different partition if a partition is unavailable #11

Closed tcnksm closed 8 years ago

tcnksm commented 8 years ago

Right now the nozzle will try to write to only available partitions (because we're using sarama's round-robin policy, and this policy will only send to available partitions). Unfortunately in the interval between when a partition leader fails to when sarama detects it as unavailable messages for the partition will queue up inside sarama and sarama by itself won't reroute them to a different partition (because sarama guarantees the per-partition ordering of messages).

There's a proposal to fix it upstream but it looks like it's not going anywhere. We can easily workaround the issue by retrying failed messages up to a few times, so that they should end up on different, available partitions. The messages will be out-of-order but they contain a timestamp and source information, so the order can be restored if needed.

tcnksm commented 8 years ago

Only concern is producerErr.Msg.Partition = 0.

LGTM :)

CAFxX commented 8 years ago

@tcnksm we need to change one more thing; the PR behaves as intended (see the tests) but unfortunately it relies on the hidden state of the message object. We need to change the PR to recreate the message object during repartitioning.

tcnksm commented 8 years ago

This part ?

type ProducerMessage struct {
     ...

     retries int
     flags   flagSet
 }
tcnksm commented 8 years ago

Tested on our internal env and it worked. Merge