mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/

kafka consumer: Replica infomation not available, one or more brokers are down. #1787

Open bobrik opened 9 years ago

bobrik commented 9 years ago

hekad.toml:

[KafkaInput]
type = "KafkaInput"
topic = "mytopic"
offset_method = "Newest"
partition = 1
group = "heka"
addrs = ["mykafka:9092"]

[LogOutput]
type = "LogOutput"
message_matcher = "TRUE"
encoder = "PayloadEncoder"

Heka 0.10.0b1.

I have a feeling that Sarama library should be updated to make it work.

JozoVilcek commented 8 years ago

+1 Rather weak support for Kafka is a major obstacle to my adopting heka more widely. Brokers can be down from time to time, but consumers and producers for replicated topics should not be broken by this.

dmvk commented 8 years ago

Is there any news on this issue? Producers definitely should keep working when a new partition leader gets elected on another broker after a crash...

dmuth commented 8 years ago

I too am having this problem. I have a 7 node Kafka cluster with a replication factor of 3, yet if I shut down a single Kafka instance, I get this error. This behavior happens whether I have one, a few, or all of the IPs of machines in my Kafka cluster listed in the configuration.

That's not the sort of behavior I expect when interfacing with a messaging system that has multiple replicas.

Should we maybe file a bug with the authors of Sarama as well?

-- Doug

dewrich commented 8 years ago

I just hit this as well; it seems like a big problem with resiliency. Any other "hacks" to get around this are appreciated.

dmuth commented 8 years ago

I found a workaround.

I ended up installing Apache Nifi (https://nifi.apache.org/) and set up a web service on it that listens for requests and forwards them to Kafka. One thing Nifi does not do is read logfiles, which Heka is actually pretty good at.

So, to summarize, here is the workflow:

Logfile(s) on disk -> Heka -> Webservice provided by Nifi -> Kafka
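
As a rough sketch of the Heka half of that workflow (not the exact configuration used here): a LogstreamerInput reads the logfiles and an HttpOutput posts each payload to the Nifi web service. The log directory, the file pattern, and the Nifi listener URL below are placeholders and would need to match your own setup.

```toml
# Read logfiles from disk. The directory and file pattern are placeholders.
[app_logs]
type = "LogstreamerInput"
log_directory = "/var/log/myapp"
file_match = 'app\.log'

# Send only the raw message payload, without adding extra newlines.
[PayloadEncoder]
append_newlines = false

# POST each message to the web service exposed by Nifi.
# The host, port, and path are assumptions; use whatever your Nifi listener exposes.
[nifi_http]
type = "HttpOutput"
message_matcher = "TRUE"  # forwards every message; narrow this in practice
address = "http://nifi.example.com:8081/contentListener"
encoder = "PayloadEncoder"
```

With this layout Heka never talks to Kafka directly, so broker failover is handled entirely on the Nifi side.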

(If anyone has questions about Nifi, feel free to hit me up)

I feel a little frustrated about this whole thing, because Heka works fine in all other aspects. But having this misbehaving library is really affecting the usefulness and overall quality of Heka. :-(

davidbirdsong commented 8 years ago

Kafka support for anything non-JVM is a tough nut to crack. Shopify's been very diligent with this library, but it's changed a ton over the last 2 years. Keeping heka up-to-date with it takes significant work.

Can any of you offer some time and devel work to bring heka up-to-date w/ sarama? If not, can you find an example Go project on GitHub (a consumer; a producer is easy by comparison) that has been implemented with sarama and kept up with it over the last 2 years, and that could serve as inspiration for a PR?

nickchappell commented 8 years ago

@davidbirdsong @dmuth @dewrich I chatted with @trink and @rafrombrc in #heka (the Mozilla one, not freenode) a few days ago about this. Updating Heka to use the newest lua_sandbox will probably take care of this. The newer versions of the sandbox have https://github.com/edenhill/librdkafka included, and librdkafka supports both Kafka 0.8 and 0.9 features (consumer groups, the new consumer API in 0.9, etc.).

There's a Bugzilla issue for updating Heka's lua_sandbox: https://bugzilla.mozilla.org/show_bug.cgi?id=1262555

bobrik commented 8 years ago

Whoa, there is another bug tracker for heka.

rohit01 commented 8 years ago

We are also facing this issue. Any luck?