mailgun / kafka-pixy

gRPC/REST proxy for Kafka
Apache License 2.0

Add a config option to disable the Consumer API entirely for produce-only environments #72

Closed salekseev closed 6 years ago

evan-stripe commented 6 years ago

I'm interested in taking a stab at implementing this - we're currently running a local fork of kafka-pixy that just has consumer support patched out entirely, but a config option is clearly a more sustainable approach.

@horkhe It'd be helpful to get some input on the design prior to starting implementation. My current thinking is:

It seems like everything else that accesses (*proxy.T).consumer checks first to see whether it's nil (and returns a reasonable error), so I think that should be sufficient.

Does that seem like a reasonable strategy?

horkhe commented 6 years ago

Could you please explain what the problem is with having the consumer enabled? I would say, if you do not need it, just do not use it. The only overhead of an unneeded consumer is that it maintains connections to the Kafka brokers and Zookeeper, whereas computational resources (goroutines/data structures) are allocated on demand.
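The phrase "allocated on demand" describes a lazy pattern: per-group/topic state and goroutines exist only after the first request for them. The Go sketch below is a minimal illustration of that general idea, assuming a simple per-(group, topic) worker model. `lazyConsumer`, `topicWorker`, and `WorkerCount` are hypothetical names, not kafka-pixy's internals. The sketch also does not model the broker/Zookeeper connections, which the comment says are maintained regardless.

```go
package main

import (
	"fmt"
	"sync"
)

// topicWorker is a hypothetical per-(group, topic) consumer worker.
type topicWorker struct {
	group, topic string
	requests     chan chan string
}

// newTopicWorker starts the worker goroutine; this only happens on first use.
func newTopicWorker(group, topic string) *topicWorker {
	w := &topicWorker{group: group, topic: topic, requests: make(chan chan string)}
	go w.run()
	return w
}

// run serves requests one at a time and replies with a fake message.
func (w *topicWorker) run() {
	n := 0
	for reply := range w.requests {
		n++
		reply <- fmt.Sprintf("%s/%s #%d", w.group, w.topic, n)
	}
}

// lazyConsumer creates workers on demand; an idle proxy holds an empty map.
type lazyConsumer struct {
	mu      sync.Mutex
	workers map[string]*topicWorker
}

func newLazyConsumer() *lazyConsumer {
	return &lazyConsumer{workers: make(map[string]*topicWorker)}
}

// Consume looks up (or lazily creates) the worker for group/topic and
// forwards the request to it.
func (c *lazyConsumer) Consume(group, topic string) string {
	key := group + "|" + topic
	c.mu.Lock()
	w, ok := c.workers[key]
	if !ok {
		w = newTopicWorker(group, topic)
		c.workers[key] = w
	}
	c.mu.Unlock()

	reply := make(chan string)
	w.requests <- reply
	return <-reply
}

// WorkerCount reports how many workers have been allocated so far.
func (c *lazyConsumer) WorkerCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.workers)
}

func main() {
	c := newLazyConsumer()
	fmt.Println(c.WorkerCount())           // 0: nothing allocated before first use
	fmt.Println(c.Consume("g1", "events")) // g1/events #1
	fmt.Println(c.Consume("g1", "events")) // g1/events #2
	fmt.Println(c.WorkerCount())           // 1
}
```

Under this model, a proxy that never receives a consume request never starts a worker goroutine. That is why the remaining cost of an unused consumer is mostly its network connections.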

evan-stripe commented 6 years ago

Hmm, I think there are a few reasons from our perspective (although I can't speak to what the original requestor's requirements are). The most straightforward is security: we don't give producer-only instances network access to our Zookeeper cluster. That's a fairly hard line we're not willing to cross - our fleet of producers is significantly larger than the set of instances that need to talk to Zookeeper today.

There's another point that's somewhat more subtle: we want to offer the rest of the organization as few supported interfaces to Kafka as possible, and enabling consumers in kafka-pixy opens up a new interface that we're not prepared to support today. (Right now we only support consumers written against the JVM, using the native client libraries.)

horkhe commented 6 years ago

It saddens me that the most sophisticated part of Kafka-Pixy, which I consider its killer feature, is being cast aside. But bitter feelings aside, I see your point and find it valid. You can go ahead and implement your plan; I am ready to accept the proposed changes.

evan-stripe commented 6 years ago

That's a totally fair perspective. For what it's worth, I hope you don't think that we're throwing aside your work; we're very heavily invested in Kafka-Pixy and I don't expect that to change.

I think our current perspective on consumers is as much a reflection on Kafka as it is on Kafka-Pixy specifically - in general, we have a lot of producers and relatively few consumers (the majority of our Kafka-driven applications today are ETL/Hadoop based, rather than online applications), so we're being very cautious in how we roll out consumer logic. If our use of consumers continues to expand, I suspect we'll take another look at Kafka-Pixy's consumer side of things. (And I suspect we'll have a whole new set of patches for you then 😛)

Thanks for the feedback, though. I'm hoping to put together a patch for you tomorrow.

horkhe commented 6 years ago

Thank you for the details. I am glad that you find this project useful. By the way, are you using the HTTP or the gRPC interface?

evan-stripe commented 6 years ago

We're using the HTTP interface today; our original deployment predates Kafka-Pixy's support for gRPC. But we intend to migrate in the medium term.

horkhe commented 6 years ago

One last question, I promise :). Why not the Kafka REST Proxy, given that you are not using the consumer feature anyway and you run Java in production? What made you prefer Kafka-Pixy? I am preparing a presentation for a meetup, and it would be nice to give real-life use cases. Obviously I am not going to mention any names without permission.

evan-stripe commented 6 years ago

No problem 🙂

I wasn't involved in the process for selecting Kafka-Pixy initially, so I'll have to check with folks tomorrow. As best I can piece together, we wanted to run whatever proxy we used as a sidecar process (on the instances that were actually producing messages), since that generally makes reasoning about failures, latency, etc. easier. It seems like we found the Kafka REST Proxy to be fairly heavyweight for that use case, while Kafka-Pixy had significantly lower CPU, memory, etc. overhead. I'll double-check and see if that's accurate, though.

horkhe commented 6 years ago

Thank you for your feedback!

evan-stripe commented 6 years ago

(By the way, if you want to drop me an email at some point, I'm happy to answer more questions about how we're using Kafka and Kafka-Pixy. My email address is on my profile.)

thrawn01 commented 6 years ago

Very nice. We are also moving toward Kafka-Pixy as a sidecar in Kubernetes. I wonder how much work it would take to integrate your feature with Kubernetes RBAC, so that if a service account has the consumer role, KP would enable the consumer feature. I'm just thinking off the top of my head and don't have much of an implementation in mind.