wvanbergen / kafka

Load-balancing, resuming Kafka consumer for Go, backed by Zookeeper.
MIT License
373 stars 141 forks

Plans for this repository #47

Closed lukanus closed 9 years ago

lukanus commented 9 years ago

Hey!

Seeing that you are one of the biggest contributors to the Shopify/sarama repository right now, should I expect this code to be merged into sarama's mainline at some point?

I'm using sarama in production and I need to implement consumer groups. I can probably even contribute to this repo a bit; I'd just like to do it for a repository that has some prospect of future support :).

I see there is another bsm repository, but from what I can see in the comments, you have no plans to make that part of Sarama.

Also, one more thing: do you plan to add the possibility of passing already created consumers/partition consumers into a consumer group? That would make at least the intermediate stages of migrating between sarama and this library much easier.

wvanbergen commented 9 years ago

The answer is: eventually, once I am happy with the implementation. This library is definitely functional, but there are some things I'd like to address first:

If you are willing to help out on these, that would be great! /cc @eapache

What do you mean by "created consumers/partition consumers to join consumer group"? Porting the offsets into Zookeeper, so it will resume from the same point?

lukanus commented 9 years ago

I've been using Sarama for quite a while and I see some areas for improvement ;) One example is a structure wrapping all consumers for a topic. Right now you've done that as part of the last Consumer/PartitionConsumer update.

I wonder how you plan to handle Kafka offset management. Will you write your own implementation? I've seen that libraries like https://github.com/SOHU-Co/kafka-node/ just take a Zookeeper connection string. Even the Kafka wiki (https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example) describes consumer groups as using Zookeeper for that. Or am I missing something?

As for joining consumer groups: I meant that after creating a consumer group you could attach already running partition consumers to it, rather than having that functionality baked in as it is now. https://github.com/wvanbergen/kafka/blob/master/consumergroup/consumer_group.go#L134 https://github.com/wvanbergen/kafka/blob/master/consumergroup/consumer_group.go#L348 Something like func (cg *ConsumerGroup) AddPartitionConsumer(consumer *PartitionConsumer) {}
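For illustration, the proposed method might look like the sketch below. The `ConsumerGroup` and `PartitionConsumer` types here are simplified stand-ins for the real library types (which carry broker connections, channels, and Zookeeper state); only the `AddPartitionConsumer` signature comes from the comment above, and the duplicate-claim check is an assumption about how the group would keep one consumer per partition.

```go
package main

import (
	"fmt"
	"sync"
)

// PartitionConsumer is a stub standing in for sarama's PartitionConsumer.
type PartitionConsumer struct {
	Topic     string
	Partition int32
}

// ConsumerGroup is a stub standing in for consumergroup.ConsumerGroup.
type ConsumerGroup struct {
	mu        sync.Mutex
	consumers map[string]*PartitionConsumer // keyed by "topic/partition"
}

func NewConsumerGroup() *ConsumerGroup {
	return &ConsumerGroup{consumers: make(map[string]*PartitionConsumer)}
}

// AddPartitionConsumer attaches an already running partition consumer to
// the group, refusing duplicates so each partition has exactly one consumer.
func (cg *ConsumerGroup) AddPartitionConsumer(pc *PartitionConsumer) error {
	cg.mu.Lock()
	defer cg.mu.Unlock()
	key := fmt.Sprintf("%s/%d", pc.Topic, pc.Partition)
	if _, exists := cg.consumers[key]; exists {
		return fmt.Errorf("partition %s is already consumed by this group", key)
	}
	cg.consumers[key] = pc
	return nil
}

func main() {
	cg := NewConsumerGroup()
	fmt.Println(cg.AddPartitionConsumer(&PartitionConsumer{Topic: "events", Partition: 0}))
	// A second attach of the same partition is rejected:
	fmt.Println(cg.AddPartitionConsumer(&PartitionConsumer{Topic: "events", Partition: 0}) != nil)
}
```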

For example, how do you see the functionality "I want this consumer to consume every topic using X partition consumers"? Right now you would need to create X partition consumers and somehow inject them into the consumer group, right? Also, I can create more partitions during the life of the application.

I also believe it would be very helpful to add some topic-management tools to sarama (or here?) for brokers, giving you the ability to create new partitions the way the Kafka command-line toolset does.

wvanbergen commented 9 years ago

Zookeeper is not really meant to have offsets written to it very often, which is why I currently use an interval of 10 seconds to flush the latest offsets to ZK. To replace those calls, I am working on implementing the Kafka offset management API calls in Sarama: https://github.com/Shopify/sarama/pull/379. After that, Zookeeper is only needed for coordination between different instances of the same consumer group.

I don't think you can attach a working PartitionConsumer to a group, because you first have to claim the partition in Zookeeper before you can start consuming. The consumergroup will ensure there is exactly one PartitionConsumer for every topic/partition. That means an instance may consume multiple partitions if there are more partitions to consume than there are running instances (the normal case), or that some instances will do nothing if you start more instances than there are partitions. (We actually have an example of the second case, where we have two instances both trying to consume a topic with a single partition. One instance will consume it, while the other is on "hot standby".)
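The assignment behaviour described above can be illustrated with a small Go sketch. This is not the library's actual claim algorithm (which goes through Zookeeper and a rebalance protocol); it just shows the two cases: each partition goes to exactly one instance, and instances beyond the partition count end up with nothing, i.e. on hot standby.

```go
package main

import "fmt"

// dividePartitions assigns each partition to exactly one instance,
// round-robin. Instances beyond the partition count receive no
// partitions: the "hot standby" case. Simplified illustration only.
func dividePartitions(partitions []int32, instances []string) map[string][]int32 {
	assignment := make(map[string][]int32)
	for _, inst := range instances {
		assignment[inst] = nil
	}
	for i, p := range partitions {
		inst := instances[i%len(instances)]
		assignment[inst] = append(assignment[inst], p)
	}
	return assignment
}

func main() {
	// Normal case: more partitions than instances, so each
	// instance consumes multiple partitions.
	fmt.Println(dividePartitions([]int32{0, 1, 2, 3}, []string{"a", "b"}))
	// Standby case: one partition, two instances; "b" consumes nothing.
	fmt.Println(dividePartitions([]int32{0}, []string{"a", "b"}))
}
```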

What definitely could use improvement is when new partitions are added to a topic in Kafka. Right now, we only list the partitions during startup, and we will not detect any newly created partitions until you restart the consumer group. It's not a very common scenario, but I'd like to handle that better.
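One way to handle this, sketched below, is to periodically re-list the topic's partitions and start a consumer for any partition not yet being consumed. The `listPartitions` and `startConsumer` fields are assumed stand-ins for the real metadata call and consumer start-up; the real loop would also have to run this through the group's rebalance logic rather than starting consumers directly.

```go
package main

import "fmt"

// partitionWatcher polls the partition list and starts consumers for
// partitions created after startup.
type partitionWatcher struct {
	consuming      map[int32]bool
	listPartitions func() []int32        // assumed metadata call
	startConsumer  func(partition int32) // assumed consumer start-up
}

// poll does one iteration; a real implementation would call it on a
// ticker for the lifetime of the consumer group.
func (w *partitionWatcher) poll() {
	for _, p := range w.listPartitions() {
		if !w.consuming[p] {
			w.consuming[p] = true
			w.startConsumer(p)
		}
	}
}

func main() {
	started := []int32{}
	current := []int32{0, 1}
	w := &partitionWatcher{
		consuming:      map[int32]bool{},
		listPartitions: func() []int32 { return current },
		startConsumer:  func(p int32) { started = append(started, p) },
	}
	w.poll()                     // initial listing: partitions 0 and 1
	current = append(current, 2) // topic grows while the app is live
	w.poll()                     // picks up partition 2 without a restart
	fmt.Println(started)         // [0 1 2]
}
```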

wvanbergen commented 9 years ago

In Kafka 0.9, the entire protocol needed to run a high-level consumer is moving to Kafka, and will not require Zookeeper anymore. We plan to support this model in sarama.

The zookeeper-based consumer as implemented in this repository will at least be maintained until then, and will probably stick around for people that are not ready to move to the new Kafka version or consumer implementation.