segmentio / kafka-go

Kafka library in Go
MIT License

[savaki/kafka-go/consumer-groups] Discussion #44

Closed achille-roussel closed 5 years ago

achille-roussel commented 6 years ago

@savaki

I'm opening this issue as a communication thread on the consumer groups implementation that you're working on at https://github.com/savaki/kafka-go/tree/consumer-groups

First of all, thanks a lot for taking the time to work on this; it's a highly demanded feature that I'm sure a lot of people will benefit from.

Here is some feedback I had on your work:

savaki commented 6 years ago

Thanks for opening this thread. Here was my thinking on each of those:

achille-roussel commented 6 years ago
savaki commented 6 years ago
achille-roussel commented 6 years ago

When I look at the Kafka API, especially with respect to consumer groups, it views the world as a set of topics with their associated partitions. kafka-go, on the other hand, views the world as topic-partition pairs. In putting together consumer groups I noticed the conflict between these perspectives. It comes front and center during rebalancing.

Like I said, I'm not against supporting multiple topics on the Reader type, it may even be a great improvement of the current code. I was just saying that as a first step and to reuse as much of the existing code as possible we could start with single topics, then improve the code to be fully featured.

Is compatibility with other consumers desired? e.g. can I gradually co-mingle kafka-go and sarama consumers using consumer groups?

I believe this is a requirement, companies have programs written in multiple languages or using multiple libraries. As library writers we can build much more powerful tools if we offer cross-compatibility instead of building something new. This is also what's recommended by the Kafka documentation:

The primary use case for the membership API is consumer groups, but the requests are intentionally generic to support other cases (e.g. Kafka Connect groups). The cost of this generality is that specific group semantics are pushed into the client. For example, the JoinGroup/SyncGroup requests defined below have no explicit fields supporting partition assignment for consumer groups. Instead, they contain generic byte arrays in which assignments can be embedded by the consumer client implementation.

But although this allows each client implementation to define its own embedded schema, compatibility with Kafka tooling requires clients to use the standard embedded schema used by the client shipped with Kafka. The consumer-groups.sh utility, for example, assumes this format to display partition assignments. We therefore recommend that clients should follow the same schema so that these tools will work for all client implementations.
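To make the quoted schema concrete, here is a minimal Go sketch (not kafka-go's actual code) of encoding a single-topic assignment in the standard embedded layout the quote describes: an int16 version, an array of topic/partition entries, and opaque user data, all big-endian as in the Kafka wire format:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// writeAssignment encodes a single-topic partition assignment using the
// standard embedded schema that the client shipped with Kafka uses:
// version (int16), topic count (int32), then per topic a string and an
// int32 array of partitions, followed by user-data bytes.
func writeAssignment(topic string, partitions []int32) []byte {
	buf := new(bytes.Buffer)
	binary.Write(buf, binary.BigEndian, int16(0))               // schema version
	binary.Write(buf, binary.BigEndian, int32(1))               // number of topics
	binary.Write(buf, binary.BigEndian, int16(len(topic)))      // topic name length
	buf.WriteString(topic)                                      // topic name
	binary.Write(buf, binary.BigEndian, int32(len(partitions))) // partition count
	for _, p := range partitions {
		binary.Write(buf, binary.BigEndian, p)
	}
	binary.Write(buf, binary.BigEndian, int32(0)) // empty user-data
	return buf.Bytes()
}

func main() {
	b := writeAssignment("events", []int32{0, 2})
	fmt.Println(len(b)) // 2+4+2+6+4+8+4 bytes
}
```

A client that embeds its assignments in this layout stays decodable by the standard Kafka tooling mentioned above, which is the whole point of the recommendation.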

What is the expected behavior if the client uses the same consumer group id for multiple topic-partitions? For example, if we assume a consumer-group-aware Reader, what would be the expected behavior if one service has two Readers that use the same consumer group id, but point to different topic-partitions?

Requiring that one group equals one topic is fine as a first step if that means less public API changes and a simpler implementation. I've never seen groups being used to load balance across topics (I'm sure there are use cases for it but it's not common in my experience).

What are your thoughts on using a custom rebalance strategy?

Let's focus on one implementation first, then once we have it working we can add an abstraction for supporting custom strategies. In my experience, it's always been easier to first solve one problem, then step back and think about the generic solution. It also means we'll already have tests to ensure that as we introduce an abstraction we won't be breaking existing code.

savaki commented 6 years ago

These answers help a lot. The reason I ask about compatibility vs multiple topics has to do with how consumer groups rebalance. Conceptually Kafka consumers handle rebalancing as follows:

  1. Find the broker that will coordinate the group
  2. Attempt to join the group. As part of the request, I pass the list of rebalancing strategies I support. The standard-ish ones are roundrobin and range. bsm/sarama-cluster supports these.
  3. The coordinator will (a) select a common strategy among the consumers and return that as part of join group and (b) select a leader among the consumers attempting to join in
  4. The leader is solely responsible for deciding which consumers will be responsible for which topic-partitions. The important thing is that a single leader handles assignments for all consumers across all topic-partitions that share a common group id
  5. The leader pushes its assignments to the coordinator which in turn forwards them to the other consumers which then subscribe to the appropriate topic partitions
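The leader's job in step 4 can be sketched as a deterministic round-robin assignment over sorted members and partitions (a simplified illustration of the idea, not the wire-level protocol):

```go
package main

import (
	"fmt"
	"sort"
)

// roundRobin sketches what a group leader does in step 4: spread every
// topic-partition across all members of the group. Members and partitions
// are sorted first so any elected leader computes the same assignment.
func roundRobin(members []string, partitions []string) map[string][]string {
	sort.Strings(members)
	sort.Strings(partitions)
	out := make(map[string][]string)
	for i, tp := range partitions {
		m := members[i%len(members)]
		out[m] = append(out[m], tp)
	}
	return out
}

func main() {
	a := roundRobin(
		[]string{"consumer-b", "consumer-a"},
		[]string{"events-0", "events-1", "events-2"},
	)
	fmt.Println(a["consumer-a"]) // [events-0 events-2]
	fmt.Println(a["consumer-b"]) // [events-1]
}
```

The assignment map is what the leader would serialize (per member) and push to the coordinator in step 5.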

The challenge I see is this: a single leader is responsible for assignments across all topic-partitions, not just a single topic-partition pair.

Consequently, in a heterogeneous environment running both sarama-cluster and kafka-go, if leadership were given to a sarama-cluster consumer, the kafka-go consumer might not be able to handle the assignment it receives, e.g. one where both its topic and partition should change.

Conversely, if the kafka-go consumer were elected leader, without a custom strategy it would be unaware of which topic-partitions each of the existing consumers were bound to. A workaround would be to create a custom rebalancing strategy. However, if we create a custom strategy, there would be no common strategy between sarama-cluster consumers and kafka-go consumers, and new consumers would get an InconsistentGroupProtocol error.
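That failure mode falls out of step 3a: the coordinator must find one strategy name that every member supports. A toy sketch of that selection (a hypothetical helper illustrating the rule, not broker code):

```go
package main

import (
	"errors"
	"fmt"
)

var errInconsistentGroupProtocol = errors.New("InconsistentGroupProtocol")

// chooseProtocol mimics the coordinator: pick the first strategy proposed
// by the first member that every other member also supports. If there is
// no overlap, the join fails with InconsistentGroupProtocol.
func chooseProtocol(members [][]string) (string, error) {
	if len(members) == 0 {
		return "", errInconsistentGroupProtocol
	}
	for _, candidate := range members[0] {
		ok := true
		for _, supported := range members[1:] {
			found := false
			for _, s := range supported {
				if s == candidate {
					found = true
					break
				}
			}
			if !found {
				ok = false
				break
			}
		}
		if ok {
			return candidate, nil
		}
	}
	return "", errInconsistentGroupProtocol
}

func main() {
	p, _ := chooseProtocol([][]string{{"range", "roundrobin"}, {"roundrobin"}})
	fmt.Println(p) // roundrobin: the shared strategy wins
	_, err := chooseProtocol([][]string{{"custom"}, {"roundrobin"}})
	fmt.Println(err) // no overlap, so the new member is rejected
}
```

This is why advertising only a custom strategy would lock kafka-go consumers out of groups that already contain sarama-cluster members, and vice versa.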

Hence, my thinking in allowing ConsumerGroup to support multiple topic partitions. I considered making the change to Reader, but ran into the following:

achille-roussel commented 6 years ago
savaki commented 6 years ago

I like that first suggestion a lot. I'll go ahead and start there.

Partitions[] shouldn't be in the configuration. It was something I experimented with, but clearly didn't remove completely before committing.

savaki commented 6 years ago

I noticed that ReaderStats has an embedded partition. How would you like to handle that if the Reader can handle more than one partition? Options I see would include:

or a combination of the above.

achille-roussel commented 6 years ago

That's a good point. I think it's fine to set the partition to -1 for now when the reader is used with a consumer group. It's not ideal but I'm not a big fan of adding new methods for solving this.

Ideally I think we should return a slice of ReaderStats, one for each topic/partition that the reader handles, but this is an API-breaking change so I'd rather delay it until we have consumer groups fully implemented; maybe we'll then make a major version release where we can introduce those kinds of changes.

savaki commented 6 years ago

sounds good to me

savaki commented 6 years ago

At this point, I have a working version of consumer groups with kafka-go.

As you've suggested, I split the work into smaller commits to make it easier to digest. The first commit, "expand Conn to handle kafka api calls", increases the surface area of *Conn to support the API calls required by consumer groups. I've intentionally made the calls shallow so the functions can be simple adapters to Kafka.

In a separate commit, I can introduce the code that modifies Reader to support consumer groups.

I've updated the link above, https://github.com/savaki/kafka-go/tree/consumer-groups, to point to the working version. It still has a number of open issues and needs more tests, but it should serve as a good straw man for discussion.

Some of the open issues include:

achille-roussel commented 6 years ago

What to do with (*Reader).Offset and (*Reader).SetOffset

Could Offset and SetOffset simply return errors if the Reader is configured to be part of a consumer group? Because in this case offset management is automatic so the program cannot be given control over where the Reader is supposed to start.
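A sketch of how that guard could look, with hypothetical names (this is not the actual kafka-go API):

```go
package main

import (
	"errors"
	"fmt"
)

// errGroupManagedOffsets is a hypothetical sentinel for this idea: a Reader
// that belongs to a consumer group refuses manual offset control, because
// the group coordinator decides where consumption starts.
var errGroupManagedOffsets = errors.New("offsets are managed by the consumer group")

type reader struct {
	groupID string // non-empty means the reader is part of a consumer group
	offset  int64
}

func (r *reader) SetOffset(off int64) error {
	if r.groupID != "" {
		return errGroupManagedOffsets
	}
	r.offset = off
	return nil
}

func main() {
	grouped := &reader{groupID: "my-group"}
	fmt.Println(grouped.SetOffset(42)) // rejected: group manages offsets

	solo := &reader{}
	fmt.Println(solo.SetOffset(42)) // allowed: no consumer group
}
```

Returning a sentinel error keeps the existing method signatures intact while making the misuse explicit at the call site.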

Should the Reader hold a persistent connection to the coordinator, or just create one on demand

I'd advocate for getting a working implementation first (no matter what approach you choose), but if connections occur quite often we'll probably want to keep a persistent connection established to the coordinator.

Should any attempt be made to buffer commits? Currently I'm using a naive implementation that opens a connection, writes the commit, and closes the connection

Same answer as the previous question, let's aim for a simple and working solution first, no matter how inefficient, then we can improve it.

savaki commented 6 years ago

Could Offset and SetOffset simply return errors if the Reader is configured to be part of a consumer group? Because in this case offset management is automatic so the program cannot be given control over where the Reader is supposed to start.

+1

I'd advocate for getting a working implementation first (no matter what approach you choose), but if connections occur quite often we'll probably want to keep a persistent connection established to the coordinator.

Agreed. I have a working implementation that we're testing internally. However, I'm breaking up the check-ins as per our discussion to ease the review process. As it stands, not reusing connections was simpler to reason about, so I went that direction. The main driver for a persistent connection is the commit path, which currently opens a new connection on every commit.

Same answer as the previous question, let's aim for a simple and working solution first, no matter how inefficient, then we can improve it.

Agreed. I'm coming from the point of view that I have a working version currently and have capacity to look at additional issues. I'll make sure to submit the changes we're discussing as separate pull requests to not muddy the waters.

My suggestion is that we add a time.Duration field, CommitInterval, to Config. When zero, commits are pushed synchronously with no buffering. When > 0, commits will be buffered and pushed at the specified interval. Close on the Reader will push any buffered commits to the broker.

errnoh commented 6 years ago

Could Offset and SetOffset simply return errors if the Reader is configured to be part of a consumer group? Because in this case offset management is automatic so the program cannot be given control over where the Reader is supposed to start.

(Not directly about Offset() or SetOffset(), mostly about ReaderConfig.) While I really like the ease of use of the current implementation, this is my main issue with it: without support for starting the consumer group from the latest offset, I'll have to keep using separate consumers for each partition.

The use case here is having a temporary consumer group read the latest data from a topic. With topics possibly containing terabytes of data and a retention of a couple of weeks, starting from the beginning each time is not acceptable.
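One hypothetical way to expose this in ReaderConfig, using Kafka's conventional -1/-2 sentinels for latest/earliest (all names here are illustrative, not the merged API):

```go
package main

import "fmt"

// Sentinel values a ReaderConfig field could use to pick where a brand-new
// consumer group starts, following Kafka's offset conventions.
const (
	FirstOffset int64 = -2 // earliest available message
	LastOffset  int64 = -1 // only messages produced after the group joins
)

type readerConfig struct {
	GroupID     string
	StartOffset int64 // used only when the group has no committed offset
}

// resolveStart picks the initial offset for a partition: resume from the
// committed offset if the group has one, otherwise fall back to the
// configured starting position.
func resolveStart(cfg readerConfig, committed int64, hasCommit bool) int64 {
	if hasCommit {
		return committed
	}
	return cfg.StartOffset
}

func main() {
	cfg := readerConfig{GroupID: "tail-logs", StartOffset: LastOffset}
	fmt.Println(resolveStart(cfg, 0, false))   // fresh group: start at latest
	fmt.Println(resolveStart(cfg, 1234, true)) // existing group: resume
}
```

A temporary group like the one described above would set StartOffset to the latest sentinel, skip the terabytes of retained data, and still resume correctly if it reconnects, since committed offsets take precedence.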

achille-roussel commented 6 years ago

That’s a good point, would you have the time to work on adding this feature?

errnoh commented 6 years ago

I think I can find some time soonish, and I know what to change. Just need to decide where to expose the configuration for this.

achille-roussel commented 6 years ago

The reader config seems like the right place for this, but feel free to suggest something else if you think otherwise.

abraithwaite commented 6 years ago

@errnoh, I might do some work on this soon. Have you written anything?

errnoh commented 6 years ago

Yeah, sorry. A bit busy at work but my proof-of-concept is now at https://github.com/segmentio/kafka-go/pull/88

Any opinion if that's the way we should do it or do you have any other suggestions?

mariusw commented 6 years ago

Just so I understand: kafka-go supports consumer groups, but not entirely (or natively)? There was an effort to implement it better, but that stopped in March? We really want to use (and contribute to) this library rather than Sarama (with the sarama-cluster lib for consumer groups), but reading the open issues I get the feeling kafka-go is not quite there yet? @achille-roussel @Pryz @abraithwaite

achille-roussel commented 5 years ago

Catching up here, @mariusw consumer group support in kafka-go should be usable and we are using this code in production at Segment, so you should be able to as well.

If there are features that you think are missing feel free to open a discussion about it or a PR if you have a fix ready to be merged.

Regarding this issue, I think all the work it tracked has either landed or has other issues/PRs tracking it, so I'll go ahead and close it.

Thank you all for your contributions in the discussions and the code!