kafkaex / kafka_ex

Kafka client library for Elixir
MIT License
596 stars 162 forks source link

Feature Request: Distributed Consumers #472

Closed BlueCollarChris closed 2 years ago

BlueCollarChris commented 2 years ago

There are instances where having distributed consumers of the same consumer group amongst many nodes would be beneficial for heavy load consumers to spread their computing needs amongst more than a single node. Im not entirely sure on the lift or the implementation of a feature like this but wanted to start a discussion on this topic.

BlueCollarChris commented 2 years ago

If this is already achievable with the current library I would be interested in knowing or someone documenting this, as I was unable to determine from reading through the codebase and some playing around with the consumers.

dantswain commented 2 years ago

Hi @kempt09 I don’t believe any of the processes in the current implementation are global, so you should be able to accomplish this just by starting consumer group members on each node, unless I’m misunderstanding your request?

joshuawscott commented 2 years ago

Kafka handles coordination within a consumer group; the only thing needed to make this happen is using the same consumer group name (and topic and brokers of course) between servers.

BlueCollarChris commented 2 years ago

So I think that is where maybe I am just confused with the library on starting a bunch of consumers within the same group that each have their own partition they consume from. It looks like using the KafkaEx.ConsumerGroup is probably hiding some of the ability for me to direct the consumers directly to be spread across some nodes? Wishing it was as simple as knowing this topic has n partitions and just start n consumers all registered to the same group and assign them a partition but allow them to be globally registered consumers so they can be spread out with something like Horde and using process rebalancing. I could have over looked it though in the code and docs. Maybe this is achievable without needing the KafkaEx.ConsumerGroup.

dantswain commented 2 years ago

@kempt09 I’m not following. The implementation of KafkaEx.ConsumerGroup is per the Kafka consumer group spec - there’s nothing really hidden, that’s just how a consumer group is supposed to work. The Kafka broker itself handles all of the coordination and monitoring. It will automatically distribute the partitions across the consumer group and will automatically handle situations like crashes or scale up/down. KafkaEx basically just says “here I am, ready to consume zero or more partitions for topic X” and lets the broker tell it which partitions it should consume.

That said, there’s really nothing that would stop you from implementing something like you described (iiuc). You can basically just start a process for each partition and manage the assignments yourself. Having said that, that’s what we used to do before we implemented the consumer group spec and having used both I would never go back ;)

BlueCollarChris commented 2 years ago

I see the convenience of the consumer group spec for sure. We have been using it and its nice and real easy to get up and running. We ran into an issue where our consumers are doing some intense work but as we add more data need some more cpu power. Imma play around for a bit in the code and see if I can pick up what your putting down :D

BlueCollarChris commented 2 years ago

@dantswain must have really overlooked something on my end, got it working as you mentioned. Sorry to bother y'all. Thanks for the help in pointing me in the right direction.