sclasen / akka-kafka


Consumer with a Router #34

Closed. salex89 closed this issue 9 years ago.

salex89 commented 9 years ago

Does it make sense to extend the consumer with a router setting, e.g. round robin? As I see it, we may need multiple instances of the same consumer if the processing logic allows it. For example, when processing is slow and messages can be processed independently of each other, we might want two or more consumers in the same group.

So, is it sensible to implement this, given how Kafka and Akka work?

sclasen commented 9 years ago

So, the ActorRef that you use to receive messages from the AkkaConsumer can certainly be a router; I'm not sure if that is what you mean.
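A minimal sketch of what a router on the receiving side does, written as a language-neutral Python model rather than Akka code. In Akka the receiver would typically be a pool router such as `akka.routing.RoundRobinPool`; the `RoundRobinRouter` class and its `tell` method here are hypothetical and only illustrate the dispatch order, not akka-kafka's API.

```python
from itertools import cycle


class RoundRobinRouter:
    """Hypothetical model of a round-robin router: each incoming message is
    appended to the next worker's mailbox in turn."""

    def __init__(self, n_workers):
        self.mailboxes = [[] for _ in range(n_workers)]
        self._next = cycle(range(n_workers))  # 0, 1, ..., n-1, 0, 1, ...

    def tell(self, msg):
        self.mailboxes[next(self._next)].append(msg)


router = RoundRobinRouter(3)
for offset in range(7):  # pretend these are Kafka offsets from one stream
    router.tell(offset)
print(router.mailboxes)  # [[0, 3, 6], [1, 4], [2, 5]]
```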

If you mean having the AkkaConsumer itself be a pool behind a router, then I'm not sure that will buy you much. Each actor in the router would have to own a subset of the streams, due to the way the underlying Kafka connector and offset commits work.

Let's take the example of a topic with 20 partitions. Whether you create one consumer with 20 streams, 2 consumers with 10 streams each, or 4 consumers with 5 streams each, I am not sure you will see any appreciable difference in throughput.
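To make the stream-ownership point concrete, the sketch below (a hypothetical Python model, not the Kafka connector's actual assignment logic) splits 20 partitions among 1, 2, or 4 consumers. The round-robin split is an assumption for illustration; whatever the split, every partition has exactly one owner and the total stream count stays at 20.

```python
def assign_streams(n_partitions, n_consumers):
    """Give each consumer a disjoint subset of partitions (one stream each).
    The round-robin split is an illustrative assumption."""
    return [list(range(c, n_partitions, n_consumers)) for c in range(n_consumers)]


for n_consumers in (1, 2, 4):
    owned = assign_streams(20, n_consumers)
    total = sum(len(streams) for streams in owned)
    # prints e.g. "4 consumer(s) x 5 streams = 20 streams"
    print(f"{n_consumers} consumer(s) x {len(owned[0])} streams = {total} streams")
```

Rearranging consumers only changes who owns which partitions; the amount of parallelism on the Kafka side is still capped at 20 streams.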

Each of those consumers will be sending to your actorRef, which will almost certainly dominate latency far more than the consumer(s) themselves. You might try writing a benchmark to (dis)prove this to yourself. (That is, if this is what you meant by the consumer having a router.)
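A back-of-the-envelope model of why reshuffling consumers rarely helps when the receiver is the bottleneck. The per-stream and per-worker rates are invented numbers and `throughput` is a hypothetical helper; the point is only that the steady-state rate is bounded by the slower stage, so a real benchmark is still the way to check.

```python
# Hypothetical rates in messages/second, chosen only to illustrate the shape.
CONSUMER_RATE_PER_STREAM = 5_000
RECEIVER_RATE_PER_WORKER = 800


def throughput(n_consumers, streams_per_consumer, n_workers):
    """Steady-state rate of consumer(s) -> receiver: the slower stage wins."""
    consume = n_consumers * streams_per_consumer * CONSUMER_RATE_PER_STREAM
    process = n_workers * RECEIVER_RATE_PER_WORKER
    return min(consume, process)


for layout in ((1, 20), (2, 10), (4, 5)):
    print(layout, throughput(*layout, n_workers=1))  # 800 for every layout
```

Only raising the receiver's capacity changes the result: under these assumed rates, `throughput(1, 20, n_workers=8)` gives 6400.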

Which one do you mean?

salex89 commented 9 years ago

My idea is to increase throughput by consuming more messages in parallel. To be honest, I was thinking of the second option, the consumer being a router. Twelve hours ago that made sense in my head. Now I understand that it makes more sense to make the ActorRef that receives messages from the consumer a router. But will that work with back-pressure? Will the consumer send a message to the next actor in the round-robin pool before it receives the StreamFSM.Processed from the previous one?

sclasen commented 9 years ago

The consumer will send (number of streams * maxInFlight per stream) messages before waiting for a StreamFSM.Processed.
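The in-flight limit can be modeled as a per-stream credit counter. This is a hypothetical Python sketch, not akka-kafka's StreamFSM: `StreamWindows`, `try_send`, and `processed` are illustrative names, and `max_in_flight=16` is an assumed value. Each stream stops sending once it has `max_in_flight` unacknowledged messages, so the consumer as a whole stalls after `streams * max_in_flight` sends until a Processed ack frees a slot.

```python
class StreamWindows:
    """Per-stream in-flight limit: each stream may have up to max_in_flight
    unacknowledged messages, so the consumer as a whole can have at most
    streams * max_in_flight outstanding before it must wait for Processed."""

    def __init__(self, streams, max_in_flight):
        self.max_in_flight = max_in_flight
        self.outstanding = [0] * streams

    def try_send(self, stream):
        if self.outstanding[stream] < self.max_in_flight:
            self.outstanding[stream] += 1
            return True
        return False  # this stream must wait for a Processed ack

    def processed(self, stream):
        self.outstanding[stream] -= 1  # an ack frees one slot on that stream


windows = StreamWindows(streams=4, max_in_flight=16)
# Try to send 100 messages on each of the 4 streams; only 16 per stream succeed.
sent = sum(windows.try_send(s) for _ in range(100) for s in range(4))
print(sent)  # 64 == 4 * 16
```

After `windows.processed(0)`, stream 0 can send one more message, while stream 1 stays blocked until it receives its own ack.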

So if you have a pool that is slower than the consumer (it almost certainly will be), each actor in the pool will have several messages in its mailbox. The rate at which your pool sends StreamFSM.Processed will match the rate at which you are able to consume.
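A discrete-time sketch of the situation described above, again a hypothetical Python model with made-up parameters rather than akka-kafka internals: a consumer keeps up to `window` messages outstanding, dispatches them round robin to a slower pool, and each worker acks every message it finishes.

```python
from collections import deque


def simulate(ticks, window, n_workers, work_per_tick):
    """Each tick the consumer refills up to `window` outstanding messages
    (round robin across workers), then every worker processes up to
    `work_per_tick` messages and acks each one."""
    mailboxes = [deque() for _ in range(n_workers)]
    outstanding = next_offset = rr = acks = 0
    for _ in range(ticks):
        # Consumer: send until the in-flight window is full.
        while outstanding < window:
            mailboxes[rr].append(next_offset)
            rr = (rr + 1) % n_workers
            next_offset += 1
            outstanding += 1
        # Workers: drain up to work_per_tick messages each, acking each one.
        for box in mailboxes:
            for _ in range(min(work_per_tick, len(box))):
                box.popleft()
                outstanding -= 1
                acks += 1
    return acks, [len(box) for box in mailboxes]


acks, backlog = simulate(ticks=10, window=64, n_workers=4, work_per_tick=2)
print(acks, backlog)
```

With these parameters the pool processes 8 messages per tick, so after 10 ticks there are 80 acks (the ack rate equals the processing rate), and each of the 4 mailboxes holds a steady backlog of 14 messages: the 64-message window minus the 8 processed that tick, spread across the pool.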

Just to clarify, you are talking about the non-batch AkkaConsumer, yes?

(In reply to Aleksandar Stojadinovic's comment of Tuesday, April 21, 2015.)

salex89 commented 9 years ago

Yes, about the non-batch consumer :). Thank you for the clarification. Maybe a Google group or something similar should exist for these questions and discussions; I don't feel comfortable opening issues for all sorts of things :).

sclasen commented 9 years ago

Hey, no worries about opening issues, seems fine to me.