dmvk opened this issue 8 years ago (status: Open).
You are right, there is a bottleneck in the `run` operation. Ideally, the `run` section should run a processing loop per input thread. Optimizing this side will also help with the other Logstash 2.2 enhancements: https://www.elastic.co/guide/en/logstash/current/upgrading-logstash-2.2.html
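A minimal Ruby sketch of "a processing loop per input thread", under stated assumptions: `FakeConsumer` and `run_per_thread` are hypothetical names, `FakeConsumer` only simulates fetched messages (it is not the logstash-input-kafka or Kafka client API), JSON parsing stands in for the codec, and a thread-safe `Queue` stands in for the Logstash pipeline queue. The point is that each thread both fetches and decodes, so decoding scales with the number of threads.

```ruby
require "json"
require "thread"

# Hypothetical stand-in for a Kafka consumer: yields raw JSON strings.
# NOT the plugin's real API; it only simulates fetched messages.
class FakeConsumer
  def initialize(messages)
    @messages = messages
  end

  def each_message
    @messages.each { |raw| yield raw }
  end
end

# One loop per consumer thread: every thread fetches AND decodes its own
# messages, so the expensive decode step is not funneled through one loop.
def run_per_thread(consumers, output_queue)
  threads = consumers.map do |consumer|
    Thread.new do
      consumer.each_message do |raw|
        event = JSON.parse(raw) # decode inside this consumer's thread
        output_queue << event   # Queue is thread-safe
      end
    end
  end
  threads.each(&:join)
end

queue = Queue.new
consumers = [
  FakeConsumer.new(['{"id":1}', '{"id":2}']),
  FakeConsumer.new(['{"id":3}'])
]
run_per_thread(consumers, queue)

events = []
events << queue.pop until queue.empty?
puts events.map { |e| e["id"] }.sort.inspect # prints [1, 2, 3]
```

Logstash runs on JRuby, where Ruby threads execute in parallel; under MRI the global VM lock would limit the CPU-bound speedup of this pattern.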
Proof of concept:
https://github.com/davidmoravek/logstash-input-kafka/commit/533c5c0a178894127326241600169ea70b2070bf
It definitely needs more work, but it is enough for my use case.
Very nice @davidmoravek. I added some comments. I think this is a positive change.
@davidmoravek any plans for a PR?
+1 for a PR. Looks good.
Probably within the next week.
We have decided to drop Logstash in favor of our internal project. Will anyone continue working on this, or should I close the issue?
@davidmoravek I implemented your ideas in #79; we'll close this issue once it is finished. Thank you for the insight and contributions.
It seems this issue can be resolved.
First of all, I'm not a Ruby/Logstash expert, so I might be missing something.
When I set `consumer_threads = 3`, three consumers are added to the consumer group (which triggers a rebalance), so we should get better throughput.
However, it seems to me that message parsing still runs in a single thread, no matter how many consumer threads we have.
https://github.com/logstash-plugins/logstash-input-kafka/blob/master/lib/logstash/inputs/kafka.rb#L149
Deserialization is the most expensive operation here, so we can't scale up a single Logstash instance by adding consumer threads.
Yes, scaling out is still an option, but if I'm right, this is an obvious bottleneck that can be easily fixed.
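To make the suspected bottleneck concrete, here is a minimal Ruby sketch of the pattern described above. It is an assumption about the shape of the code, not a copy of `kafka.rb`: several "consumer threads" only enqueue raw messages, and a single loop decodes all of them. `run_single_decoder` is a hypothetical name, and JSON parsing stands in for the real codec.

```ruby
require "json"
require "thread"

# Several fetcher threads (one per batch, like consumer_threads) only push
# raw strings; one loop in the calling thread decodes everything.
def run_single_decoder(raw_batches)
  raw_queue = Queue.new

  fetchers = raw_batches.map do |batch|
    Thread.new { batch.each { |raw| raw_queue << raw } }
  end
  fetchers.each(&:join)
  raw_queue << :done # sentinel: all fetchers have finished

  decoded = []
  decoding_threads = []
  # The single decoding loop: every message is parsed here, regardless of
  # how many fetcher threads produced it.
  loop do
    raw = raw_queue.pop
    break if raw == :done
    decoded << JSON.parse(raw)
    decoding_threads << Thread.current
  end
  [decoded, decoding_threads.uniq.size]
end

events, decoder_count = run_single_decoder([['{"id":1}', '{"id":2}'], ['{"id":3}']])
puts events.size    # prints 3
puts decoder_count  # prints 1
```

Adding fetcher threads here increases fetch concurrency, but the decode count per thread stays concentrated in one loop, which is why the expensive step does not scale.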
Am I missing something?
Thanks, D.