Single-threaded parsing?

logstash-plugins / logstash-input-kafka

Kafka input for Logstash

Apache License 2.0


Open: dmvk opened this issue 8 years ago

dmvk commented 8 years ago

First of all, I'm not a Ruby/Logstash expert, so I might be missing something.

When I set consumer_threads = 3, it adds three consumers to the consumer group (triggering a rebalance), and we should get better throughput.

However, it seems to me that message parsing still runs in a single thread, no matter how many consumer threads we have.

Deserialization is the most expensive operation here, so we can't scale up a single Logstash instance.
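To make the concern concrete, here is a minimal Python model of the pattern described above. It is not the plugin's code (the plugin is written in Ruby); every function name is hypothetical, plain lists stand in for Kafka partitions, and `json.loads` stands in for the configured codec. Consumer threads only fetch raw bytes, and a single loop decodes every message.

```python
import json
import queue
import threading

# Sentinel each consumer thread sends when it has no more records.
DONE = object()


def fetch_only_consumer(records, raw_queue):
    """Hypothetical consumer thread: hands over raw bytes, no decoding."""
    for record in records:
        raw_queue.put(record)
    raw_queue.put(DONE)


def run_with_single_decoder(partitions):
    """Start one consumer thread per partition, decode everything in one loop.

    Returns the decoded events and the set of thread names that did decoding.
    """
    raw_queue = queue.Queue()
    threads = [
        threading.Thread(target=fetch_only_consumer,
                         args=(records, raw_queue), name=f"consumer-{i}")
        for i, records in enumerate(partitions)
    ]
    for t in threads:
        t.start()

    events, decoding_threads = [], set()
    finished = 0
    while finished < len(threads):
        item = raw_queue.get()
        if item is DONE:
            finished += 1
            continue
        # The expensive step always runs here, on the calling thread,
        # however many consumer threads were started above.
        events.append(json.loads(item))
        decoding_threads.add(threading.current_thread().name)

    for t in threads:
        t.join()
    return events, decoding_threads
```

Calling `run_with_single_decoder([[b'{"n": 1}', b'{"n": 2}'], [b'{"n": 3}']])` starts two consumer threads, yet the returned set of decoding threads has exactly one entry: the caller's thread.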

Yes, there is still the option to scale out, but if I'm right, this is an obvious bottleneck that could be easily fixed.

Am I missing something?

Thanks, D.

joekiller commented 8 years ago

You are right. There is a bottleneck in the run operation. Ideally, the run section should do the processing in each input thread. Optimizing this side will help with the other 2.2 enhancements: https://www.elastic.co/guide/en/logstash/current/upgrading-logstash-2.2.html
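A minimal sketch of the "process per input thread" idea, under stated assumptions: the names are hypothetical, plain lists stand in for Kafka partitions, and `json.loads` stands in for the configured codec. Each consumer thread decodes its own records and hands only finished events to the shared queue. This illustrates the approach; it is not the plugin's actual implementation.

```python
import json
import queue
import threading

# Sentinel each consumer thread sends when it has no more records.
DONE = object()


def decoding_consumer(records, event_queue):
    """Hypothetical consumer thread that decodes its own records."""
    name = threading.current_thread().name
    for record in records:
        # Decoding happens here, on the consumer thread itself.
        event_queue.put((name, json.loads(record)))
    event_queue.put(DONE)


def run_with_per_thread_decoding(partitions):
    """Start one consumer thread per partition; each one decodes its records.

    Returns the decoded events and the set of thread names that did decoding.
    """
    event_queue = queue.Queue()
    threads = [
        threading.Thread(target=decoding_consumer,
                         args=(records, event_queue), name=f"consumer-{i}")
        for i, records in enumerate(partitions)
    ]
    for t in threads:
        t.start()

    events, decoding_threads = [], set()
    finished = 0
    while finished < len(threads):
        item = event_queue.get()
        if item is DONE:
            finished += 1
            continue
        # The run loop only collects already-decoded events.
        name, event = item
        decoding_threads.add(name)
        events.append(event)

    for t in threads:
        t.join()
    return events, decoding_threads
```

With the same two partitions, the decoding threads are now `consumer-0` and `consumer-1`. In CPython the GIL keeps CPU-bound decoding from actually running in parallel, so this model only shows where decoding happens; on the JVM, where Logstash runs via JRuby, consumer threads can decode concurrently.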

dmvk commented 8 years ago

Proof of concept:

It definitely needs more work, but it's enough for my use case.

joekiller commented 8 years ago

Very nice @davidmoravek. I added some comments. I think this is a positive change.

joekiller commented 8 years ago

@davidmoravek any plans for a PR?

suyograo commented 8 years ago

+1 for a PR. Looks good.

dmvk commented 8 years ago

Probably within the next week.

dmvk commented 8 years ago

We have decided to drop Logstash in favor of our internal project. Will anyone continue working on this, or should I close the issue?

joekiller commented 8 years ago

@davidmoravek I implemented your ideas in #79; we'll close this issue once it is finished. Thank you for the insight and contributions.

sandervandegeijn commented 8 months ago

It seems this issue can be closed as resolved.