logstash-plugins / logstash-output-kafka

Kafka Output for Logstash
Apache License 2.0

1.4.2 vs 1.5.0 kafka output performance regression #20

Closed jordansissel closed 8 years ago

jordansissel commented 9 years ago

(This issue was originally filed by @colinsurprenant at https://github.com/elastic/logstash/issues/2899)


I am creating this issue here so we can better track it for the 1.5.0 release process. If we conclude that this is in fact a plugin-only regression then we'll move it to the proper logstash-*-kafka repo.

We got reports of a performance drop with the kafka output between logstash 1.4.2 and 1.5.0-rc2. The user reported using the logstash-kafka 0.6.2 plugins with logstash 1.4.2. I was in fact able to reproduce with:

vs

vs

using the following config:

input {
  generator { count => 3000000 }
}

output {
  stdout { codec => dots }
  kafka {
    topic_id => "test-topic"
    compression_codec => "snappy"
    request_required_acks => 1
    serializer_class => "kafka.serializer.StringEncoder"
    request_timeout_ms => 10000
    producer_type => 'async'
    message_send_max_retries => 5
    retry_backoff_ms => 100
    queue_buffering_max_ms => 5000
    queue_buffering_max_messages => 10000
    queue_enqueue_timeout_ms => -1
    batch_num_messages => 1000
  }
}

using this command:

USE_RUBY=1 bin/logstash --quiet -f kafka.conf | pv -Wbart > /dev/null

Version       Rate
1.4.2+0.6.2   2.86MiB 0:01:50 [26.6kiB/s] [26.6kiB/s]
1.4.2+0.7.4   2.86MiB 0:02:22 [20.5kiB/s] [20.5kiB/s]
1.5.0         2.86MiB 0:02:23 [20.4kiB/s] [20.4kiB/s]

We see that the regression actually occurs between plugin versions 0.6.2 and 0.7.4, both running on logstash 1.4.2.
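
For readers who want the numbers as events per second: the dots codec writes one "." per event, so the 2.86MiB that pv reports is just the 3,000,000 generated events, and pv's byte rate is effectively an event rate. The Python sketch below is not part of the original report. It recomputes throughput from the elapsed times in the pv output and assumes the whole elapsed time is attributable to the pipeline (startup time is not separated out). The names `runs`, `to_seconds`, and `events_per_second` are made up for illustration.

```python
# Elapsed times copied from the pv output in this issue (h:mm:ss).
EVENTS = 3_000_000  # generator count from the config

runs = {
    "1.4.2+0.6.2": "0:01:50",
    "1.4.2+0.7.4": "0:02:22",
    "1.5.0": "0:02:23",
}


def to_seconds(hms):
    """Convert an h:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def events_per_second(hms, events=EVENTS):
    """Average throughput, assuming one output byte per event (dots codec)."""
    return events / to_seconds(hms)


# Use the fastest run (plugin 0.6.2) as the baseline for comparison.
baseline = events_per_second(runs["1.4.2+0.6.2"])

for version, elapsed in runs.items():
    rate = events_per_second(elapsed)
    drop = (1 - rate / baseline) * 100
    print(f"{version:<12} {rate:8.0f} events/s  {drop:5.1f}% below baseline")
```

Under those assumptions, both slower runs come out at a little over 20% below the 0.6.2 baseline, which is consistent with the per-second rates pv printed.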

@joekiller @talevy can you confirm this? Thoughts? If we agree this regression is somewhere in the plugin, we can move this issue to the proper repo.

joekiller commented 9 years ago

I began testing this locally but found it difficult to tell the versions apart in performance while running everything together on one machine, i.e. Kafka, ZooKeeper, and Logstash. My results were inconclusive.

I started an AWS CloudFormation template to run the test in a more isolated or, given how the cloud is, at least somewhat repeatable environment. You can see the beginnings in the repo linked below, but it is nowhere near complete yet.

https://github.com/joekiller/logstash-cloudbenchmark

joekiller commented 8 years ago

@talevy @jordansissel This issue is a little crusty. Opinions?

talevy commented 8 years ago

We can close this, since it does not necessarily apply to the latest 2.x version.