logstash-plugins / logstash-input-kafka

Kafka input for Logstash
Apache License 2.0

topics_pattern limits parallel consumption of matching topics' partitions #340

Closed Djeezus closed 2 years ago

Djeezus commented 2 years ago

Logstash information:

Please include the following information:

  1. Logstash version: logstash v7.14
  2. Logstash installation source: docker pull docker.elastic.co/logstash/logstash-oss:7.14.1-amd64
  3. How is Logstash being run: OpenShift OSE 3.x/4.x ReplicaSet
  4. How was the Logstash plugin installed: included by default in the Docker image

JVM (e.g. java -version):

sh-4.2$ logstash --version
Using bundled JDK: /usr/share/logstash/jdk
warning: no jvm.options file found
logstash 7.14.0

OS version (uname -a if on a Unix-like system):

sh-4.2$ uname -a
Linux logstash 3.10.0-1127.el7.x86_64 #1 SMP Tue Feb 18 16:39:12 EST 2020 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: The topics_pattern is an excellent feature, but I think it has introduced an issue with consumer parallelism. topics_pattern matches a number of Kafka topics, but it doesn't take horizontal scaling of Logstash instances into account.

I have a setup where my topics_pattern currently matches 2 topics, and each topic has 2 partitions.

It seems, though, that topics_pattern is actually limiting parallelism, somehow assigning the consumer role to an (arbitrary?) Logstash instance.

Steps to reproduce:

  1. create 2 Kafka topics with 2 partitions each: test-topic-aa and test-topic-bb

  2. create 4 Logstash instances and define topics_pattern => "test-topic.*" in the Kafka input

  3. produce some messages on the topics and start the 4 Logstash instances

  4. you will see that only 2 of the Logstash instances are consuming from the 4 partitions in total (see the config sketch after this list)
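A minimal pipeline sketch of this setup (the broker address and group_id are placeholders I've assumed, not values from the report):

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"       # placeholder broker address
    topics_pattern    => "test-topic.*"     # matches test-topic-aa and test-topic-bb
    group_id          => "logstash-test"    # all 4 instances join the same consumer group
  }
}
```

With the default assignment strategy, only 2 of the 4 group members end up holding partitions.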

Djeezus commented 2 years ago

It's not a bug: the default "partition_assignment_strategy" is "range", and the range assignor assigns partitions per topic, so the same first consumers in the group receive the partitions of every matched topic while the rest sit idle. When explicitly defining "cooperative_sticky", the consumer-group instances get distributed across all the topics' partitions.
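For reference, a sketch of the explicit setting (broker and group names are again placeholders):

```
input {
  kafka {
    bootstrap_servers             => "kafka:9092"     # placeholder
    topics_pattern                => "test-topic.*"
    group_id                      => "logstash-test"  # placeholder
    partition_assignment_strategy => "cooperative_sticky"
  }
}
```

With 4 instances in the group and 2 topics of 2 partitions each, every instance then ends up consuming exactly one partition.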

Djeezus commented 2 years ago

Maybe the default for "partition_assignment_strategy" can be changed to "cooperative_sticky", thereby automatically leveraging consumer-group spreading when using topics_pattern and/or multiple topics.
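Under that proposal, a hypothetical config like the following (names are placeholders) would spread the group over all partitions without any explicit strategy setting:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"                        # placeholder
    topics            => ["test-topic-aa", "test-topic-bb"]  # explicit topic list instead of a pattern
    group_id          => "logstash-test"                     # placeholder
    # no partition_assignment_strategy needed if the default were "cooperative_sticky"
  }
}
```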