grantneale / kafka-lag-based-assignor

Kafka partition assignor that distributes lag evenly across a consumer group
Apache License 2.0

Option to shuffle list of consumers #4

Open: theY4Kman opened this issue 4 years ago

theY4Kman commented 4 years ago

I've got a cluster of logstash nodes, each with a number of threads running a Kafka consumer. Our topic is split evenly across this cluster, such that every consumer only receives one partition assignment.

When using the LagBasedPartitionAssignor, what I've noticed is that the consumers with the lexicographically lowest client IDs are assigned the partitions with the highest lag first, all the way down the line... so my logstash nodes that are lexicographically last receive the partitions with the tiniest lag.

I presume what's happening is that Java's sorting is stable, and the list of consumers is returned in lexicographically sorted order.
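To make the behaviour concrete, here is a simplified model of a lag-based assignment. It is an assumption for illustration, not the library's actual code: partitions are visited in descending lag order, each one goes to the consumer with the fewest partitions so far, and ties go to the earliest consumer in the lexicographically sorted list. The client IDs and lag values are hypothetical. With one partition per consumer, as in the setup above, the highest-lag partition always lands on the lowest client ID (Java 9+ for `List.of`/`Map.of`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of a lag-based assignor (not the real implementation). */
final class LagOrderDemo {

    static Map<String, List<Integer>> assign(List<String> consumerIds, Map<Integer, Long> lagByPartition) {
        // Consumers in lexicographic order, mirroring the ordering described in the issue.
        List<String> consumers = new ArrayList<>(consumerIds);
        consumers.sort(Comparator.naturalOrder());

        // Partitions sorted by lag, highest first. List.sort on objects is stable,
        // so partitions with equal lag keep their original relative order.
        List<Integer> partitions = new ArrayList<>(lagByPartition.keySet());
        partitions.sort(Comparator.comparing((Integer p) -> lagByPartition.get(p)).reversed());

        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p : partitions) {
            // Pick the consumer with the fewest partitions; the strict '<' keeps the
            // first (lexicographically lowest) consumer whenever there is a tie.
            String target = consumers.get(0);
            for (String c : consumers) {
                if (assignment.get(c).size() < assignment.get(target).size()) {
                    target = c;
                }
            }
            assignment.get(target).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lag = Map.of(0, 10L, 1, 5000L, 2, 300L);
        // prints {logstash-a=[1], logstash-b=[2], logstash-c=[0]}
        System.out.println(assign(List.of("logstash-c", "logstash-a", "logstash-b"), lag));
    }
}
```

Partition 1 (lag 5000) goes to `logstash-a` and partition 0 (lag 10) to `logstash-c`, regardless of the order in which the IDs were supplied.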

I was wondering if there could be an option to shuffle the list of consumers before assigning partitions, so the partitions with the highest lags could be spread throughout my cluster.
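A sketch of what such an option might look like, assuming the one-partition-per-consumer setup described above, so the i-th highest-lag partition simply goes to the i-th consumer in the chosen order. The `shuffleConsumers` flag and the class name are hypothetical; the assignor does not expose them. `Collections.shuffle(List, Random)` is the standard JDK call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical "shuffle consumers" option for a one-partition-per-consumer assignment. */
final class ShuffledAssignDemo {

    static Map<String, Integer> assign(List<String> consumerIds, Map<Integer, Long> lagByPartition,
                                       boolean shuffleConsumers, Random random) {
        List<String> consumers = new ArrayList<>(consumerIds);
        consumers.sort(Comparator.naturalOrder());
        if (shuffleConsumers) {
            // Randomise which consumers receive the high-lag partitions.
            Collections.shuffle(consumers, random);
        }

        // Partitions sorted by lag, highest first.
        List<Integer> partitions = new ArrayList<>(lagByPartition.keySet());
        partitions.sort(Comparator.comparing((Integer p) -> lagByPartition.get(p)).reversed());

        // Pair the i-th consumer with the i-th highest-lag partition.
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(consumers.size(), partitions.size()); i++) {
            assignment.put(consumers.get(i), partitions.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lag = Map.of(0, 10L, 1, 5000L, 2, 300L);
        List<String> ids = List.of("logstash-c", "logstash-a", "logstash-b");
        // prints {logstash-a=1, logstash-b=2, logstash-c=0}
        System.out.println(assign(ids, lag, false, new Random()));
        // Same partitions, but the high-lag ones land on randomly chosen consumers.
        System.out.println(assign(ids, lag, true, new Random()));
    }
}
```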

grantneale commented 4 years ago

> When using the LagBasedPartitionAssignor, what I've noticed is that the consumers with the lexicographically lowest client IDs are assigned the partitions with the highest lag first, all the way down the line... so my logstash nodes that are lexicographically last receive the partitions with the tiniest lag.
>
> I presume what's happening is that Java's sorting is stable, and the list of consumers is returned in lexicographically sorted order.

That is correct.

> I was wondering if there could be an option to shuffle the list of consumers before assigning partitions, so the partitions with the highest lags could be spread throughout my cluster.

This is certainly possible. The naive approach is to simply shuffle the consumer order before assigning partitions. This has the downside of making the partition assignment completely unstable: even with unchanged group membership, every rebalance could move partitions to different consumers. A stable and shuffled assignment is also possible, with a little more work.
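One possible way to get an order that is shuffled but still stable (an illustrative assumption, not necessarily what the assignor would implement) is to seed the shuffle from the group membership itself. The consumer IDs are first sorted into a canonical order, and that sorted list's `hashCode()` seeds a `java.util.Random`. The same membership then always produces the same order, so repeated rebalances do not move partitions, while the order is no longer lexicographic. The method name and the client IDs below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical stable shuffle: pseudo-random order that depends only on the member IDs. */
final class StableShuffleDemo {

    static List<String> stableShuffledOrder(List<String> consumerIds) {
        List<String> consumers = new ArrayList<>(consumerIds);
        // Canonicalise first, so the result does not depend on the order the IDs arrive in.
        consumers.sort(Comparator.naturalOrder());
        // List.hashCode() and String.hashCode() are fully specified by the JDK, so the
        // seed is identical for identical membership.
        long seed = consumers.hashCode();
        Collections.shuffle(consumers, new Random(seed));
        return consumers;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("node1-0", "node1-1", "node2-0", "node2-1", "node3-0", "node3-1");
        // Partitions would then be handed out in descending lag order to consumers in this order.
        System.out.println(stableShuffledOrder(ids));
    }
}
```

The trade-off is that the order is recomputed whenever a member joins or leaves, so a membership change can reshuffle more assignments than a purely lexicographic order would.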

I'm curious, though. What is the downside of the highest-lag partitions going to consumers in lexicographic order?