salesforce / storm-dynamic-spout

A framework for building spouts for Apache Storm and a Kafka based spout for dynamically skipping messages to be processed later.
BSD 3-Clause "New" or "Revised" License
41 stars 13 forks source link

More fair partition distribution with weird number of partitions #87

Closed stanlemon closed 6 years ago

stanlemon commented 6 years ago

The current partition distribution implementation attempts to distribute the maximum number of partitions to each consumer, such that when the number of consumer instances is not a multiplier of the partition count the ending instances are left with no partitions. For example consider 4 partitions across 3 instances, the current implementation will give instances 1 & 2 two partitions each and leave instance 3 with no partitions.

The new implementation attempts to make sure every instance has at least one partition, unless there are more instances than partitions, and will distribute up to the max partition per instance number to as many instances as it can. The trick here is that we still want the partitions ordered such that with 4 partitions across 3 instances, instance 1 should have partitions 0 and 1, and instance 2 should have partition 2 and instance 3 partition 4. The complexity here is easier to see when you go up to 20 instances and 35 partitions as an example. The current implementation would leave the last two instances empty, when it reality we want them each to have one.

stanlemon commented 6 years ago

This will need to be included in 0.9.x

stanlemon commented 6 years ago

@crim can you review this please?

Crim commented 6 years ago

I thought mim's had it covered. I don't have the back story to this, or understand whats changing. Tests seem to pass so probably good?

stanlemon commented 6 years ago

Note that after merging I will rebase the squashed commit directly onto the 0.9 branch.