Closed matthinograb closed 4 years ago
In summary, we're only calling the first comparator in the chain, and don't use the BaseReplicaMovementStrategy's comparator.
We also need to make sure that dynamically changing the strategy forces a chain with the BaseReplicaMovementStrategy.
Details:
The chained replica movement strategies class is a composite class. When applying strategy, it calls applyStrategy of its internal variable named
current
. Since thecurrent
is a separate object, it doesn't use an overriddentaskComparator
method of the chained strategies class (parent object).
From https://github.com/linkedin/cruise-control/issues/1127:
The comparators defined in custom replica movement strategies may return equal (0) fairly often. For example, consider tasks that move partitions with no data, or tasks that both move non-under-replicated partitions, etc.
This becomes problematic, because we rely on the uniqueness of these tasks when inserting into the TreeSet that represent the execution plans. add()
ing an element that is equal to one already contained in the TreeSet will return false
, causing the IllegalStateExceptions.
A TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. https://docs.oracle.com/javase/8/docs/api/java/util/TreeSet.html
In ExecutionTaskPlannerTest.java
, Change _partitionMovement1
's partition size from 4 to 1, then re-run the tests. This should cause the size-based ReplicaMovementStrategies to throw an IllegalStateException, because there are now 2 partition movements that are "equal" for the Priorize Large/Small Partition Movement Strategies.
master
(it was only merged to the kafka 0.11 and 1.0 branch) should fix the IllegalStateException for the testGetInterBrokerPartitionMovementTasks
. This fixes the proper chaining of the comparators.testDynamicConfigReplicaMovementStrategy
testsWhen testing the PrioritizeLargeReplicaMovementStrategy
and after changing _partitionMovement1
's partition size from 4 to 1, I expected the tasks to return a list of movements sorted by partition size, then execution id. Any idea why that might be the case?
Expected order:
_partitionMovement1
: size 1 _partitionMovement4
: size 1_partitionMovement3
: size 2_partitionMovement2
: size 3But I ended up getting:
_partitionMovement1
: size 1, id: 0_partitionMovement3
: size 2, id: 2_partitionMovement4
: size 1, id: 3_partitionMovement2
: size 3, id: 1Stepping through and turning on Debug logging during the inter-broker partition movement rebalance algorithm was helpful to answer this question. In summary, my expectations were not correct, and the algorithm seems to be working as designed.
The priority of accepting a movement is:
Round 1
_partitionMovement1
: size 1, movement: 0 --> 1. Move first because smallest size, and smallest execution ID._partitionMovement4
: size 1, movement: 2 --> 0. Skipped, because broker 0 already involved._partitionMovement3
: size 2, movement: 2 --> 3. Add to list, since no brokers involved._partitionMovement2
: size 3, movement: 1 --> 2. Skipped, because broker 1 and 2 are already involved.After round 1, execution list: _partitionMovement1
, _partitionMovement3
Round 2
_partitionMovement4
: size 1, movement: 2 --> 0. Added to list._partitionMovement2
: size 3, movement: 1 --> 2. Added to list.After round 2, execution list: _partitionMovement1
, _partitionMovement3
, _partitionMovement4
, _partitionMovement2
.
I think the round-robin strategy taking top precedence is a potential issue, because it does not always result in the global optimal ordering of movements.
Instead of doing the round-robin as the first priority, would it be reasonable to use the sorted list of interbroker movements as a top priority, and then do round-robin based on that? Are there any implications to doing so?
Example:
Suppose we want to use the PrioritizeLargePartitionMovementStrategy
, and we have three brokers in our readyBrokers
set: 1, 2, and 3, with concurrency set to 1. Suppose we want to perform the following movements:
In theory, we would want to move the largePartition from broker 2 first, given the PrioritizeLargePartitionMovementStrategy
. However, we will actually move the small partition, given the current algorithm implementation.
Round-robin will choose to select movements for broker 1, since it is first in the set of brokers*
I think this is a bit of an extreme example, but it highlights the potential problems when using round-robin broker strategy as the highest priority.
*HashMaps do not guarantee ordering, but I think insertion order is somewhat preserved. If the insertion order is not preserved, then we must assume some randomness. If there is randomness, then we may still randomly fall into this scenario.
Resolving, as bug was fixed by https://github.com/linkedin/cruise-control/pull/1184
Received an error while using the
PrioritizeSmallReplicaMovementStrategy
. Not sure if this related to https://github.com/linkedin/cruise-control/issues/1127.CC Version:
2.0.96
Request:
Error Code: