linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License
2.74k stars 587 forks source link

BrokerSetAwareGoal does not work as expected #1870

Closed Linjianfengccc closed 2 years ago

Linjianfengccc commented 2 years ago

Hello guys, I am using BrokerSetAwareGoal for broker separation. My cluster has 6 brokers (ids 1~6), and I divide them into two broker sets with the following config:

```json
{
  "brokerSets": [
    {
      "brokerSetId": "kafkatest00_bg0",
      "brokerIds": [1, 2, 3],
      "doc": "This block contains broker ids that belong to BrokerSet 0"
    },
    {
      "brokerSetId": "kafkatest00_bg1",
      "brokerIds": [4, 5, 6],
      "doc": "This block contains broker ids that belong to BrokerSet "
    }
  ]
}
```

The cluster looks like: [screenshot]

So when I executed "remove broker 6", I got this error:

ERROR: Error processing POST request '/remove_broker' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [BrokerSetAwareGoal] Cannot move replica Replica[isLeader=false,rack=,broker=3,TopicPartition=__consumer_offsets-0,origBroker=3,isOriginalOffline=false,isCurrentOffline=false] to [Broker[id=4,rack=,state=ALIVE,replicaCount=829,logdirs=[]], Broker[id=5,rack=,state=ALIVE,replicaCount=425,logdirs=[]]] on brokerSet kafkatest00_bg1 Add at least 3 brokers. Add at least 3 brokers.'.

This confuses me, because CC was trying to move a partition from broker 3 to brokers 4 and 5. They should be in different broker sets, so from the perspective of BrokerSetAwareGoal that movement shouldn't be generated.

Do you have any suggestions or suspicions about this? Thanks in advance. cc @mohitpali

mohitpali commented 2 years ago

Would you be able to share your configurations and explain what you are trying to achieve? Also, are you only using the hard goals? BrokerSetAwareGoal only works with hard goals at the moment, since the distribution goals do not understand brokerSets yet.

Linjianfengccc commented 2 years ago

Hi, thanks for the reply. I uploaded my CC config as an attachment; the goal config looks like the following:

```properties
# The list of goals to optimize the Kafka cluster for with pre-computed proposals --
# consider using RackAwareDistributionGoal instead of RackAwareGoal in clusters with
# partitions whose replication factor > number of racks
default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.BrokerSetAwareGoal

# The list of supported goals
goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerDiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerEvenRackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.BrokerSetAwareGoal

# The list of supported intra-broker goals
intra.broker.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskUsageDistributionGoal

# The list of supported hard goals -- consider using RackAwareDistributionGoal instead
# of RackAwareGoal in clusters with partitions whose replication factor > number of racks
hard.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.BrokerSetAwareGoal
```

I set only two hard goals, DiskCapacityGoal and BrokerSetAwareGoal. They are the only ones I care about, mainly BrokerSetAwareGoal. I thought CC would not move partitions across broker sets no matter what execution I triggered. But when I remove a broker from broker set B, CC tries to move some partitions into BG B from BG A. What's more, when I execute add_broker for brokers in BG B, some partitions are moved into it from BG A. So I am confused about how to use this goal correctly.


mohitpali commented 2 years ago

I will have to look into this. In the meantime, could you please also change the order of the goals and see if this still happens?

Also, when you said CC "tries" to move partitions from one broker set to another when adding or removing a broker, did you get an OptimizationFailureException, or did it actually move the partitions?

Linjianfengccc commented 2 years ago

For the remove action I got the OptimizationFailureException; for the add action the movement was truly executed and the partitions were moved across BGs.

> In the meanwhile could you please also change the order of the goals and see if this happens.

My issue happens every time. Do you mean placing BrokerSetAwareGoal first?

Linjianfengccc commented 2 years ago

I tried changing the order of the goals in my config, but it didn't help; the issue persists.

mohitpali commented 2 years ago

When you added a broker, did you change the brokerSets config to include the new broker? Or did you remove a broker, add it back, and then the cross movement happened?

I have yet to test this; I will keep you posted on this thread.

Linjianfengccc commented 2 years ago

I kept the config unaltered; it is always "BG1: [1,2,3]; BG2: [4,5,6]". Whether I execute add broker 4 or remove broker 4, an OptimizationFailureException is thrown indicating a movement across broker sets.

Linjianfengccc commented 2 years ago

From my understanding, with my current broker set config, if I execute remove broker 4, all of its replicas will be migrated to brokers 5 and 6, which are in the same broker set, and no replicas will appear on brokers 1~3. Am I mistaken?

mohitpali commented 2 years ago

Your understanding is correct.

There seems to be something missing. Did you also set `anomaly.detection.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.BrokerSetAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal`?

Linjianfengccc commented 2 years ago

I gave it a try and hit the same issue:

[screenshot]

What's more, `__consumer_offsets` is in my topic movement exclusion: `topics.excluded.from.partition.movement=__consumer_offsets,transaction_state`. I have no idea why CC is trying to move it.

Linjianfengccc commented 2 years ago

Also this operation works:

[screenshot]
mohitpali commented 2 years ago

> what is more, "consumer_offsets" is in my topic movement exclusion: `topics.excluded.from.partition.movement=__consumer_offsets,transaction_state` I have no idea about why CC is trying to move it

This is because offline replicas are always moved and are not covered by the exclusion. See https://github.com/linkedin/cruise-control/wiki/REST-APIs#fix-offline-replicas-in-kafka-cluster

mohitpali commented 2 years ago

So after making the change, did remove_broker or add_broker successfully move partitions from one BrokerSet to another?

The way the goals work is that DiskCapacityGoal may propose a movement based on a disk capacity threshold breach, but since we have configured BrokerSetAwareGoal as well, BrokerSetAwareGoal should prevent the movement. This is supposed to generate an OptimizationFailureException, not literally move anything.
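The propose/veto interplay described above can be sketched conceptually. This is not Cruise Control's actual Goal API; the class and method names below are purely illustrative of the idea that a move is executed only if every configured goal accepts it:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

// Conceptual sketch (illustrative names, not Cruise Control's Goal interface):
// one goal may propose a replica move, and every other configured goal can veto
// it. A vetoed move produces no proposal, which can surface as an
// OptimizationFailureException when no valid alternative placement exists.
public class GoalVetoDemo {

    /** A move is applied only if every configured goal accepts it. */
    public static boolean accepted(int srcBroker, int dstBroker,
                                   List<BiPredicate<Integer, Integer>> goals) {
        for (BiPredicate<Integer, Integer> goal : goals) {
            if (!goal.test(srcBroker, dstBroker)) {
                return false; // veto: the move is never executed
            }
        }
        return true;
    }

    /** Broker-set awareness as a veto: source and destination must share a broker set. */
    public static BiPredicate<Integer, Integer> brokerSetAware(Map<Integer, String> brokerToSet) {
        return (src, dst) -> Objects.equals(brokerToSet.get(src), brokerToSet.get(dst));
    }
}
```

With brokers 1-3 in one set and 4-6 in another, a move from broker 3 to broker 4 is vetoed while a move from broker 4 to broker 5 is accepted.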

Linjianfengccc commented 2 years ago

> So after making the change, did remove_broker or add_broker successfully move partitions off of a BrokerSet to another BrokerSet ?

Yes, when I add all the brokers, the partitions move across different broker sets.

mohitpali commented 2 years ago

Just trying to understand, since this did not happen when I tested. In both of your screenshots, I don't see replicas being moved. If you are seeing an OptimizationFailureException, that is fine, because BrokerSetAwareGoal prevented proposal generation: it would not have accepted DiskCapacityGoal's proposal, so ultimately no proposal is generated. The exception message may be misleading here.

Linjianfengccc commented 2 years ago

> Just trying to understand since this did not happen to me when i tested. In both of your screenshots, i don't see replicas being moved. If you are seeing OptimizationFailureException, that is fine, because BrokerSetAwareGoal prevented a proposal generation. BrokerSetAwareGoal would have not accepted DiskCapacityGoal's proposal and you would finally not see a proposal being generated. Exception message may be misleading here.

I am running it in dryrun mode; if I uncheck dryrun, the movement happens.

Linjianfengccc commented 2 years ago
[screenshot]
Linjianfengccc commented 2 years ago

> Just trying to understand since this did not happen to me when i tested. In both of your screenshots, i don't see replicas being moved. If you are seeing OptimizationFailureException, that is fine, because BrokerSetAwareGoal prevented a proposal generation. BrokerSetAwareGoal would have not accepted DiskCapacityGoal's proposal and you would finally not see a proposal being generated. Exception message may be misleading here.

If so, it looks like DiskCapacityGoal is incompatible with BrokerSetAwareGoal; in other words, BrokerSetAwareGoal can only stop the execution of a proposal, not adjust it, right?

mohitpali commented 2 years ago

Thanks for providing the info. Would you mind running add_broker through the command-line API with dryrun and verbose=true and pasting the output here, please?

Linjianfengccc commented 2 years ago

Sure: `curl -X POST "http://localhost:9090/kafkacruisecontrol/add_broker?dryrun=true&brokerid=6,5,4,3,1,2&verbose=true"`

Optimization has 727 inter-broker replica(12 MB) moves, 0 intra-broker replica(0 MB) moves and 0 leadership moves with a cluster model of 5 recent windows and 100.000% of the partitions covered.
Excluded Topics: [__transaction_state, __consumer_offsets].
Excluded Brokers For Leadership: [].
Excluded Brokers For Replica Move: [].
Counts: 6 brokers 9429 replicas 270 topics.
On-demand Balancedness Score Before (47.619) After(100.000).
Provision Status: RIGHT_SIZED.

[    44 ms] Stats for BrokerSetAwareGoal(FIXED):
AVG:{cpu:       1.747 networkInbound:       0.936 networkOutbound:       1.728 disk:     288.914 potentialNwOut:       3.495 replicas:1571.5 leaderReplicas:575.8333333333334 topicReplicas:5.82037037037037}
MAX:{cpu:       4.422 networkInbound:       1.619 networkOutbound:       4.513 disk:     659.069 potentialNwOut:       5.845 replicas:2697 leaderReplicas:1547 topicReplicas:2000}
MIN:{cpu:       0.000 networkInbound:       0.000 networkOutbound:       0.000 disk:       0.000 potentialNwOut:       0.000 replicas:329 leaderReplicas:154 topicReplicas:0}
STD:{cpu:       1.407 networkInbound:       0.535 networkOutbound:       1.396 disk:     290.970 potentialNwOut:       1.999 replicas:991.3634970752823 leaderReplicas:469.1394308258 topicReplicas:6.265513499128141}

[     3 ms] Stats for DiskCapacityGoal(NO-ACTION):
AVG:{cpu:       1.747 networkInbound:       0.936 networkOutbound:       1.728 disk:     288.914 potentialNwOut:       3.495 replicas:1571.5 leaderReplicas:575.8333333333334 topicReplicas:5.82037037037037}
MAX:{cpu:       4.422 networkInbound:       1.619 networkOutbound:       4.513 disk:     659.069 potentialNwOut:       5.845 replicas:2697 leaderReplicas:1547 topicReplicas:2000}
MIN:{cpu:       0.000 networkInbound:       0.000 networkOutbound:       0.000 disk:       0.000 potentialNwOut:       0.000 replicas:329 leaderReplicas:154 topicReplicas:0}
STD:{cpu:       1.407 networkInbound:       0.535 networkOutbound:       1.396 disk:     290.970 potentialNwOut:       1.999 replicas:991.3634970752823 leaderReplicas:469.1394308258 topicReplicas:6.265513499128141}

Current load:

          HOST         BROKER          RACK         DISK_CAP(MB)            DISK(MB)/_(%)_            CORE_NUM         CPU(%)          NW_IN_CAP(KB/s)       LEADER_NW_IN(KB/s)     FOLLOWER_NW_IN(KB/s)         NW_OUT_CAP(KB/s)        NW_OUT(KB/s)       PNW_OUT(KB/s)    LEADERS/REPLICAS
xxx.xxx.xxx.xxx,             1,     nihao123,         2000000.000,            659.069/00.03,                  1,         2.523,               10000.000,                   0.876,                   0.479,               10000.000,              1.824,              2.904,          1601/2986
xxx.xxx.xxx.xxx,             2,xxx.xxx.xxx.xxx,         2000000.000,            674.120/00.03,                  1,         5.861,               10000.000,                   1.213,                   1.110,               10000.000,              3.180,              9.342,           762/3412
xxx.xxx.xxx.xxx,             3,xxx.xxx.xxx.xxx,         2000000.000,            400.296/00.02,                  1,         2.101,               10000.000,                   0.661,                   1.279,               10000.000,              5.364,              8.726,          1066/2977
xxx.xxx.xxx.xxx,             4,xxx.xxx.xxx.xxx,         2000000.000,              0.000/00.00,                  1,         0.000,               10000.000,                   0.000,                   0.000,               10000.000,              0.000,              0.000,            11/19
xxx.xxx.xxx.xxx,             5, xxx.xxx.xxx.xxx,         2000000.000,              0.000/00.00,                  1,         0.000,               10000.000,                   0.000,                   0.000,               10000.000,              0.000,              0.000,             9/18
xxx.xxx.xxx.xxx,             6, xxx.xxx.xxx.xxx,         2000000.000,              0.000/00.00,                  1,         0.000,               10000.000,                   0.000,                   0.000,               10000.000,              0.000,              0.000,             6/17

Cluster load after adding broker [1, 2, 3, 4, 5, 6]:

          HOST         BROKER          RACK         DISK_CAP(MB)            DISK(MB)/_(%)_            CORE_NUM         CPU(%)          NW_IN_CAP(KB/s)       LEADER_NW_IN(KB/s)     FOLLOWER_NW_IN(KB/s)         NW_OUT_CAP(KB/s)        NW_OUT(KB/s)       PNW_OUT(KB/s)    LEADERS/REPLICAS
xxx.xxx.xxx.xxx,             1,     nihao123,         2000000.000,            659.069/00.03,                  1,         2.523,               10000.000,                   0.876,                   0.479,               10000.000,              1.824,              2.904,          1547/2676
xxx.xxx.xxx.xxx,             2,xxx.xxx.xxx.xxx,         2000000.000,            654.691/00.03,                  1,         4.422,               10000.000,                   0.945,                   0.674,               10000.000,              1.848,              3.497,           443/2697
xxx.xxx.xxx.xxx,             3,xxx.xxx.xxx.xxx,         2000000.000,            380.868/00.02,                  1,         1.058,               10000.000,                   0.225,                   1.011,               10000.000,              0.851,              2.881,           724/2256
xxx.xxx.xxx.xxx,             4,xxx.xxx.xxx.xxx,         2000000.000,             19.428/00.00,                  1,         1.438,               10000.000,                   0.268,                   0.435,               10000.000,              1.332,              5.845,           279/736
xxx.xxx.xxx.xxx,             5, xxx.xxx.xxx.xxx,         2000000.000,             19.428/00.00,                  1,         1.042,               10000.000,                   0.435,                   0.268,               10000.000,              4.513,              5.845,           308/735
xxx.xxx.xxx.xxx,             6, xxx.xxx.xxx.xxx,         2000000.000,              0.000/00.00,                  1,         0.000,               10000.000,                   0.000,                   0.000,               10000.000,              0.000,              0.000,           154/329

I see the proposal is not what I intended; it violates the broker set separation.

[screenshot]
mohitpali commented 2 years ago
1. The answer may lie in the existing documentation, which says:

> When adding new brokers to a Kafka cluster, Cruise Control makes sure that the replicas will only be moved from the existing brokers to the provided new broker, but not moved among existing brokers.

Even though this is true and we add a new broker, the BrokerSetAwareGoal should still be honored. I will check with @efeg and @CCisGG as to why this is the case.

2. Also, I am curious what you are trying to achieve by adding all brokers 1~6. I would assume a test to reproduce this would be to remove, say, brokers 5 and 6 and then add them back. Why all of them?
Linjianfengccc commented 2 years ago

I added all the brokers unintentionally, and just found that it worked and ignored the broker set limitation. In usual cases I would do as you say, e.g. remove broker 5 and add it back, but that was stopped by BrokerSetAwareGoal.

CCisGG commented 2 years ago

Hi @Linjianfengccc, I feel like the error is expected. If you have 6 brokers and 2 brokerSets, remove_broker is expected to fail if the RF (replication factor) of the topic is 3.

Linjianfengccc commented 2 years ago

Hi @CCisGG, thank you for the reply. For the second broker set, I confirmed that all the topics have at most 2 replicas.

image

For the first broker set, there are indeed topics with more than 2 replicas. But I don't think movement within the second broker set should be affected by the first one's RF. And I think the point is that when I try to remove a broker from BG2, CC tries to move replicas across broker sets.

image
CCisGG commented 2 years ago

@Linjianfengccc Moving replicas across broker sets is possible. By default, BrokerSetAwareGoal uses TopicNameHashBrokerSetMappingPolicy.

E.g. it doesn't guarantee that a topic stays in its current broker set. @mohitpali to confirm.

If you want topics to stay within the broker set they are already in, you can implement your own ReplicaToBrokerSetMappingPolicy.
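A "sticky" policy along these lines could, for example, keep each topic in whichever broker set currently holds most of its replicas. Below is a minimal sketch of that idea only; the real `ReplicaToBrokerSetMappingPolicy` interface in Cruise Control has its own signature, and all names here are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative "sticky" mapping (NOT the actual ReplicaToBrokerSetMappingPolicy
// interface): assign a topic to the broker set that already holds the majority
// of its replicas, instead of hashing the topic name.
public class StickyBrokerSetMapper {
    private final Map<String, Set<Integer>> brokerSets; // brokerSetId -> broker ids

    public StickyBrokerSetMapper(Map<String, Set<Integer>> brokerSets) {
        this.brokerSets = brokerSets;
    }

    /** Returns the broker set holding the most of the topic's current replicas. */
    public String brokerSetForTopic(List<Integer> currentReplicaBrokers) {
        Map<String, Integer> counts = new HashMap<>();
        for (int broker : currentReplicaBrokers) {
            for (Map.Entry<String, Set<Integer>> e : brokerSets.entrySet()) {
                if (e.getValue().contains(broker)) {
                    counts.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        // Pick the set with the highest replica count for this topic.
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("replicas on unknown brokers"));
    }
}
```

With the config from this thread (bg0 = {1,2,3}, bg1 = {4,5,6}), a topic whose replicas sit on brokers 4 and 5 would stay mapped to bg1 rather than being rehashed elsewhere.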

mohitpali commented 2 years ago

@CCisGG In this case, the user did not add any additional brokerSets, so topic name hashing should always point to the same brokerSet.

CCisGG commented 2 years ago

@mohitpali yes, but say before the goal runs the topic is in brokerSet_A; isn't it possible the policy assigns the topic to brokerSet_B?

mohitpali commented 2 years ago

If I understand correctly, there are no proposals before doing add_broker or remove_broker ?

One way to look at this is to run the rebalance API without any broker removal or addition.

@CCisGG's suspicion makes sense. The Topic Name determines which brokerSet the topic should be allocated to.

Linjianfengccc commented 2 years ago

> If I understand correctly, there are no proposals before doing add_broker or remove_broker ?
>
> One way to look at this is to run the rebalance API without any broker removal or addition.
>
> @CCisGG's suspicion makes sense. The Topic Name determines which brokerSet the topic should be allocated to.

Your understanding is correct. I ran the rebalance; when it was done, the replicas had moved, and the add/remove broker was rejected as before.

Let me show you the procedure of what I did (`*` means there are replicas on the broker). In the beginning, the cluster had many topics without any rules and only one broker set: `[1*, 2*, 3*, 4*, 5*, 6*]`. Then I executed remove 4,5,6 and it worked: `[1*, 2*, 3*, 4, 5, 6]`. After that, I divided the brokers into 2 broker sets: `[BG1(1*, 2*, 3*), BG2(4, 5, 6)]`. The OptimizationFailureException was raised when I executed remove 6.

So if using TopicNameHashBrokerSetMappingPolicy, the replicas will be moved away according to the name-hash rule. But you said:

> @CCisGG In this case, the user does not add any additional brokerSets. So, Topic name hashing should always point to same brokerSet.

I am not sure whether these are in conflict.

mohitpali commented 2 years ago

Thanks for the details. This now makes complete sense.

TopicNameHashBrokerSetMappingPolicy maps topics by their name to a broker set. For a moment, let's ignore the add or remove broker part. As soon as you updated the brokerSets.json and restarted cruise control, the BrokerSetAwareGoal would generate proposals. You can check this by calling the GET proposals API. BrokerSetAwareGoal will use default implementation of MappingPolicy to map topic to brokerSet. The default implementation here is TopicNameHashBrokerSetMappingPolicy. The idea behind the default implementation is to let the users split their clusters into brokerSets so that each topic is within one brokerSet's boundary. You could have your own implementation of mapping topics to a brokerSetId and BrokerSetAwareGoal will ensure that.

Irrespective of add or remove brokers, BrokerSetAwareGoal will generate proposals to keep a certain topic into a brokerSet. Also, there is consistent hashing used to make sure that when a brokerSet is added, the movements are limited. However, keep in mind that hashing by topic names may not be the best strategy for each use case.
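The name-hash idea can be illustrated with a toy mapper. The real TopicNameHashBrokerSetMappingPolicy uses its own consistent-hashing scheme; the function below only demonstrates the key property being discussed, that the mapping depends solely on the topic name and the brokerSet ids, not on where the replicas currently live:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy illustration (not Cruise Control's actual hashing scheme) of mapping a
// topic name to a broker set. Because the result depends only on the topic
// name and the set of brokerSet ids, a topic whose replicas currently live in
// a different set gets proposals that move it, i.e. cross-set movement.
public class TopicNameHashDemo {
    public static String brokerSetFor(String topic, List<String> brokerSetIds) {
        List<String> sorted = new ArrayList<>(brokerSetIds);
        Collections.sort(sorted); // order-independent, deterministic choice
        int idx = Math.floorMod(topic.hashCode(), sorted.size());
        return sorted.get(idx);
    }
}
```

The same topic name always lands in the same broker set regardless of input order, which is why updating brokerSets.json immediately yields proposals for topics whose current placement disagrees with the hash.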

Linjianfengccc commented 2 years ago

That makes sense, thank you guys so much @mohitpali @CCisGG. Now I understand what CC does, and it seems BrokerSetAwareGoal is bypassed when doing a cluster rebalance to set up the initial state of the broker sets. I will try to implement my own ReplicaToBrokerSetMappingPolicy. Thanks again!