Closed rosakng closed 10 months ago
A proposed solution when clearing throttles is to remove the throttle rate configuration if the current throttle rate is of a dynamic broker config source type, such that only dynamic broker configurations are being removed. In order to support static configurations being returned after a previous dynamic configuration is removed in waitForConfigs is to set the expected configuration value to the current static configuration value, only if the current configuration is set static-ly.
👋 @CCisGG , would you happen to know if this is a known issue or something I can contribute for a fix?
Hi @rosakng , that's a good finding. Thanks for taking investigation on this!
From my understand, when there is any static config for throttling on Kafka side, CC does not have a way to remove the throttling. In that case, if user ask CC to remove throttling during rebalance, wouldn't it make more sense for CC to set the dynamic config value to 0? In that way, the value 0 should override the static config and the throttling should still work. The way you are proposal has a potential issue: when user set throttling to 0 for rebalance, they will assume CC rebalances with no throttling, while the rebalance is actually throttled by broker side config. Does that make sense?
HI @CCisGG Thanks for your reply!
My concern with setting default.replication.throttle
to 0 is that based off of the config specifications
for follower and leader throttled rates, the upper bound on inbound/outbound replication traffic would be 0 bytes/second, causing there to be no partition movements.
I tried testing this out and saw that no movements were being made:
In addition, CC would not use the default.replication.throttle
config if an existing static/dynamic broker configuration is already in place because of the setIfUnset logic.
I propose that if a user asks CC to rebalance using default.replication.throttle
config, the value should overwrite any static/dynamic broker config that may already exist on the Kafka side. What do you think?
My concern with setting default.replication.throttle to 0 is that based off of the config specifications for follower and leader throttled rates, the upper bound on inbound/outbound replication traffic would be 0 bytes/second, causing there to be no partition movements.
@rosakng Just realize it, and you are absolutely right. I think override the config makes more senses. Do we need to remove the override after the rebalance? I'm not super sure about it but might be something to consider. Please feel free to work on that and I'll review it as needed.
Today the way Cruise Control removes broker throttle rate configurations during a rebalance is to delete the current configurations for leader and follower replication throttled rate, then expect those configurations to be
null
after removing them and before continuing on with the rebalance. Though this is true in the case of removing dynamic broker configurations, however, if there is a static broker configuration set forleader.replication.throttled.rate
orfollower.replication.throttled.rate
, the configurations will never equal in the check, leaving the rebalance to hang and exhaust all of its retries without continuing.The state of cruise control then appears to have paused in either moving more data or fully completing the process. Cruise control should be able to run rebalances even if a static configuration is set for
leader.replication.throttled.rate
orfollower.replication.throttled.rate
along withdefault.replication.throttle
.