Open khodyrevyurii opened 2 years ago
Looks like a duplicate of this issue https://github.com/linkedin/cruise-control/issues/1799 (upgraded CC(2.5 branch) and dead-broker auto healing function error out)
It's probably because the timeout config is set too low by default:
The timeout is 30 seconds which should be enough to get one broker configuration via the admin client. But in the case of a dead broker the client always reaches the timeout. It doesn't make much sense to apply throttling to a dead broker, right?
@kooli89 You made a very good point. I think we need to skip setting up the throttling config for dead brokers.
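To illustrate the idea of skipping dead brokers, here is a minimal sketch, not the actual Cruise Control code. The class name DeadBrokerThrottleFilter and the method brokersToThrottle are hypothetical, and the sketch assumes the caller already knows which broker ids are alive. It only shows the filtering step that would run before any throttle config is requested or written through the admin client:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; it is NOT part of
// Cruise Control's ReplicationThrottleHelper.
final class DeadBrokerThrottleFilter {
  private DeadBrokerThrottleFilter() { }

  /**
   * Returns the brokers that take part in the reassignment AND are alive.
   * Dead brokers are dropped, so no config request is sent to them and
   * the admin client never has to wait for its timeout.
   * Both parameters are assumptions: the caller must supply them.
   */
  static Set<Integer> brokersToThrottle(Set<Integer> participatingBrokers,
                                        Set<Integer> aliveBrokers) {
    // Copy first so the caller's set is not modified.
    Set<Integer> result = new TreeSet<>(participatingBrokers);
    // Keep only the broker ids that are also in the alive set.
    result.retainAll(aliveBrokers);
    return result;
  }

  public static void main(String[] args) {
    Set<Integer> participating = Set.of(1, 2, 3); // broker 2 is the dead one
    Set<Integer> alive = Set.of(1, 3);
    System.out.println(brokersToThrottle(participating, alive)); // prints [1, 3]
  }
}
```

With this kind of filter in place, the throttle would be applied only to brokers 1 and 3 in the example, and the dead broker 2 would be left alone.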
Hi. Yes, that was exactly the problem. Sorry for the late reply, I was out of touch.
Hi.
We hit an error in Cruise Control when the parameter default.replication.throttle is set and we try to remove a dead broker from the cluster.
Error:
Environment:
Steps to reproduce:
1. Set default.replication.throttle in cruisecontrol.properties (in my example, default.replication.throttle=10000000).
2. Remove the dead broker (/remove_broker?brokerid={{broker_id}}&dryrun=false).
Unfortunately, I have no development experience, so I can only guess. But it seems to me that the problem occurs in this block: https://github.com/linkedin/cruise-control/blob/migrate_to_kafka_2_5/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/executor/ReplicationThrottleHelper.java#L63
On our cluster, we temporarily solved the problem by rolling back the changes of this PR: https://github.com/linkedin/cruise-control/pull/1781