linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

fix_offline_replicas fails to move replicas off dead broker #2187

Closed: HonestTelevision closed this issue 1 month ago

HonestTelevision commented 1 month ago

I have a dead broker with offline replicas that I'd like to reassign to live brokers. The wiki says fix_offline_replicas is supposed to move all the offline replicas from dead disks/brokers. However, when calling that endpoint, I get this error:
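For reference, a minimal sketch of how that endpoint might be called. The host and port are assumptions (9090 is the default Cruise Control web server port), as is the default `/kafkacruisecontrol` URL prefix; the helper name is made up for illustration. It builds the request with `dryrun=true`, so it would only ask for proposals rather than execute them.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import Request


def build_fix_offline_replicas_request(host="localhost", port=9090, dryrun=True):
    """Build (but do not send) a POST request for fix_offline_replicas.

    host and port are assumptions about the deployment; adjust as needed.
    """
    # dryrun=true asks Cruise Control for proposals without executing them.
    query = urlencode({"dryrun": str(dryrun).lower(), "json": "true"})
    url = urlunsplit(
        ("http", f"{host}:{port}", "/kafkacruisecontrol/fix_offline_replicas", query, "")
    )
    return Request(url, method="POST")


if __name__ == "__main__":
    req = build_fix_offline_replicas_request()
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; switching to `dryrun=False` is what would actually trigger the replica moves.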

ERROR com.linkedin.kafka.cruisecontrol.executor.Executor - Executor got exception during execution
java.util.concurrent.TimeoutException: null
    at java.util.concurrent.CompletableFuture.timedGet(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(Unknown Source) ~[?:?]
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180) ~[kafka-clients-3.1.0.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ReplicationThrottleHelper.getEntityConfigs(ReplicationThrottleHelper.java:203) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ReplicationThrottleHelper.getBrokerConfigs(ReplicationThrottleHelper.java:198) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ReplicationThrottleHelper.setThrottledRateIfUnset(ReplicationThrottleHelper.java:169) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ReplicationThrottleHelper.setThrottles(ReplicationThrottleHelper.java:68) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.Executor$ProposalExecutionRunnable.interBrokerMoveReplicas(Executor.java:1345) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.Executor$ProposalExecutionRunnable.execute(Executor.java:1177) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.Executor$ProposalExecutionRunnable.run(Executor.java:1103) ~[cruise-control.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at java.lang.Thread.run(Unknown Source) ~[?:?]

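It looks like Cruise Control is trying to read the dead broker's configs: the timeout is raised from ReplicationThrottleHelper.getBrokerConfigs while the executor is setting replication throttles. How can I move offline replicas off a dead broker with Cruise Control?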
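To illustrate the suspected failure mode, here is a small sketch in Python. It is not Cruise Control's code and makes no claim about how any fix was implemented; `collect_broker_configs` and `fake_fetch` are hypothetical names. The idea is that a config lookup against a dead broker times out, and a caller that does not tolerate the timeout aborts the whole operation, whereas one that records the broker as unreachable can carry on with the live ones.

```python
def collect_broker_configs(broker_ids, fetch_config):
    """Fetch configs per broker, tolerating lookups that time out.

    fetch_config is a caller-supplied (hypothetical) function that returns a
    dict for a reachable broker and raises TimeoutError for an unreachable one.
    """
    configs = {}
    unreachable = []
    for broker_id in broker_ids:
        try:
            configs[broker_id] = fetch_config(broker_id)
        except TimeoutError:
            # A dead broker cannot answer; record it instead of failing the run.
            unreachable.append(broker_id)
    return configs, unreachable


if __name__ == "__main__":
    dead_brokers = {3}

    def fake_fetch(broker_id):
        # Stand-in for a real admin-client call; broker 3 plays the dead broker.
        if broker_id in dead_brokers:
            raise TimeoutError(f"broker {broker_id} did not respond")
        return {"broker.id": broker_id}

    print(collect_broker_configs([1, 2, 3], fake_fetch))
```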

HonestTelevision commented 1 month ago

Never mind, it seems I had this problem because my Cruise Control version was old. Pulling a newer image resolved it. I saw that this issue had already been addressed here: https://github.com/linkedin/cruise-control/pull/1955