linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

PreferredLeaderElectionGoal cannot be used as a self.healing goal #1326

Closed amuraru closed 4 years ago

amuraru commented 4 years ago

I am trying to use PreferredLeaderElectionGoal as a self.healing goal. It is not included in anomaly.detection.goals, since it seems that it cannot be used in the detector:

 anomaly.detection.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal
 self.healing.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,\
                       com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal

With this configuration, though, I still get the same error: "PreferredLeaderElectionGoal goal does not support use by goal violation detector." Is this expected? Doing a cluster rebalance with this goal seems to work, though. The stack trace is:

com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: java.lang.IllegalArgumentException: PreferredLeaderElectionGoal goal does not support use by goal violation detector.
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:160)
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.RebalanceRunnable.workWithoutClusterModel(RebalanceRunnable.java:113)
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:166)
    at com.linkedin.kafka.cruisecontrol.detector.GoalViolations.fix(GoalViolations.java:73)
    at com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector$AnomalyHandlerTask.fixAnomalyInProgress(AnomalyDetector.java:521)
    at com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector$AnomalyHandlerTask.processAnomalyInProgress(AnomalyDetector.java:393)
    at com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector$AnomalyHandlerTask.handleAnomalyInProgress(AnomalyDetector.java:376)
    at com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector$AnomalyHandlerTask.run(AnomalyDetector.java:338)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.IllegalArgumentException: PreferredLeaderElectionGoal goal does not support use by goal violation detector.
    at com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal.sanityCheckOptimizationOptions(PreferredLeaderElectionGoal.java:57)
    at com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal.optimize(PreferredLeaderElectionGoal.java:82)
    at com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer.optimizations(GoalOptimizer.java:436)
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.optimizations(KafkaCruiseControl.java:551)
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.ProposalsRunnable.workWithClusterModel(ProposalsRunnable.java:112)
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:154)
    ... 13 more
efeg commented 4 years ago

Hi @amuraru, this is indeed intentional: PreferredLeaderElectionGoal is not intended to be used as part of goal violation detection, and this error message is expected. The goal is used for triggering an on-demand PLE (preferred leader election) operation, which moves the leadership of each partition to the most preferred replica in its replica list.

Hope it helps!
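
To make the PLE idea above concrete, here is a minimal conceptual sketch. It is not Cruise Control's implementation: the `Partition` class and both method names are hypothetical, and it assumes every replica is alive and in sync, so the "most preferred" replica is simply the first entry of the replica list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only; all names here are hypothetical, not Cruise Control's API.
final class PreferredLeaderSketch {

    // Hypothetical model of a partition: ordered replica list plus the current leader's broker id.
    static final class Partition {
        final String name;
        final List<Integer> replicas;
        final int leader;

        Partition(String name, List<Integer> replicas, int leader) {
            this.name = name;
            this.replicas = replicas;
            this.leader = leader;
        }
    }

    // Assumption: all replicas are eligible, so the preferred leader is the first replica in the list.
    static int preferredLeader(Partition p) {
        return p.replicas.get(0);
    }

    // Maps partition name to its new leader, for every partition not led by its preferred replica.
    static Map<String, Integer> leadershipMoves(List<Partition> partitions) {
        Map<String, Integer> moves = new LinkedHashMap<>();
        for (Partition p : partitions) {
            int preferred = preferredLeader(p);
            if (p.leader != preferred) {
                moves.put(p.name, preferred);
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        List<Partition> partitions = List.of(
            new Partition("topic-0", List.of(1, 2, 3), 1),  // already led by its preferred replica
            new Partition("topic-1", List.of(2, 3, 1), 3)); // led by a non-preferred replica
        System.out.println(leadershipMoves(partitions));    // prints {topic-1=2}
    }
}
```

Only leadership changes in this sketch; no replica is moved between brokers.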

amuraru commented 4 years ago

Right, I get that it's not meant to be used as a detection goal. What I tried, though, is to use it as a self.healing goal only. Is that any different from running it through a manual PLE? (FWIW, manually running a rebalance with this goal works, so I was wondering whether it can be used in self.healing similarly.)
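
One plausible reading of the stack trace is that the goal checks how the optimization was triggered: the exception comes from `PreferredLeaderElectionGoal.sanityCheckOptimizationOptions` while `GoalViolations.fix` is running a rebalance, whereas an on-demand rebalance with the same goal succeeds. The sketch below illustrates that kind of guard. The `Options` class and its flag are hypothetical stand-ins, not Cruise Control's actual types.

```java
// Hypothetical sketch: the names below are illustrative, not Cruise Control's actual API.
final class TriggerGuardSketch {

    // Hypothetical stand-in for the options handed to a goal before it optimizes.
    static final class Options {
        final boolean triggeredByGoalViolation;

        Options(boolean triggeredByGoalViolation) {
            this.triggeredByGoalViolation = triggeredByGoalViolation;
        }
    }

    // Rejects runs started by the goal violation detector; on-demand requests pass through.
    static void sanityCheck(Options options) {
        if (options.triggeredByGoalViolation) {
            throw new IllegalArgumentException(
                "PreferredLeaderElectionGoal goal does not support use by goal violation detector.");
        }
    }

    public static void main(String[] args) {
        sanityCheck(new Options(false));     // on-demand rebalance: accepted
        try {
            sanityCheck(new Options(true));  // self-healing fix: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, listing the goal only in self.healing.goals does not help, because the self-healing fix is itself started by the goal violation detector.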

amuraru commented 4 years ago

@efeg I see now that the on-demand PLE request actually uses the rebalance API with PLEGoal. But I still have this question: would it override any other goal decision if used as a last priority goal in self.healing.goals list?

In my environment I am seeing quite often partitions served by non preferred leaders, and I am wondering if we can automate the PLE via the self.healing mechanism and always run PLEGoal whenever the goal violation detector detects imbalances via other goals.

Thanks

efeg commented 4 years ago

> In my environment I am seeing quite often partitions served by non preferred leaders

Why is this a problem? As long as the self-healing goals are satisfied, would it matter whether the leader is currently on a preferred replica or not?

> would it override any other goal decision if used as a last priority goal in self.healing.goals list?

The current implementation does not support running PreferredLeaderElectionGoal alongside the other goals. It is intended to be used either by itself for a PLE operation or (internally) by the demote operation. Hypothetically, running it as the last self-healing goal would override other goals' decisions.
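
A toy illustration of that hypothetical override, under stated assumptions: leadership is modeled as a plain array, the "balanced" assignment stands in for what an earlier leader-distribution step might produce, and replica lists are assumed to stay unchanged between steps. None of this is Cruise Control code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only; it does not use or mirror Cruise Control's classes.
final class GoalOverrideSketch {

    // Counts how many partitions each broker currently leads.
    static Map<Integer, Integer> leaderCounts(int[] leaders) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int leader : leaders) {
            counts.merge(leader, 1, Integer::sum);
        }
        return counts;
    }

    // Assumption: the preferred leader is the first replica, so this step ignores current leadership.
    static int[] applyPreferredLeaders(List<List<Integer>> replicas) {
        int[] leaders = new int[replicas.size()];
        for (int i = 0; i < leaders.length; i++) {
            leaders[i] = replicas.get(i).get(0);
        }
        return leaders;
    }

    public static void main(String[] args) {
        // Four partitions that all list broker 1 first and broker 2 second.
        List<List<Integer>> replicas = List.of(
            List.of(1, 2), List.of(1, 2), List.of(1, 2), List.of(1, 2));

        // Leadership as a hypothetical earlier balancing step might leave it.
        int[] balanced = {1, 2, 1, 2};
        System.out.println(leaderCounts(balanced));                        // prints {1=2, 2=2}

        // Running a preferred-leader step last discards that balance.
        System.out.println(leaderCounts(applyPreferredLeaders(replicas))); // prints {1=4}
    }
}
```

In this model the final step puts all four leaders back on broker 1, which is the sense in which a last-priority PLE step could undo what earlier goals decided.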

amuraru commented 4 years ago

makes sense @efeg - closing this issue now