linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

Cruise control Exceptions : Cannot get proposal because model completeness is not met. #145

Closed chandradeepak closed 6 years ago

chandradeepak commented 6 years ago

I am seeing these errors often. How can I recover from these exceptions?

[2018-02-28 00:51:05,656] ERROR Uncaught exception in anomaly handler. (com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector)
java.lang.IllegalStateException: Cannot get proposal because model completeness is not met.
    at com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer.optimizations(GoalOptimizer.java:235)
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.getOptimizationProposals(KafkaCruiseControl.java:321)
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.getOptimizationProposals(KafkaCruiseControl.java:363)
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.rebalance(KafkaCruiseControl.java:198)
    at com.linkedin.kafka.cruisecontrol.detector.GoalViolations.fix(GoalViolations.java:38)
    at com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector$AnomalyHandlerTask.fixAnomaly(AnomalyDetector.java:199)
    at com.linkedin.kafka.cruisecontrol.detector.AnomalyDetector$AnomalyHandlerTask.run(AnomalyDetector.java:161)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.base/java.lang.Thread.run(Thread.java:844)

efeg commented 6 years ago

Model completeness indicates whether Cruise Control (CC) has "enough" data (defined below) to assess the current cluster state and generate proposals to improve it. AnomalyDetector periodically checks whether the goals specified in anomaly.detection.goals are satisfied -- i.e. whether their requirements are not violated. If self.healing.enabled=true is set in config/cruisecontrol.properties, the AnomalyDetector attempts to automatically generate proposals and execute them to fix the unbalanced cluster state. If the data collected so far is insufficient, proposal generation fails and this stack trace appears in the logs.

Each goal has a corresponding model completeness requirement. You can see these requirements through the state endpoint with the verbose option -- e.g. for CC running on localhost, via http://localhost:9090/kafkacruisecontrol/state?verbose=true.
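As a minimal sketch of querying the state endpoint mentioned above, the Python below builds the URL from the comment and fetches it using only the standard library. The helper names `state_url` and `fetch_state`, the default host and port, and the timeout are illustrative assumptions; the response is returned as raw text because its exact format is not described in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def state_url(host: str = "localhost", port: int = 9090, verbose: bool = True) -> str:
    """Build the state endpoint URL (host and port defaults are assumptions)."""
    query = urlencode({"verbose": str(verbose).lower()})
    return f"http://{host}:{port}/kafkacruisecontrol/state?{query}"


def fetch_state(url: str, timeout: float = 10.0) -> str:
    """Fetch the state report and return the body as text, without parsing it."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Against a running instance, `print(fetch_state(state_url()))` should print the verbose state report, which is where the per-goal completeness requirements are listed.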

The collected data is considered sufficient for a goal if it satisfies that goal's model completeness requirement:

  1. For proposals generated in DATA_FROM_PARAM=VALID_WINDOWS mode (default), (1) data for at least requiredNumWindows windows must be collected, and (2) each of those windows must have monitored more than minMonitoredPartitionPercentage of the partitions.
  2. For proposals generated in DATA_FROM_PARAM=VALID_PARTITIONS mode, data for all windows must be collected, and at least minMonitoredPartitionPercentage of the partitions in the cluster must have been monitored during this period.
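The two rules above can be sketched in Python as follows. This is only an illustration of the description in this comment, not Cruise Control's actual implementation: the `Requirement` class, the function names, the representation of a window as a set of monitored partition names, the `expected_windows` parameter, and the reading of "monitored during this period" as "monitored in every window" are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Requirement:
    required_num_windows: int
    min_monitored_partition_percentage: float  # expressed as a fraction in [0, 1]


def valid_windows_ok(windows: List[Set[str]], total_partitions: int, req: Requirement) -> bool:
    """VALID_WINDOWS: enough windows each monitor MORE THAN the minimum fraction."""
    if total_partitions <= 0:
        return False
    valid = [w for w in windows
             if len(w) / total_partitions > req.min_monitored_partition_percentage]
    return len(valid) >= req.required_num_windows


def valid_partitions_ok(windows: List[Set[str]], expected_windows: int,
                        total_partitions: int, req: Requirement) -> bool:
    """VALID_PARTITIONS: all windows collected, and AT LEAST the minimum
    fraction of partitions monitored in every window (assumed interpretation)."""
    if total_partitions <= 0 or not windows or len(windows) < expected_windows:
        return False
    monitored_throughout = set.intersection(*windows)
    return (len(monitored_throughout) / total_partitions
            >= req.min_monitored_partition_percentage)
```

For example, with four partitions, a requirement of two windows at a 0.5 fraction, and three windows monitoring 4, 3, and 1 partitions, the first check passes (two windows exceed the threshold) while the second fails (only one partition is monitored in every window).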

-- This should probably be logged as a warning rather than an unhandled exception though (see patch https://github.com/linkedin/cruise-control/pull/149)

chandradeepak commented 6 years ago

@efeg, thanks for fixing it.