linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

AnomalyDetector.GOAL_VIOLATION-has-unfixable-goals does not go to 1 when ReplicaCapacityGoal breaches. #1818

Closed: mohitpali closed this issue 2 years ago

mohitpali commented 2 years ago

In a scenario where the replica count exceeds the per-broker limit on all brokers and the goals are therefore unfixable, I would expect the metric AnomalyDetector.GOAL_VIOLATION-has-unfixable-goals to go to 1.

However, I noticed that the error message also suggests removing topics from exclusion via the excluded_topics parameter, which is not an option for me. I would still expect this to be reported as unfixable goals.

Error processing GET request '/proposals' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [ReplicaCapacityGoal] Replica count (513) in broker 8 exceeds the maximum allowed number of replicas per broker: 480. Add at least 1 broker. || Tips: [1] There are 5 topics excluded from replica move. Potential mitigation: Remove selected topics from exclusion using excluded_topics parameter. Then, re-run your original request. Add at least 1 broker.'.
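For anyone who wants to see how far over the limit a broker is, the numbers can be pulled out of the message above. This is only a minimal sketch: the regular expression is written against this one error string and may not match the wording of other Cruise Control versions, and the function name parse_replica_capacity_error is a made-up helper, not part of Cruise Control.

```python
import re

# Relevant sentence copied from the error message quoted above.
ERROR = (
    "[ReplicaCapacityGoal] Replica count (513) in broker 8 exceeds the maximum "
    "allowed number of replicas per broker: 480. Add at least 1 broker."
)

# Assumption: the message wording is exactly as in the quoted log.
PATTERN = re.compile(
    r"Replica count \((?P<count>\d+)\) in broker (?P<broker>\d+) exceeds the maximum "
    r"allowed number of replicas per broker: (?P<limit>\d+)"
)


def parse_replica_capacity_error(message):
    """Hypothetical helper: return broker id, count, limit and excess, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    count = int(match["count"])
    limit = int(match["limit"])
    return {
        "broker": int(match["broker"]),
        "count": count,
        "limit": limit,
        "excess": count - limit,  # replicas above the per-broker maximum
    }


result = parse_replica_capacity_error(ERROR)
print(result)
```

For the message in this report, the excess works out to 513 - 480 = 33 replicas on broker 8.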

efeg commented 2 years ago

@mohitpali The metric AnomalyDetector.GOAL_VIOLATION-has-unfixable-goals is updated whenever a GOAL_VIOLATION anomaly with unfixable goals is detected.

Based on the log you shared, you manually called the Cruise Control REST API and got an OptimizationFailureException because ReplicaCapacityGoal could not be satisfied. See:
https://github.com/linkedin/cruise-control/blob/4e5927b48bf2581ab76acbbecbf42b355b871b65/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/servlet/KafkaCruiseControlServletUtils.java#L260

In this case, the AnomalyDetector.GOAL_VIOLATION-has-unfixable-goals metric is not expected to be updated, because the failure came from a manual request rather than from a detected GOAL_VIOLATION anomaly. So what you observed is the correct behavior.
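To illustrate the distinction, here is a toy model of the two paths. It is not Cruise Control code: the class, its methods, and the plain attribute standing in for the metric are all hypothetical, and it only mirrors the behavior described in this thread (the metric changes on detection, a manual request just fails).

```python
class ToyCruiseControl:
    """Toy model only; every name here is hypothetical, not Cruise Control's API."""

    def __init__(self, unfixable_goals):
        self._unfixable_goals = list(unfixable_goals)
        # Stands in for AnomalyDetector.GOAL_VIOLATION-has-unfixable-goals.
        self.has_unfixable_goals = 0

    def run_goal_violation_detection(self):
        # Detection path: the only place in this model that updates the metric.
        self.has_unfixable_goals = 1 if self._unfixable_goals else 0

    def handle_proposals_request(self):
        # Manual request path: fails with an error but leaves the metric alone.
        if self._unfixable_goals:
            raise RuntimeError("optimization failed: " + ", ".join(self._unfixable_goals))
        return "proposals"


cc = ToyCruiseControl(["ReplicaCapacityGoal"])

try:
    cc.handle_proposals_request()
except RuntimeError as error:
    print(error)

metric_after_request = cc.has_unfixable_goals  # unchanged by the manual request

cc.run_goal_violation_detection()
metric_after_detection = cc.has_unfixable_goals  # set by the detection path

print(metric_after_request, metric_after_detection)
```

In this model the metric stays at 0 after the failed request and only becomes 1 once the detection path runs, which matches the behavior described above.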