linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

OptimizationFailureException for DiskCapacityGoal wrong computation of disk usage #2166

Open IgorBerman opened 2 weeks ago

IgorBerman commented 2 weeks ago

I have a question about how to investigate why an OptimizationFailureException happens. I believe something is wrong with the computed current disk utilization (maybe due to a missing setting or config). We are getting the following error while trying to get proposals:

errorMessage: "Error processing GET request '/proposals' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [DiskCapacityGoal] Violated capacity limit of 17579716.800000 via broker utilization of 22606978.000000 with broker 201 for resource disk. '.",
stackTrace: "java.util.concurrent.ExecutionException: com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [DiskCapacityGoal] Violated capacity limit of 17579716.800000 via broker utilization of 22606978.000000 with broker 201 for resource disk

The capacity limit is computed correctly as 0.8 * the capacity configured in capacity.json. However, the current usage appears to be computed incorrectly: when looking at the load endpoint for this broker we get 13709573.000 bytes (62.39%), not 22606978.000000. I can't figure out where this 22606978.000000 comes from.

[Screenshot 2024-06-19 at 14 53 11]
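To make the mismatch concrete, here is a minimal sketch of the arithmetic using only the numbers quoted above. The function and variable names are hypothetical, and the 0.8 threshold is the value stated in this post, not something read from the cluster config. The sketch only shows that the load endpoint figure is consistent with the configured capacity while the figure in the exception is not; it does not explain where the larger number comes from.

```python
# Numbers copied from the error message and the load endpoint output above.
CAPACITY_THRESHOLD = 0.8             # as stated in this post (assumption)
CAPACITY_LIMIT = 17579716.8          # "capacity limit" from the exception
REPORTED_UTILIZATION = 22606978.0    # "broker utilization" from the exception
LOAD_ENDPOINT_USAGE = 13709573.0     # disk usage shown by the load endpoint


def implied_capacity(limit: float, threshold: float) -> float:
    """Back out the configured capacity from the limit and the threshold."""
    return limit / threshold


def utilization_pct(usage: float, capacity: float) -> float:
    """Usage as a percentage of the configured capacity."""
    return 100.0 * usage / capacity


capacity = implied_capacity(CAPACITY_LIMIT, CAPACITY_THRESHOLD)
load_pct = utilization_pct(LOAD_ENDPOINT_USAGE, capacity)
reported_pct = utilization_pct(REPORTED_UTILIZATION, capacity)

print(f"implied capacity:        {capacity:.1f}")
print(f"load endpoint usage:     {load_pct:.2f}% of capacity")
print(f"usage in the exception:  {reported_pct:.2f}% of capacity")
print(f"exception / load ratio:  {REPORTED_UTILIZATION / LOAD_ENDPOINT_USAGE:.3f}")
```

The load endpoint percentage matches the 62.39% shown for the broker, so the capacity side looks right; the open question is only the utilization value used by DiskCapacityGoal.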

We are using a rather old version of Cruise Control, '2.0.100'.

Any ideas or suggestions will be highly appreciated. Thanks in advance, Igor