bobelev opened 1 year ago
I see the same behaviour with Cruise Control 2.5.134
Error message:
Error processing GET request '/load' due to: 'com.linkedin.kafka.cruisecontrol.exception.BrokerCapacityResolutionException: Unable to resolve capacity of broker 2
capacity.json:
{
"brokerCapacities": [
{
"brokerId": "-1",
"capacity": {
"CPU": "100",
"DISK": {
"/data1": "1000000"
},
"NW_IN": "10000",
"NW_OUT": "10000"
}
}
]
}
brokerSets.json:
{
"brokerSets": [
{
"brokerSetId": "brokerSet0",
"brokerIds": [0, 1]
}
]
}
Perhaps my broker set is misconfigured? I see a reference to broker 2
in the error message.
Since I haven't specified a custom config file resolver, I am assuming the default specified in the comments applies, namely BrokerCapacityConfigFileResolver.
I changed brokerSets to:
{
"brokerSets": [
{
"brokerSetId": "brokerSet0",
"brokerIds": [1, 2]
}
]
}
and the error message changed to:
Error processing GET request '/load' due to: 'com.linkedin.kafka.cruisecontrol.exception.BrokerCapacityResolutionException: Unable to resolve capacity of broker 1
At first glance I can't see the logic that would support reading the default values for a broker from broker id -1:
https://github.com/linkedin/cruise-control/blob/migrate_to_kafka_2_5/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/config/BrokerCapacityConfigFileResolver.java#L181
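For illustration, here is the fallback behaviour one would *expect* from a file-based resolver: look up the broker's own entry first, and only then fall back to the -1 default. This is a Python sketch of the expected semantics, not the actual Java code from the linked file; `resolve_capacity` and `DEFAULT_BROKER_ID` are names I made up.

```python
import json

# The "-1" entry is documented as the catch-all default.
DEFAULT_BROKER_ID = "-1"

def resolve_capacity(capacity_json: str, broker_id: int) -> dict:
    """Resolve a broker's capacity, falling back to the '-1' default entry."""
    entries = {e["brokerId"]: e["capacity"]
               for e in json.loads(capacity_json)["brokerCapacities"]}
    # Prefer an explicit per-broker entry, else the "-1" default.
    capacity = entries.get(str(broker_id), entries.get(DEFAULT_BROKER_ID))
    if capacity is None:
        raise KeyError(f"Unable to resolve capacity of broker {broker_id}")
    return capacity

config = """{
  "brokerCapacities": [
    {"brokerId": "-1",
     "capacity": {"CPU": "100", "DISK": {"/data1": "1000000"},
                  "NW_IN": "10000", "NW_OUT": "10000"}}
  ]
}"""

# Under these semantics, broker 2 would be covered by the default entry.
print(resolve_capacity(config, 2)["CPU"])  # prints 100
```

With semantics like these, the `/load` request above should have succeeded, which is why the error was surprising.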
What fixed it for me:
- Removing the -1 default capacity configuration
- Adding explicit entries for brokers 1 and 2 (the values I have in the brokerIds list)
- Using "/kafka/datalogs/logs" as the key in the DISK configuration: for some reason CC requires this path.

capacity.json:
{
"brokerCapacities": [
{
"brokerId": "1",
"capacity": {
"CPU": "100",
"DISK": {
"/kafka/datalogs/logs": "1000000"
},
"NW_IN": "10000",
"NW_OUT": "10000"
}
},
{
"brokerId": "2",
"capacity": {
"CPU": "100",
"DISK": {
"/kafka/datalogs/logs": "1000000"
},
"NW_IN": "10000",
"NW_OUT": "10000"
}
}
]
}
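Since the -1 default did not work for me, a quick cross-check that every broker id listed in brokerSets.json has an explicit capacity.json entry can save a restart cycle. This is my own helper sketch (`missing_broker_ids` is not part of Cruise Control):

```python
import json

def missing_broker_ids(capacity: dict, broker_sets: dict) -> list:
    """Return broker ids from brokerSets.json that lack an explicit
    capacity.json entry (the '-1' default is deliberately ignored)."""
    covered = {e["brokerId"] for e in capacity["brokerCapacities"]}
    wanted = [b for s in broker_sets["brokerSets"] for b in s["brokerIds"]]
    return [b for b in wanted if str(b) not in covered]

capacity = json.loads("""{
  "brokerCapacities": [
    {"brokerId": "1", "capacity": {"CPU": "100",
      "DISK": {"/kafka/datalogs/logs": "1000000"},
      "NW_IN": "10000", "NW_OUT": "10000"}},
    {"brokerId": "2", "capacity": {"CPU": "100",
      "DISK": {"/kafka/datalogs/logs": "1000000"},
      "NW_IN": "10000", "NW_OUT": "10000"}}
  ]
}""")
broker_sets = json.loads(
    '{"brokerSets": [{"brokerSetId": "brokerSet0", "brokerIds": [1, 2]}]}')

# An empty list means every broker in the set has an explicit entry.
print(missing_broker_ids(capacity, broker_sets))  # prints []
```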
Let's say we have identical brokers and capacity.json has only the default broker definition:
- If you try to get disk_info, you'll get an error.
- If you generate a config entry for each broker in the cluster, the endpoint works.
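Generating the per-broker entries from the default can be scripted rather than done by hand. A sketch of that workaround, assuming the broker ids are taken from your own brokerSets.json (`expand_default` is a hypothetical helper, not a CC tool):

```python
import copy
import json

def expand_default(capacity: dict, broker_ids: list) -> dict:
    """Expand the '-1' default capacity entry into one explicit
    entry per broker id, since CC only accepted explicit entries here."""
    default = next(e["capacity"] for e in capacity["brokerCapacities"]
                   if e["brokerId"] == "-1")
    return {"brokerCapacities": [
        {"brokerId": str(b), "capacity": copy.deepcopy(default)}
        for b in broker_ids]}

with_default = {"brokerCapacities": [
    {"brokerId": "-1", "capacity": {"CPU": "100",
     "DISK": {"/kafka/datalogs/logs": "1000000"},
     "NW_IN": "10000", "NW_OUT": "10000"}}]}

# Broker ids 1 and 2 match the brokerSets.json above; adapt to your cluster.
expanded = expand_default(with_default, [1, 2])
print(json.dumps(expanded, indent=2))
```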
I think this might be related to uneven disk capacity usage during cluster rebalancing (#1590). In my setup some of the disks exceed 95% used space, so proposal execution must be stopped in order to rebalance the disks manually.