linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.

CC reporting less disk for partition load #2155

Open rmb938 opened 4 months ago

rmb938 commented 4 months ago

When looking at the output from kafka-log-dirs and comparing it to Cruise Control's partition load REST API, it seems like Cruise Control is showing smaller disk sizes.
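
For reference, a minimal sketch of pulling CC's view of partition disk usage over its REST API, to diff against the `kafka-log-dirs --describe` output. It uses the standard `partition_load` endpoint; the host and port are placeholders for your deployment (Java 11+ HttpClient):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchPartitionLoad {
    public static void main(String[] args) throws Exception {
        // Placeholder Cruise Control host/port; adjust for your deployment.
        String url = "http://cruise-control-host:9090/kafkacruisecontrol/partition_load?json=true";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON response contains the per-partition disk usage CC has sampled;
        // compare these values against what kafka-log-dirs reports per replica.
        System.out.println(response.body());
    }
}
```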

This discrepancy leads to the broker load showing less disk than it should, and to the cluster not balancing disk correctly when disk is set as a goal.

Looking into this further, it seems like CC only reports the partition disk size from the leader; it does not also use the partition disk sizes from the followers.

Most of the time the leaders and followers will have fairly close partition sizes, so this issue doesn't matter much. However, since each Kafka broker runs its log cleaner independently, the sizes of a partition's replicas can differ.

In extremely large Kafka clusters with hundreds of terabytes of data and billions of messages per topic, this difference adds up, and Cruise Control being unaware of it when determining broker load leaves the cluster unbalanced.

In the worst case I have seen, there was around a 1-2 TB difference between what kafka-log-dirs says and what CC reports as broker disk usage; more typically I've seen differences from a few megabytes to around 100-200 GB. That is relatively small compared to the overall cluster size, but without Cruise Control knowing about it, the brokers do end up unbalanced over time.
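
To put numbers on this, here is a minimal sketch (not CC code, just the Kafka AdminClient; it assumes kafka-clients 3.1+ and a placeholder bootstrap address) that reads the same per-replica sizes kafka-log-dirs uses and compares, per broker, the actual bytes hosted against what you would get by attributing the leader's size to every replica, which is the behaviour described above:

```java
import java.util.*;
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.TopicPartitionInfo;

public class ReplicaSizeSkew {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap server; point at your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // All broker ids in the cluster.
            List<Integer> brokerIds = new ArrayList<>();
            for (Node node : admin.describeCluster().nodes().get()) {
                brokerIds.add(node.id());
            }

            // Actual on-disk size of every replica, per broker
            // (the same data kafka-log-dirs reads): brokerId -> (partition -> bytes).
            Map<Integer, Map<TopicPartition, Long>> sizes = new HashMap<>();
            admin.describeLogDirs(brokerIds).allDescriptions().get().forEach((broker, dirs) ->
                dirs.values().forEach(dir ->
                    dir.replicaInfos().forEach((tp, info) ->
                        sizes.computeIfAbsent(broker, b -> new HashMap<>())
                             .merge(tp, info.size(), Long::sum))));

            // Leader broker of every partition (allTopicNames() needs kafka-clients 3.1+).
            Map<TopicPartition, Integer> leaderOf = new HashMap<>();
            Set<String> topics = admin.listTopics().names().get();
            admin.describeTopics(topics).allTopicNames().get().forEach((topic, desc) -> {
                for (TopicPartitionInfo p : desc.partitions()) {
                    if (p.leader() != null) {
                        leaderOf.put(new TopicPartition(topic, p.partition()), p.leader().id());
                    }
                }
            });

            // Per broker: actual bytes hosted vs. bytes obtained by assuming every replica
            // has the leader replica's size (the behaviour this report describes).
            for (Integer broker : brokerIds) {
                long actual = 0L;
                long leaderAttributed = 0L;
                for (Map.Entry<TopicPartition, Long> e :
                        sizes.getOrDefault(broker, Map.of()).entrySet()) {
                    actual += e.getValue();
                    Integer leader = leaderOf.get(e.getKey());
                    Long leaderSize = leader == null ? null
                            : sizes.getOrDefault(leader, Map.of()).get(e.getKey());
                    leaderAttributed += leaderSize != null ? leaderSize : e.getValue();
                }
                System.out.printf("broker %d: actual=%,d bytes, leader-attributed=%,d bytes, gap=%,d bytes%n",
                        broker, actual, leaderAttributed, actual - leaderAttributed);
            }
        }
    }
}
```

If leader-only sizing is indeed the cause, the printed per-broker gap should roughly line up with the 100-200 GB to 1-2 TB differences described above.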

mhratson commented 1 month ago

@rmb938 thanks for the report, do you mind sharing more details/evidence?

rmb938 commented 1 month ago

Yup, I can provide some more details and evidence. Give me a day or so to re-collect the data. Unfortunately I did not save my initial findings.

davidfarlow43 commented 3 weeks ago

I believe that I am seeing evidence of this in our Kafka cluster. We use MSK, so I'm not sure if that matters at all. What we see is that the disk usage reported by Cruise Control doesn't match the reality in the cluster.

All our brokers have a 16 TB disk. According to Cruise Control, broker 5 is taking up the most disk at 8.55 TB, which is ~53%: [screenshot: broker_load_cruise_control]

But looking at the AWS CloudWatch metrics, we can see that this is not accurate: [screenshot: cloudwatch_metrics_disk_usage] Broker 5 only shows 61% disk used, and the top broker by disk usage according to CloudWatch is broker 9 at 67%.

Looking at the Prometheus metric kafka_log_Log_Value, summed over all topic partitions and grouped by broker, we see that it matches what CloudWatch shows, with broker 9 having the highest disk usage: [screenshot: prometheus_kafka_log_Log_Value_metrics]
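
In case it helps anyone reproduce this without the Prometheus exporter, the same per-broker total can be read straight from a broker's JMX endpoint; a minimal sketch, assuming remote JMX is reachable (host and port are placeholders, and managed services like MSK may not expose this directly), summing the kafka.log:type=Log,name=Size gauge that kafka_log_Log_Value is typically exported from:

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SumBrokerLogSize {
    public static void main(String[] args) throws Exception {
        // Placeholder JMX endpoint of a single broker; adjust host/port for your cluster.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // kafka.log:type=Log,name=Size is a per-partition gauge of log size in bytes.
            Set<ObjectName> names =
                    mbs.queryNames(new ObjectName("kafka.log:type=Log,name=Size,*"), null);
            long total = 0L;
            for (ObjectName name : names) {
                total += ((Number) mbs.getAttribute(name, "Value")).longValue();
            }
            System.out.printf("Total log bytes on this broker: %,d%n", total);
        }
    }
}
```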

For some reason Cruise Control is not reporting the right size. We do have compaction enabled on some fairly large topics, so that lines up with what was previously reported. The effect of this inaccuracy is that I now have a pretty large disparity in disk usage between brokers because of this issue.