Disk Usage with Prometheus Metric Reportic is not showing accurate disk usage

linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.

BSD 2-Clause "Simplified" License

2.74k stars, 587 forks

Do you have any screenshots or more information on this? Please share your configuration as well. Some partition information may be missing from Open Monitoring.

Cruise Control uses the kafka_log_Log_Value metric for each partition and then sums the sizes of all partitions to get the broker-level disk usage.

https://github.com/linkedin/cruise-control/blob/b4e44ec004e6f5e22bd1c4e203d92341ed9e1659/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/monitor/sampling/prometheus/DefaultPrometheusQuerySupplier.java#L193
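
To make the aggregation concrete, here is a minimal sketch of summing per-partition sizes into a per-broker total. It is not Cruise Control's code: the function name, the label names (instance, name, topic, partition) and the sample values are assumptions for illustration, and the input only mimics the shape of a Prometheus instant-query result.

```python
from collections import defaultdict


def broker_disk_usage(samples):
    """Sum per-partition log sizes (in bytes) into a total per broker.

    `samples` mimics the "result" list of a Prometheus instant query:
    each entry has a "metric" label dict and a "value" pair of
    [timestamp, "string value"]. Label names are assumptions.
    """
    totals = defaultdict(float)
    for sample in samples:
        labels = sample["metric"]
        # Keep only per-partition size series; skip anything else.
        if labels.get("name") != "Size":
            continue
        if not labels.get("topic") or not labels.get("partition"):
            continue
        _timestamp, value = sample["value"]
        totals[labels["instance"]] += float(value)
    return dict(totals)


# Hypothetical samples: two partitions on broker-1, one on broker-2.
samples = [
    {"metric": {"instance": "broker-1:7071", "name": "Size",
                "topic": "orders", "partition": "0"},
     "value": [1700000000, "1048576"]},
    {"metric": {"instance": "broker-1:7071", "name": "Size",
                "topic": "orders", "partition": "1"},
     "value": [1700000000, "2097152"]},
    {"metric": {"instance": "broker-2:7071", "name": "Size",
                "topic": "orders", "partition": "2"},
     "value": [1700000000, "524288"]},
]

usage = broker_disk_usage(samples)
for broker, used_bytes in sorted(usage.items()):
    print(broker, used_bytes)
```

If a partition's series is missing from the scrape, its size simply never enters the sum, so the broker total comes out lower than what the disk actually holds.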

Please also check the capacity defined in capacityCores.json to confirm that the disk size is entered accurately.
The disk information in the /load API is refreshed on every run of the sampler. However, sampling (saving samples to the Kafka topic) is paused while an execution is in progress, so the disk information is not updated during that time and you may see a mismatch.
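
For the capacity check, a rough sketch like the one below can help compare a byte count against the configured capacity. It assumes the capacity file has a brokerCapacities list with a "-1" default entry and that DISK is expressed in MB of 1024 * 1024 bytes, as in the sample files I have seen; please verify this against your own file. The function name and all values are hypothetical.

```python
import json

# Assumption: DISK in the capacity file is in MB of 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def disk_utilization(capacity_json, broker_id, used_bytes):
    """Return used disk as a fraction of the configured DISK capacity."""
    config = json.loads(capacity_json)
    by_id = {str(entry["brokerId"]): entry["capacity"]
             for entry in config["brokerCapacities"]}
    # Fall back to the default entry ("-1") when the broker has no own entry.
    capacity = by_id.get(str(broker_id), by_id.get("-1"))
    if capacity is None:
        raise KeyError(f"no capacity entry for broker {broker_id}")
    disk_mb = float(capacity["DISK"])
    return (used_bytes / BYTES_PER_MB) / disk_mb


# Hypothetical capacity file content with only a default entry.
capacity_json = """
{
  "brokerCapacities": [
    {
      "brokerId": "-1",
      "capacity": {
        "DISK": "100000",
        "CPU": {"num.cores": "8"},
        "NW_IN": "10000",
        "NW_OUT": "10000"
      }
    }
  ]
}
"""

# 52,428,800,000 bytes is 50,000 MB, i.e. half of the 100,000 MB capacity.
utilization = disk_utilization(capacity_json, broker_id=0,
                               used_bytes=52_428_800_000)
print(f"{utilization:.1%}")
```

If the capacity was entered in bytes or GB instead of the unit Cruise Control expects, the reported percentage will be off by a large constant factor even when the partition sizes themselves are correct.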

@efeg @CCisGG Please keep me honest here.

Disk Usage with Prometheus Metric Reportic is not showing accurate disk usage #1964