linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

Configurable prometheus queries #1540

Open gustavomonarin opened 3 years ago

gustavomonarin commented 3 years ago

Kafka JMX metrics exported to Prometheus can be represented in very different ways and calculated with different approaches (e.g., a PromQL avg vs. a JMX pre-calculated mean).

#1366 did a great job introducing the Prometheus client. However, it is not flexible enough to make Cruise Control adoption less intrusive, mainly for production clusters that are already running: a common representation of the JMX metrics follows the example configuration of the jmx_exporter project, which does not match the provided default queries.

#1366 also adds a nice extension point for providing a query supplier class. However, the Cruise Control configuration is not forwarded to it, which makes it complicated to access or add extra properties (e.g., a properties file containing all the queries).

I would like to propose a new Prometheus query supplier where the queries are externalised to a properties configuration file.

I have made a simple test using the RawMetricType as the key and the jmx_exporter example queries as the values, together with a small change to make the query supplier CruiseControlConfigurable, and it works nicely.
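As a rough illustration of the idea (not the code from the test mentioned above), the sketch below loads a properties file whose keys are metric type names and whose values are PromQL strings. The `RawMetricType` enum here is a local stand-in listing only three constants, the class and method names are hypothetical, and the queries are illustrative rather than taken from jmx_exporter. In a real supplier, the file path would presumably arrive through the forwarded Cruise Control configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.EnumMap;
import java.util.Map;
import java.util.Properties;

public class PropertiesQuerySupplierSketch {

    // Local stand-in for Cruise Control's RawMetricType so the sketch compiles on its own.
    enum RawMetricType { BROKER_CPU_UTIL, ALL_TOPIC_BYTES_IN, ALL_TOPIC_BYTES_OUT }

    // Hypothetical helper: reads "METRIC_TYPE=promql query" lines into a typed map.
    static Map<RawMetricType, String> loadQueries(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<RawMetricType, String> queries = new EnumMap<>(RawMetricType.class);
        for (String key : props.stringPropertyNames()) {
            // valueOf throws IllegalArgumentException for a key that is not a known metric type,
            // so a typo in the file fails fast instead of being silently ignored.
            queries.put(RawMetricType.valueOf(key), props.getProperty(key));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Illustrative file content; the metric names are assumptions, not jmx_exporter defaults.
        String example = String.join("\n",
            "# one query per raw metric type",
            "BROKER_CPU_UTIL=1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[1m]))",
            "ALL_TOPIC_BYTES_IN=sum(rate(kafka_server_brokertopicmetrics_bytesin_total[1m]))");
        loadQueries(new StringReader(example))
            .forEach((type, query) -> System.out.println(type + " -> " + query));
    }
}
```

Because only the first `=` on a line separates key from value in the properties format, label matchers such as `{mode="idle"}` can appear in the query without escaping.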

Please let me know if you have any concerns or suggestions. If this solution brings value, I would be more than happy to polish the code and create a PR in the following days.

rmb938 commented 3 years ago

@gustavomonarin This would be really useful for me when using metrics that don't come from node exporter and jmx exporter. For example, it would let me use container metrics from Kubernetes for CPU usage and topic/partition metrics from different exporters like kminion.

HimanshuKhatarkar commented 9 months ago

Hi @gustavomonarin, is this issue still open? Please assign it to me so that I can help you with it.

gustavomonarin commented 9 months ago

Hi @HimanshuKhatarkar, I don't have permission to assign the issue to you, but I believe it is just a matter of raising the MR.

Unfortunately, when I was working on this, most of the implementation was done during my work time as a contractor at a company, and the company was not easy to convince to share the code, even though a chunk of it was written in my free time. So I put as much detail as I could in the description. I hope it is still up to date; it should be straightforward.