linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

Brokers not able to send metrics to CruiseControl #209

Closed jmarkan closed 6 years ago

jmarkan commented 6 years ago

Hello, before I go into the details of the issue, I'd like to share the environment information:

Kafka Version: 0.11.0.2 (Confluent Platform offering)
Cruise Control co-hosted on 1 Kafka broker in the cluster.
Total number of Kafka brokers: 6
Total number of Zookeepers: 3
Cruise Control running on port: 9090

Issue details: Cruise Control side -> no errors. Here is the log snapshot:

[2018-04-26 12:33:10,474] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[2018-04-26 12:33:40,474] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[2018-04-26 12:34:10,193] INFO Kicking off sampling for time range [1524745930193, 1524746050193], duration 120000 ms using 1 fetchers with timeout 120000 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
[2018-04-26 12:34:10,474] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[2018-04-26 12:34:15,196] INFO Finished sampling for time range [1524745930193,1524746050193]. Collected 0 metrics. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)
[2018-04-26 12:34:15,196] INFO Finished sampling in 5003 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
[2018-04-26 12:34:40,475] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[2018-04-26 12:35:10,475] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)

Kafka brokers side -> There are warnings saying that the broker is unable to send Cruise Control metrics. Here is the log of those warnings:

[2018-04-26 12:28:48,172] WARN Failed to send Cruise Control metric [PARTITION_METRIC,PARTITION_SIZE,time=1524745127781,brokerId=6115,partition=__consumer_offsets-0,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:29:48,315] WARN Failed to send Cruise Control metric [TOPIC_METRIC,TOPIC_FETCH_REQUEST_RATE,time=1524745127781,brokerId=6115,topic=__CruiseControlMetrics,value=0.173] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:30:48,315] WARN Failed to send Cruise Control metric [BROKER_METRIC,BROKER_RESPONSE_QUEUE_SIZE,time=1524745127781,brokerId=6115,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:31:48,396] WARN Failed to send Cruise Control metric [TOPIC_METRIC,TOPIC_BYTES_IN,time=1524745127781,brokerId=6115,topic=__CruiseControlMetrics,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:32:48,397] WARN Failed to send Cruise Control metric [BROKER_METRIC,ALL_TOPIC_REPLICATION_BYTES_OUT,time=1524745127781,brokerId=6115,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:33:48,397] WARN Failed to send Cruise Control metric [TOPIC_METRIC,TOPIC_BYTES_OUT,time=1524745127781,brokerId=6115,topic=__KafkaCruiseControlModelTrainingSamples,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:34:48,398] WARN Failed to send Cruise Control metric [PARTITION_METRIC,PARTITION_SIZE,time=1524745127781,brokerId=6115,partition=__KafkaCruiseControlPartitionMetricSamples-1,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:35:48,406] WARN Failed to send Cruise Control metric [PARTITION_METRIC,PARTITION_SIZE,time=1524745127781,brokerId=6115,partition=__KafkaCruiseControlPartitionMetricSamples-0,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
[2018-04-26 12:36:48,407] WARN Failed to send Cruise Control metric [PARTITION_METRIC,PARTITION_SIZE,time=1524745127781,brokerId=6115,partition=__KafkaCruiseControlPartitionMetricSamples-4,value=0.000] (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
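To see how widespread warnings like these are across brokers, a small script can tally them from a broker log. This is only a rough sketch: the regular expression is inferred from the warning lines quoted above, not from any documented log format, and `summarize_failed_sends` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the WARN lines quoted in this issue (an assumption,
# not a documented format). \s* tolerates a line wrap before the bracket.
FAILED_SEND = re.compile(
    r"WARN Failed to send Cruise Control metric\s*"
    r"\[(?P<kind>\w+),(?P<name>\w+),time=(?P<time>\d+),brokerId=(?P<broker>\d+)"
)


def summarize_failed_sends(log_text):
    """Count failed sends per (broker id, metric class) found in log_text."""
    counts = Counter()
    for match in FAILED_SEND.finditer(log_text):
        key = (int(match.group("broker")), match.group("kind"))
        counts[key] += 1
    return counts
```

Calling `summarize_failed_sends(open(path).read())` on each broker's log would show whether every broker fails to send or only some of them, which helps separate a cluster-wide misconfiguration from a single bad host.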

Here is the screenshot of the /state?verbose=true page: [image]

Any help on this issue will be much appreciated as we've been trying this solution for quite a few weeks now without success.

@efeg ^^

jmarkan commented 6 years ago

Hello, I was able to figure out the problem.

The config cruise.control.metrics.reporter.bootstrap.servers on the brokers had an incorrect port. As soon as I corrected the port and bounced the brokers, they started sending the metrics.
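For anyone hitting the same symptom, a quick sanity check is to read that key out of the broker properties file and try a plain TCP connection to each configured host:port. The sketch below assumes the value is a comma-separated list of host:port entries; `read_bootstrap_servers` and `is_reachable` are hypothetical helper names, and a successful connect only shows that something is listening on that port, not that it is a Kafka listener.

```python
import socket

# Config key named in this issue.
KEY = "cruise.control.metrics.reporter.bootstrap.servers"


def read_bootstrap_servers(properties_text, key=KEY):
    """Return the (host, port) pairs configured under key, or [] if absent."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without a key=value shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() != key:
            continue
        pairs = []
        for entry in value.split(","):
            entry = entry.strip()
            if entry:
                # Split on the last colon; int() raises ValueError
                # when an entry has no numeric port.
                host, _, port = entry.rpartition(":")
                pairs.append((host, int(port)))
        return pairs
    return []


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from each broker host, something like `[(h, p, is_reachable(h, p)) for h, p in read_bootstrap_servers(open(path).read())]` would flag an entry whose port has no listener, as long as nothing else happens to be listening on the wrong port.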

Closing this issue.