linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

Cruise Control is not collecting Metrics #243

Closed. kcsathsih1 closed this issue 6 years ago.

kcsathsih1 commented 6 years ago

The Cruise Control service starts up when we connect it to Kafka (version 1.0.0), but it does not collect any metrics, as shown in the log below. We have set auto.leader.rebalance.enable to true. SSL is enabled in Kafka, as our environment requires. We created the __CruiseControlMetrics topic manually; the other topics, __KafkaCruiseControlModelTrainingSamples and __KafkaCruiseControlPartitionMetricSamples, were created automatically. Our kafka/server.properties and cruisecontrol.properties are included below. Please let us know what might be missing.

[2018-06-04 07:00:13,759] INFO Kicking off sampling for time range [1528109893759, 1528110013759], duration 120000 ms using 1 fetchers with timeout 120000 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
[2018-06-04 07:00:13,998] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[2018-06-04 07:00:18,771] INFO Finished sampling for time range [1528109893759,1528110013759]. Collected 0 metrics. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)
[2018-06-04 07:00:18,771] INFO Collected 0 partition metric samples for 0 partitions. Total partition assigned: 126. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
[2018-06-04 07:00:18,771] INFO Collected 0 broker metric samples for 0 brokers. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
[2018-06-04 07:00:18,771] INFO Finished sampling in 5011 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
[2018-06-04 07:00:43,999] INFO Skipping best proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
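The key symptom is the "Finished sampling ... Collected 0 metrics." line: each sampling round covers a 120000 ms window (1528110013759 - 1528109893759 = 120000, matching metric.sampling.interval.ms=120000), yet nothing arrives from the reporter topic. As a minimal sketch for spotting this pattern in a longer log, the hypothetical helper below (ZeroSampleDetector is an illustrative name, not part of Cruise Control) scans log text with a regular expression and returns the start timestamp of every round that collected zero metrics. It assumes the log line format shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Cruise Control): finds sampling rounds
// that collected 0 metrics, assuming the log line format shown in this issue.
public class ZeroSampleDetector {

    // Matches "Finished sampling for time range [start,end]. Collected N metrics."
    private static final Pattern FINISHED = Pattern.compile(
            "Finished sampling for time range \\[(\\d+),\\s*(\\d+)\\]\\. Collected (\\d+) metrics\\.");

    /** Returns the start timestamp (ms) of every sampling round that collected no metrics. */
    public static List<Long> emptyRounds(String log) {
        List<Long> starts = new ArrayList<>();
        Matcher m = FINISHED.matcher(log);
        while (m.find()) {
            if (Long.parseLong(m.group(3)) == 0L) {
                starts.add(Long.parseLong(m.group(1)));
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String log = "[2018-06-04 07:00:18,771] INFO Finished sampling for time range "
                + "[1528109893759,1528110013759]. Collected 0 metrics. "
                + "(com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)";
        System.out.println(emptyRounds(log)); // prints [1528109893759]
    }
}
```

A steady stream of empty rounds points at the producing side (the metrics reporter inside each broker) rather than at Cruise Control's own consumer, which is where this thread eventually found the problem.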

KAFKA PROPERTIES

delete.topic.enable=true
ssl.keystore.location=./ssl/kafka.jks
num.replica.fetchers=8
metrics.num.samples=2
log.retention.hours=336
metrics.reporting.interval.ms=300000
ssl.truststore.password=trustpassword
auto.create.topics.enable=true
metrics.sample.window.ms=10
ssl.keystore.password=jkspassword
log.retention.check.interval.ms=300000
socket.request.max.bytes=104857600
zookeeper.connect=server1\:2181,server2\:2181,server3\:2181
num.partitions=5
min.insync.replicas=2
socket.receive.buffer.bytes=102400
ssl.truststore.type=JKS
unclean.leader.election.enable=false
socket.send.buffer.bytes=102400
message.max.bytes=10000012
security.inter.broker.protocol=SSL
auto.leader.rebalance.enable=true
replica.fetch.max.bytes=20048576
broker.id=2
log.cleaner.enable=true
num.network.threads=3
num.recovery.threads.per.data.dir=1
metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
cruise.control.metrics.reporter.security.protocol=SSL
cruise.control.metrics.reporter.ssl.truststore.location=/opt/kafka/ssl/tls-kafka-9899100.jks
cruise.control.metrics.reporter.ssl.truststore.password=changeit
cruise.control.metrics.reporter.bootstrap.servers=0.0.0.0:9090
server.properties:cruise.control.metrics.reporter.bootstrap.servers=0.0.0.0:9090
log.dirs=./data/kafka
offsets.topic.replication.factor=3
listeners=SSL\://server1\:9090
ssl.endpoint.identification.algorithm=HTTPS
ssl.keystore.type=JKS
host.name=server1
log.flush.interval.ms=4000
ssl.client.auth=required
num.io.threads=8
log.segment.bytes=1073741824
ssl.truststore.location=./ssl/ca-cert.jks
zookeeper.connection.timeout.ms=3000
ssl.key.password=keypassword
metrics.recording.level=DEBUG
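Note that this broker sets ssl.client.auth=required, so it rejects SSL clients that do not present a certificate. The metrics reporter publishes its samples through a Kafka producer, i.e. it connects to the brokers as a client, yet the cruise.control.metrics.reporter.* settings above configure only a truststore. The sketch below checks for that gap. It assumes (from the key names) that the cruise.control.metrics.reporter.-prefixed keys are the reporter's client settings, and it looks for the three keystore keys that the fix later in this thread added. ReporterSslCheck is a hypothetical name used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical check (not part of Kafka or Cruise Control). Assumes keys prefixed with
// "cruise.control.metrics.reporter." configure the reporter's producer client.
public class ReporterSslCheck {

    static final String PREFIX = "cruise.control.metrics.reporter.";
    static final String[] KEYSTORE_KEYS = {"ssl.keystore.location", "ssl.keystore.password", "ssl.key.password"};

    /** Returns the reporter keystore keys missing while the broker demands client certificates. */
    public static List<String> missingKeystoreKeys(Properties broker) {
        List<String> missing = new ArrayList<>();
        boolean clientAuthRequired = "required".equals(broker.getProperty("ssl.client.auth"));
        boolean reporterUsesSsl = "SSL".equals(broker.getProperty(PREFIX + "security.protocol"));
        if (clientAuthRequired && reporterUsesSsl) {
            for (String key : KEYSTORE_KEYS) {
                if (broker.getProperty(PREFIX + key) == null) {
                    missing.add(PREFIX + key);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Excerpt of the broker settings from the original report
        String excerpt = String.join("\n",
                "ssl.client.auth=required",
                "cruise.control.metrics.reporter.security.protocol=SSL",
                "cruise.control.metrics.reporter.ssl.truststore.location=/opt/kafka/ssl/tls-kafka-9899100.jks",
                "cruise.control.metrics.reporter.ssl.truststore.password=changeit");
        Properties broker = new Properties();
        broker.load(new StringReader(excerpt));
        // Lists all three keystore keys for the original configuration
        System.out.println(missingKeystoreKeys(broker));
    }
}
```

Cruise Control's own client settings (security.protocol, ssl.keystore.*, ssl.truststore.* in cruisecontrol.properties) already include a keystore, which is consistent with Cruise Control starting up normally while the reporter side stayed silent.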

CRUISE CONTROL PROPERTIES

bootstrap.servers=server1:9090,server2:9090,server3:9090
num.metric.fetchers=1
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler
metric.reporter.topic.pattern=__CruiseControlMetrics
sample.store.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore
partition.metric.sample.store.topic=__KafkaCruiseControlPartitionMetricSamples
broker.metric.sample.store.topic=__KafkaCruiseControlModelTrainingSamples
num.sample.loading.threads=8
metric.sampler.partition.assignor.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.DefaultMetricSamplerPartitionAssignor
metric.sampling.interval.ms=120000
partition.metrics.window.ms=300000
num.partition.metrics.windows=1
min.samples.per.partition.metrics.window=1
broker.metrics.window.ms=30000
num.broker.metrics.windows=20
min.samples.per.broker.metrics.window=1
capacity.config.file=config/capacity.json
default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal
goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerDiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerEvenRackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal
min.monitored.partition.percentage=0.95
cpu.balance.threshold=1.1
disk.balance.threshold=1.1
network.inbound.balance.threshold=1.1
network.outbound.balance.threshold=1.1
replica.count.balance.threshold=1.1
cpu.capacity.threshold=0.8
disk.capacity.threshold=0.8
network.inbound.capacity.threshold=0.8
network.outbound.capacity.threshold=0.8
cpu.low.utilization.threshold=0.0
disk.low.utilization.threshold=0.0
network.inbound.low.utilization.threshold=0.0
network.outbound.low.utilization.threshold=0.0
metric.anomaly.percentile.upper.threshold=90.0
metric.anomaly.percentile.lower.threshold=10.0
max.proposal.candidates=10
proposal.expiration.ms=60000
max.replicas.per.broker=10000
num.proposal.precompute.threads=1
zookeeper.connect=server1\:2181,server2\:2181,server3\:2181
num.concurrent.partition.movements.per.broker=10
execution.progress.check.interval.ms=10000
anomaly.notifier.class=com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier
metric.anomaly.finder.class=com.linkedin.kafka.cruisecontrol.detector.KafkaMetricAnomalyFinder
anomaly.detection.interval.ms=10000
anomaly.detection.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
metric.anomaly.analyzer.metrics=BROKER_PRODUCE_LOCAL_TIME_MS_MAX,BROKER_PRODUCE_LOCAL_TIME_MS_MEAN,BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MAX,BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MEAN,BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MAX,BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MEAN,BROKER_LOG_FLUSH_TIME_MS_MAX,BROKER_LOG_FLUSH_TIME_MS_MEAN
failed.brokers.zk.path=/CruiseControlBrokerList
self.healing.enabled=false
security.protocol=SSL
ssl.keystore.location=./ssl/kafka.jks
ssl.keystore.password=keystorepassword
ssl.key.password=keypassword
ssl.truststore.location=./ssl/ca-cert.jks
ssl.truststore.password=truststorepassword
ssl.truststore.type=JKS
ssl.keystore.type=JKS

jlisam commented 6 years ago

cruise.control.metrics.reporter.bootstrap.servers=0.0.0.0:9090
server.properties:cruise.control.metrics.reporter.bootstrap.servers=0.0.0.0:9090

I don't think this will fix your issue, but the bottom entry (the line prefixed with server.properties:) is not correct.
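The prefixed line looks like grep output pasted into the file. Assuming the broker reads server.properties with java.util.Properties (the standard loader for Java .properties files), the first unescaped ':' ends the key, so the line defines a key named server.properties and never sets cruise.control.metrics.reporter.bootstrap.servers. The small demo below (GrepPrefixDemo is an illustrative name) shows this parsing behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative demo: how java.util.Properties parses a grep-style "file:key=value" line.
public class GrepPrefixDemo {

    /** Loads .properties text from a string. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory StringReader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "server.properties:cruise.control.metrics.reporter.bootstrap.servers=0.0.0.0:9090");
        // The first unescaped ':' terminates the key; everything after it is the value.
        System.out.println(props.getProperty("server.properties"));
        // prints cruise.control.metrics.reporter.bootstrap.servers=0.0.0.0:9090
        System.out.println(props.getProperty("cruise.control.metrics.reporter.bootstrap.servers"));
        // prints null
    }
}
```

The same escaping rule explains why lines such as listeners=SSL\://server1\:9090 work: the backslash-escaped colons stay part of the value instead of terminating the key.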

kcsathsih1 commented 6 years ago

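The issue was fixed after adding the reporter keystore settings shown below (the bootstrap servers were also changed from 0.0.0.0:9090 to the actual broker addresses).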

metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
cruise.control.metrics.reporter.security.protocol=SSL
cruise.control.metrics.reporter.ssl.truststore.location=/kafka/ssl/ca-cert.jks
cruise.control.metrics.reporter.ssl.truststore.password=changeit
cruise.control.metrics.reporter.bootstrap.servers=server1:9090,server2:9090,server3:9090
cruise.control.metrics.reporter.ssl.keystore.location=//kafka/ssl/ssl/kafka.jks
cruise.control.metrics.reporter.ssl.keystore.password=pwd
cruise.control.metrics.reporter.ssl.key.password=pwd