linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License
2.75k stars 588 forks source link

Cruise control without zookeeper crashes when `zookeeper.connect` config is absent #1823

Open kanapuli opened 2 years ago

kanapuli commented 2 years ago

Description

I am trying to run a cruise control pod in kubernetes connecting to an AWS MSK kafka cluster. The zookeeper access in my case is sealed for security reasons. I have followed the wiki article https://github.com/linkedin/cruise-control/wiki/Run-without-ZooKeeper to run the cruise control without zookeeper. However the cruise control pod crashes with the following error complaining about the missing config zookeeper.connect which is against what is described in the wiki page,

14:35:03.711 [main] ERROR com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain - Uncaught exception on thread Thread[main,5,main]
org.apache.kafka.common.config.ConfigException: Missing required configuration "zookeeper.connect" which has no default value.
    at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:493) ~[kafka-clients-3.1.0.jar:?]
    at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:483) ~[kafka-clients-3.1.0.jar:?]
    at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:113) ~[kafka-clients-3.1.0.jar:?]
    at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:146) ~[kafka-clients-3.1.0.jar:?]
    at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.<init>(KafkaCruiseControlConfig.java:51) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.<init>(KafkaCruiseControlConfig.java:47) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlUtils.readConfig(KafkaCruiseControlUtils.java:865) ~[cruise-control.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain.main(KafkaCruiseControlMain.java:35) ~[cruise-control.jar:?]

Other specifications

Kafka version: 2.6.2

Cruise-control properties file

kafka.broker.failure.detection.enable=true
bootstrap.servers=${env:KAFKA_BOOTSTRAP_SERVERS}
security.protocol=SSL
ssl.keystore.type=JKS
ssl.keystore.location=${env:KAFKA_SSL_KEYSTORE_LOCATION}
ssl.keystore.password=${env:KAFKA_SSL_KEYSTORE_PASSWORD}
num.metric.fetchers=1
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler
prometheus.server.endpoint=${env:PROMETHEUS_ENDPOINT}
sampling.allow.cpu.capacity.estimation=true
metric.reporter.topic=__CruiseControlMetrics
sample.store.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore
partition.metric.sample.store.topic=__KafkaCruiseControlPartitionMetricSamples
broker.metric.sample.store.topic=__KafkaCruiseControlModelTrainingSamples
sample.store.topic.replication.factor=2
num.sample.loading.threads=8
metric.sampler.partition.assignor.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.DefaultMetricSamplerPartitionAssignor
metric.sampling.interval.ms=120000
partition.metrics.window.ms=300000
num.partition.metrics.windows=5
min.samples.per.partition.metrics.window=1
broker.metrics.window.ms=300000
num.broker.metrics.windows=20
min.samples.per.broker.metrics.window=1
capacity.config.file=config/capacityCores.json
default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal
goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerDiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerEvenRackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal
intra.broker.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskUsageDistributionGoal
hard.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
min.valid.partition.ratio=0.95
cpu.balance.threshold=1.1
disk.balance.threshold=1.1
network.inbound.balance.threshold=1.1
network.outbound.balance.threshold=1.1
replica.count.balance.threshold=1.1
cpu.capacity.threshold=0.7
disk.capacity.threshold=0.8
network.inbound.capacity.threshold=0.8
network.outbound.capacity.threshold=0.8
cpu.low.utilization.threshold=0.0
disk.low.utilization.threshold=0.0
network.inbound.low.utilization.threshold=0.0
network.outbound.low.utilization.threshold=0.0
metric.anomaly.percentile.upper.threshold=90.0
metric.anomaly.percentile.lower.threshold=10.0
proposal.expiration.ms=60000
max.replicas.per.broker=10000
num.proposal.precompute.threads=1
num.concurrent.partition.movements.per.broker=10
num.concurrent.intra.broker.partition.movements=2
num.concurrent.leader.movements=1000
execution.progress.check.interval.ms=10000
anomaly.notifier.class=com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier
metric.anomaly.finder.class=com.linkedin.kafka.cruisecontrol.detector.KafkaMetricAnomalyFinder
anomaly.detection.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
metric.anomaly.analyzer.metrics=BROKER_PRODUCE_LOCAL_TIME_MS_50TH,BROKER_PRODUCE_LOCAL_TIME_MS_999TH,BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_50TH,BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_999TH,BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_50TH,BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_999TH,BROKER_LOG_FLUSH_TIME_MS_50TH,BROKER_LOG_FLUSH_TIME_MS_999TH
self.healing.exclude.recently.demoted.brokers=true
self.healing.exclude.recently.removed.brokers=true
failed.brokers.zk.path=/CruiseControlBrokerList
topic.config.provider.class=com.linkedin.kafka.cruisecontrol.config.KafkaAdminTopicConfigProvider
cluster.configs.file=config/clusterConfigs.json
completed.kafka.monitor.user.task.retention.time.ms=86400000
completed.cruise.control.monitor.user.task.retention.time.ms=86400000
completed.kafka.admin.user.task.retention.time.ms=604800000
completed.cruise.control.admin.user.task.retention.time.ms=604800000
completed.user.task.retention.time.ms=86400000
demotion.history.retention.time.ms=1209600000
removal.history.retention.time.ms=1209600000
max.cached.completed.kafka.monitor.user.tasks=20
max.cached.completed.cruise.control.monitor.user.tasks=20
max.cached.completed.kafka.admin.user.tasks=30
max.cached.completed.cruise.control.admin.user.tasks=30
max.cached.completed.user.tasks=25
max.active.user.tasks=5
self.healing.enabled=false
webserver.http.port=9090
webserver.http.address=0.0.0.0
webserver.http.cors.enabled=false
webserver.http.cors.origin=http://localhost:8080/
webserver.http.cors.allowmethods=OPTIONS,GET,POST
webserver.http.cors.exposeheaders=User-Task-ID
webserver.api.urlprefix=/kafkacruisecontrol/*
webserver.ui.diskpath=./cruise-control-ui/dist/
webserver.ui.urlprefix=/*
webserver.request.maxBlockTimeMs=10000
webserver.session.maxExpiryTimeMs=60000
webserver.session.path=/
webserver.accesslog.enabled=true
webserver.accesslog.path=access.log
webserver.accesslog.retention.days=14
two.step.verification.enabled=false
two.step.purgatory.retention.time.ms=1209600000
two.step.purgatory.max.requests=25

Any help will be much appreciated to resolve this issue.

efeg commented 2 years ago

@kanapuli Thanks for reporting. Can you verify that you are using a supported version of CC? The full support has been added in CC versions 2.5.89.