krallistic / kafka-operator

A Kafka Operator for Kubernetes
Apache License 2.0

Integrate Cruise Control #35

Open krallistic opened 7 years ago

krallistic commented 7 years ago

LinkedIn open-sourced cruise-control (https://github.com/linkedin/cruise-control), a tool to rebalance Kafka clusters. Since they have much more experience running Kafka clusters, their algorithms should be used.

As a first integration, the following steps have to be done

Automatic rebalancing of topics (if a skew occurs) is out of scope for the first integration.

ankon commented 7 years ago

Right now this seems to fail with confluentinc/cp-kafka:3.3.0:

[2017-10-26 15:22:28,039] INFO KafkaConfig values: 
    advertised.host.name = null
    advertised.listeners = PLAINTEXT://kafka-0.kafka.development.svc.cluster.local:9092
    advertised.port = null
    alter.config.policy.class.name = null
    authorizer.class.name = 
    auto.create.topics.enable = true
    auto.leader.rebalance.enable = true
    background.threads = 10
    broker.id = 0
    broker.id.generation.enable = true
    broker.rack = null
    compression.type = producer
    connections.max.idle.ms = 600000
    controlled.shutdown.enable = true
    controlled.shutdown.max.retries = 3
    controlled.shutdown.retry.backoff.ms = 5000
    controller.socket.timeout.ms = 30000
    create.topic.policy.class.name = null
    default.replication.factor = 1
    delete.records.purgatory.purge.interval.requests = 1
    delete.topic.enable = false
    fetch.purgatory.purge.interval.requests = 1000
    group.initial.rebalance.delay.ms = 3000
    group.max.session.timeout.ms = 300000
    group.min.session.timeout.ms = 6000
    host.name = 
    inter.broker.listener.name = null
    inter.broker.protocol.version = 0.11.0-IV2
    leader.imbalance.check.interval.seconds = 300
    leader.imbalance.per.broker.percentage = 10
    listener.security.protocol.map = SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,TRACE:TRACE,SASL_SSL:SASL_SSL,PLAINTEXT:PLAINTEXT
    listeners = PLAINTEXT://0.0.0.0:9092
    log.cleaner.backoff.ms = 15000
    log.cleaner.dedupe.buffer.size = 134217728
    log.cleaner.delete.retention.ms = 86400000
    log.cleaner.enable = true
    log.cleaner.io.buffer.load.factor = 0.9
    log.cleaner.io.buffer.size = 524288
    log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
    log.cleaner.min.cleanable.ratio = 0.5
    log.cleaner.min.compaction.lag.ms = 0
    log.cleaner.threads = 1
    log.cleanup.policy = [delete]
    log.dir = /tmp/kafka-logs
    log.dirs = /var/lib/kafka/data
    log.flush.interval.messages = 9223372036854775807
    log.flush.interval.ms = null
    log.flush.offset.checkpoint.interval.ms = 60000
    log.flush.scheduler.interval.ms = 9223372036854775807
    log.flush.start.offset.checkpoint.interval.ms = 60000
    log.index.interval.bytes = 4096
    log.index.size.max.bytes = 10485760
    log.message.format.version = 0.11.0-IV2
    log.message.timestamp.difference.max.ms = 9223372036854775807
    log.message.timestamp.type = CreateTime
    log.preallocate = false
    log.retention.bytes = -1
    log.retention.check.interval.ms = 300000
    log.retention.hours = 168
    log.retention.minutes = null
    log.retention.ms = null
    log.roll.hours = 168
    log.roll.jitter.hours = 0
    log.roll.jitter.ms = null
    log.roll.ms = null
    log.segment.bytes = 1073741824
    log.segment.delete.delay.ms = 60000
    max.connections.per.ip = 2147483647
    max.connections.per.ip.overrides = 
    message.max.bytes = 1000012
    metric.reporters = [com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter]
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    min.insync.replicas = 1
    num.io.threads = 8
    num.network.threads = 3
    num.partitions = 1
    num.recovery.threads.per.data.dir = 1
    num.replica.fetchers = 1
    offset.metadata.max.bytes = 4096
    offsets.commit.required.acks = -1
    offsets.commit.timeout.ms = 5000
    offsets.load.buffer.size = 5242880
    offsets.retention.check.interval.ms = 600000
    offsets.retention.minutes = 1440
    offsets.topic.compression.codec = 0
    offsets.topic.num.partitions = 50
    offsets.topic.replication.factor = 3
    offsets.topic.segment.bytes = 104857600
    port = 9092
    principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
    producer.purgatory.purge.interval.requests = 1000
    queued.max.requests = 500
    quota.consumer.default = 9223372036854775807
    quota.producer.default = 9223372036854775807
    quota.window.num = 11
    quota.window.size.seconds = 1
    replica.fetch.backoff.ms = 1000
    replica.fetch.max.bytes = 1048576
    replica.fetch.min.bytes = 1
    replica.fetch.response.max.bytes = 10485760
    replica.fetch.wait.max.ms = 500
    replica.high.watermark.checkpoint.interval.ms = 5000
    replica.lag.time.max.ms = 10000
    replica.socket.receive.buffer.bytes = 65536
    replica.socket.timeout.ms = 30000
    replication.quota.window.num = 11
    replication.quota.window.size.seconds = 1
    request.timeout.ms = 30000
    reserved.broker.max.id = 1000
    sasl.enabled.mechanisms = [GSSAPI]
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.principal.to.local.rules = [DEFAULT]
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.mechanism.inter.broker.protocol = GSSAPI
    security.inter.broker.protocol = PLAINTEXT
    socket.receive.buffer.bytes = 102400
    socket.request.max.bytes = 104857600
    socket.send.buffer.bytes = 102400
    ssl.cipher.suites = null
    ssl.client.auth = none
    ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
    ssl.endpoint.identification.algorithm = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLS
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
    transaction.max.timeout.ms = 900000
    transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
    transaction.state.log.load.buffer.size = 5242880
    transaction.state.log.min.isr = 2
    transaction.state.log.num.partitions = 50
    transaction.state.log.replication.factor = 3
    transaction.state.log.segment.bytes = 104857600
    transactional.id.expiration.ms = 604800000
    unclean.leader.election.enable = false
    zookeeper.connect = zookeeper.development.svc.cluster.local
    zookeeper.connection.timeout.ms = null
    zookeeper.session.timeout.ms = 6000
    zookeeper.set.acl = false
    zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2017-10-26 15:22:28,105] WARN The support metrics collection feature ("Metrics") of Proactive Support is disabled. (io.confluent.support.metrics.SupportedServerStartable)
[2017-10-26 15:22:28,106] INFO starting (kafka.server.KafkaServer)
[2017-10-26 15:22:28,107] INFO Connecting to zookeeper on zookeeper.development.svc.cluster.local (kafka.server.KafkaServer)
[2017-10-26 15:22:28,119] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2017-10-26 15:22:28,123] INFO Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:host.name=kafka-0.kafka.development.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.vendor=Azul Systems, Inc. (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.home=/usr/lib/jvm/zulu-8-amd64/jre (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.class.path=:/usr/bin/../share/java/kafka/connect-file-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/kafka-streams-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/hk2-locator-2.5.0-b05.jar:/usr/bin/../share/java/kafka/kafka-streams-examples-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-test.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-javadoc.jar:/usr/bin/../share/java/kafka/connect-api-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/argparse4j-0.7.0.jar:/usr/bin/../share/java/kafka/support-metrics-common-3.3.0.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-core-2.24.jar:/usr/bin/../share/java/kafka/commons-lang3-3.1.jar:/usr/bin/../share/java/kafka/guava-20.0.jar:/usr/bin/../share/java/kafka/zookeeper-3.4.10.jar:/usr/bin/../share/java/kafka/kafka.jar:/usr/bin/../share/java/kafka/jetty-servlet-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/commons-validator-1.4.1.jar:/usr/bin/../share/java/kafka/commons-codec-1.9.jar:/usr/bin/../share/java/kafka/httpclient-4.5.2.jar:/usr/bin/../share/java/kafka/scala-parser-combinators_2.11-1.0.4.jar:/usr/bin/../share/java/kafka/jackson-databind-2.8.5.jar:/usr/bin/../share/java/kafka/jersey-guava-2.24.jar:/usr/bin/../share/java/kafka/commons-lang3-3.5.jar:/usr/bin/../share/java/kafka/commons-compress-1.8.1.jar:/usr/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka-log4j-appender-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/jersey-common-2.24.jar:/usr/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/jetty-servlets-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/hk2-utils-2.5.0-b05.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-base-2.8.5.jar:/usr/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.8.5.jar:/usr/bin/../share/java/kafka/jackson-annotations-2.8.5.jar:/usr/bin/../share/java/kafka/aopalliance-repackaged-2.5.0-b05.jar:/usr/bin/../share/java/ka
fka/kafka_2.11-0.11.0.0-cp1-test-sources.jar:/usr/bin/../share/java/kafka/connect-json-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/usr/bin/../share/java/kafka/rocksdbjni-5.0.1.jar:/usr/bin/../share/java/kafka/commons-digester-1.8.1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-scaladoc.jar:/usr/bin/../share/java/kafka/metrics-core-2.2.0.jar:/usr/bin/../share/java/kafka/jetty-server-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/log4j-1.2.17.jar:/usr/bin/../share/java/kafka/jersey-media-jaxb-2.24.jar:/usr/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/usr/bin/../share/java/kafka/javax.inject-1.jar:/usr/bin/../share/java/kafka/lz4-1.3.0.jar:/usr/bin/../share/java/kafka/hk2-api-2.5.0-b05.jar:/usr/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/usr/bin/../share/java/kafka/jopt-simple-5.0.3.jar:/usr/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/usr/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/usr/bin/../share/java/kafka/plexus-utils-3.0.24.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.8.5.jar:/usr/bin/../share/java/kafka/maven-artifact-3.5.0.jar:/usr/bin/../share/java/kafka/validation-api-1.1.0.Final.jar:/usr/bin/../share/java/kafka/connect-transforms-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-sources.jar:/usr/bin/../share/java/kafka/commons-logging-1.2.jar:/usr/bin/../share/java/kafka/commons-collections-3.2.1.jar:/usr/bin/../share/java/kafka/javax.inject-2.5.0-b05.jar:/usr/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka-clients-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/paranamer-2.7.jar:/usr/bin/../share/java/kafka/jetty-security-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/usr/bin/../share/java/kafka/zkclient-0.10.jar:/usr/bin/../share/java/kafka/javassist-3.21.0-GA.jar:/usr/bin/../share/java/kafka/snappy
-java-1.1.2.6.jar:/usr/bin/../share/java/kafka/httpmime-4.5.2.jar:/usr/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/connect-runtime-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/slf4j-log4j12-1.7.25.jar:/usr/bin/../share/java/kafka/slf4j-api-1.7.25.jar:/usr/bin/../share/java/kafka/scala-library-2.11.11.jar:/usr/bin/../share/java/kafka/kafka-tools-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/avro-1.8.2.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-2.24.jar:/usr/bin/../share/java/kafka/xz-1.5.jar:/usr/bin/../share/java/kafka/javax.ws.rs-api-2.0.1.jar:/usr/bin/../share/java/kafka/support-metrics-client-3.3.0.jar:/usr/bin/../share/java/kafka/jackson-core-2.8.5.jar:/usr/bin/../share/java/kafka/jersey-server-2.24.jar:/usr/bin/../share/java/kafka/reflections-0.9.11.jar:/usr/bin/../share/java/kafka/httpcore-4.4.4.jar:/usr/bin/../share/java/kafka/jersey-client-2.24.jar:/usr/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:os.version=4.7.2 (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,125] INFO Initiating client connection, connectString=zookeeper.development.svc.cluster.local sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@3427b02d (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,137] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2017-10-26 15:22:28,143] INFO Opening socket connection to server 172.17.0.6/172.17.0.6:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,206] INFO Socket connection established to 172.17.0.6/172.17.0.6:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,215] INFO Session establishment complete on server 172.17.0.6/172.17.0.6:2181, sessionid = 0x15f584510a5000f, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,217] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2017-10-26 15:22:28,318] INFO Cluster ID = nWDeBixwReaENQBQy5o5oA (kafka.server.KafkaServer)
[2017-10-26 15:22:28,327] WARN No meta.properties file under dir /var/lib/kafka/data/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2017-10-26 15:22:28,339] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter ClassNotFoundException exception occurred
    at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:288)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
    at io.confluent.support.metrics.SupportedServerStartable.startup(SupportedServerStartable.java:102)
    at io.confluent.support.metrics.SupportedKafka.main(SupportedKafka.java:49)
Caused by: java.lang.ClassNotFoundException: com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.kafka.common.utils.Utils.newInstance(Utils.java:300)
    at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:286)
    ... 3 more
[2017-10-26 15:22:28,343] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
[2017-10-26 15:22:28,348] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2017-10-26 15:22:28,350] INFO Session: 0x15f584510a5000f closed (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,352] INFO EventThread shut down for session: 0x15f584510a5000f (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,352] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer)
[2017-10-26 15:22:28,368] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)

krallistic commented 7 years ago

HEAD is currently pretty unstable because of that refactoring. (With cruise-control there will unfortunately be custom images, based on the Confluent one.) The latest tagged version (0.2.0) is stable.
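
The stack trace above is a `ClassNotFoundException` for the class named in `metric.reporters`, which suggests that no jar on the broker classpath ships the reporter in the stock image. As a rough way to confirm this inside a container, here is a minimal Python sketch that scans a directory of jars for the compiled class. The helper name `jars_providing` is hypothetical and not part of the operator or of cruise-control, and the sketch assumes Python is available wherever it is run.

```python
import zipfile
from pathlib import Path

# Class name taken from the metric.reporters value in the log above.
REPORTER_CLASS = (
    "com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter"
)


def jars_providing(class_name, jar_dir):
    """Return the jars directly under jar_dir that contain the compiled class.

    Hypothetical helper for illustration only.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return hits
```

Calling `jars_providing(REPORTER_CLASS, "/usr/share/java/kafka")` against the directory that appears in the logged `java.class.path` would return an empty list if the reporter jar is missing, which would match the failure shown here.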