linkedin / kafka-monitor

Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics - E2E latency, service produce/consume availability, offsets commit availability & latency, message loss rate and more.
https://engineering.linkedin.com/blog/2016/05/open-sourcing-kafka-monitor
Apache License 2.0

Kafka SSL handshake failed #178

Closed KhoiDinh closed 4 years ago

KhoiDinh commented 4 years ago

I am trying to connect Kafka Monitor to my SASL_SSL-configured Kafka cluster. The cluster itself currently works with SASL_SSL, but Kafka Monitor keeps failing. This is what I get when I try to run Kafka Monitor:

[2019-12-09 15:19:10,887] INFO MultiClusterTopicManagementServiceConfig values: topic = my-topic topic-management.preferred.leader.election.check.interval.ms = 300000 topic-management.rebalance.interval.ms = 600000 (com.linkedin.kmf.services.configs.MultiClusterTopicManagementServiceConfig)
[2019-12-09 15:19:10,941] INFO TopicManagementServiceConfig values: topic = my-topic topic-management.minPartitionNum = 1 topic-management.partitionsToBrokersRatio = 2.0 topic-management.replicationFactor = 1 topic-management.topicCreationEnabled = true topic-management.topicFactory.class.name = com.linkedin.kmf.topicfactory.DefaultTopicFactory zookeeper.connect = 10.15.164.233:2181 (com.linkedin.kmf.services.configs.TopicManagementServiceConfig)
[2019-12-09 15:19:11,005] INFO ProduceServiceConfig values: bootstrap.servers = 10.15.164.233:9093 produce.latency.percentile.granularity.ms = 1 produce.latency.percentile.max.ms = 5000 produce.partitioner.class = class com.linkedin.kmf.partitioner.NewKMPartitioner produce.producer.class = com.linkedin.kmf.producer.NewProducer produce.producer.id = kmf-producer produce.record.delay.ms = 100 produce.record.size.byte = 100 produce.sync = true produce.thread.num = 5 produce.treat.zero.throughput.as.unavailable = true topic = my-topic zookeeper.connect = 10.15.164.233:2181 (com.linkedin.kmf.services.configs.ProduceServiceConfig)
[2019-12-09 15:19:11,146] INFO single-cluster-monitor/ProduceService is initialized. (com.linkedin.kmf.services.ProduceService)
[2019-12-09 15:19:11,152] INFO ConsumeServiceConfig values: bootstrap.servers = 10.15.164.233:9093 consume.consumer.class = com.linkedin.kmf.consumer.NewConsumer consume.latency.percentile.granularity.ms = 1 consume.latency.percentile.max.ms = 5000 consume.latency.sla.ms = 20000 topic = my-topic zookeeper.connect = 10.15.164.233:2181 (com.linkedin.kmf.services.configs.ConsumeServiceConfig)
[2019-12-09 15:19:11,283] INFO JettyServiceConfig values: jetty.port = 8000 (com.linkedin.kmf.services.configs.JettyServiceConfig)
[2019-12-09 15:19:11,358] INFO DefaultMetricsReporterServiceConfig values: report.interval.sec = 1 report.metrics.list = [kmf:type=kafka-monitor:offline-runnable-count, kmf.services:type=produce-service,name=*:produce-availability-avg, kmf.services:type=consume-service,name=*:consume-availability-avg, kmf.services:type=produce-service,name=*:records-produced-total, kmf.services:type=consume-service,name=*:records-consumed-total, kmf.services:type=consume-service,name=*:records-lost-total, kmf.services:type=consume-service,name=*:records-lost-rate, kmf.services:type=consume-service,name=*:records-duplicated-total, kmf.services:type=consume-service,name=*:records-delay-ms-avg, kmf.services:type=produce-service,name=*:records-produced-rate, kmf.services:type=produce-service,name=*:produce-error-rate, kmf.services:type=consume-service,name=*:consume-error-rate] (com.linkedin.kmf.services.configs.DefaultMetricsReporterServiceConfig)
[2019-12-09 15:19:11,360] INFO single-cluster-monitor/MultiClusterTopicManagementService started. (com.linkedin.kmf.services.MultiClusterTopicManagementService)
[2019-12-09 15:19:11,361] INFO single-cluster-monitor/SingleClusterMonitor started. (com.linkedin.kmf.apps.SingleClusterMonitor)
[2019-12-09 15:19:11,364] INFO jetty-8.1.19.v20160209 (org.eclipse.jetty.server.Server)
[2019-12-09 15:19:11,384] INFO Started SelectChannelConnector@0.0.0.0:8000 (org.eclipse.jetty.server.AbstractConnector)
[2019-12-09 15:19:11,384] INFO jetty-service/JettyService started at port 8000 (com.linkedin.kmf.services.JettyService)
I> No access restrictor found, access to any MBean is allowed
[2019-12-09 15:19:11,449] INFO jolokia-service/JolokiaService started at port 8778 (com.linkedin.kmf.services.JettyService)
[2019-12-09 15:19:11,450] INFO reporter-service/DefaultMetricsReporterService started. (com.linkedin.kmf.services.DefaultMetricsReporterService)
[2019-12-09 15:19:11,450] INFO KafkaMonitor started. (com.linkedin.kmf.KafkaMonitor)

[2019-12-09 15:19:16,450] ERROR App single-cluster-monitor is not fully running. (com.linkedin.kmf.KafkaMonitor)
[2019-12-09 15:19:17,450] INFO ==============================================================
kmf:type=kafka-monitor:offline-runnable-count=1.0
kmf.services:name=single-cluster-monitor,type=produce-service:produce-availability-avg=NaN
kmf.services:name=single-cluster-monitor,type=consume-service:consume-availability-avg=0.0
kmf.services:name=single-cluster-monitor,type=produce-service:records-produced-total=0.0
kmf.services:name=single-cluster-monitor,type=consume-service:records-consumed-total=0.0
kmf.services:name=single-cluster-monitor,type=consume-service:records-lost-total=0.0
kmf.services:name=single-cluster-monitor,type=consume-service:records-lost-rate=0.0
kmf.services:name=single-cluster-monitor,type=consume-service:records-duplicated-total=0.0
kmf.services:name=single-cluster-monitor,type=consume-service:records-delay-ms-avg=NaN
kmf.services:name=single-cluster-monitor,type=produce-service:records-produced-rate=0.0
kmf.services:name=single-cluster-monitor,type=produce-service:produce-error-rate=0.0
kmf.services:name=single-cluster-monitor,type=consume-service:consume-error-rate=0.0 (com.linkedin.kmf.services.DefaultMetricsReporterService)
[2019-12-09 15:19:18,450] INFO ==============================================================

And this is what my kafka-monitor.properties looks like:

{
  "single-cluster-monitor": {
    "class.name": "com.linkedin.kmf.apps.SingleClusterMonitor",
    "topic": "my-topic",
    "zookeeper.connect": "10.15.164.233:2181",
    "bootstrap.servers": "10.15.164.233:9093",
    "request.timeout.ms": 9000,
    "produce.record.delay.ms": 100,
    "topic-management.topicCreationEnabled": true,
    "topic-management.replicationFactor" : 1,
    "topic-management.partitionsToBrokersRatio" : 2.0,
    "topic-management.rebalance.interval.ms" : 600000,
    "topic-management.preferred.leader.election.check.interval.ms" : 300000,
    "topic-management.topicFactory.props": {
    },
    "topic-management.topic.props": {
      "retention.ms": "3600000"
    },
    "produce.producer.props": {
      "client.id": "kmf-client-id",
      "security.protocol": "SASL_SSL",
      "sasl.mechanism": "PLAIN",
      "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"kafka-admin\" password=\"kafka-password\";",
      "ssl.truststore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.truststore.jks",
      "ssl.truststore.password": "password",
      "ssl.keystore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.keystore.jks",
      "ssl.keystore.password": "password"
    },

    "consume.latency.sla.ms": "20000",
    "consume.consumer.props": {
      "security.protocol": "SASL_SSL",
      "sasl.mechanism": "PLAIN",
      "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"kafka-admin\" password=\"kafka-password\";",
      "ssl.truststore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.truststore.jks",
      "ssl.truststore.password": "password",
      "ssl.keystore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.keystore.jks",
      "ssl.keystore.password": "password"
    }
  },

  "jetty-service": {
    "class.name": "com.linkedin.kmf.services.JettyService",
    "jetty.port": 8000
  },

  "jolokia-service": {
    "class.name": "com.linkedin.kmf.services.JolokiaService"
  },

  "reporter-service": {
    "class.name": "com.linkedin.kmf.services.DefaultMetricsReporterService",
    "report.interval.sec": 1,
    "report.metrics.list": [
      "kmf:type=kafka-monitor:offline-runnable-count",
      "kmf.services:type=produce-service,name=*:produce-availability-avg",
      "kmf.services:type=consume-service,name=*:consume-availability-avg",
      "kmf.services:type=produce-service,name=*:records-produced-total",
      "kmf.services:type=consume-service,name=*:records-consumed-total",
      "kmf.services:type=consume-service,name=*:records-lost-total",
      "kmf.services:type=consume-service,name=*:records-lost-rate",
      "kmf.services:type=consume-service,name=*:records-duplicated-total",
      "kmf.services:type=consume-service,name=*:records-delay-ms-avg",
      "kmf.services:type=produce-service,name=*:records-produced-rate",
      "kmf.services:type=produce-service,name=*:produce-error-rate",
      "kmf.services:type=consume-service,name=*:consume-error-rate"
    ]
  }

#  Example statsd-service to report metrics
#  "statsd-service": {
#      "class.name": "com.linkedin.kmf.services.StatsdMetricsReporterService",
#      "report.statsd.host": "localhost",
#      "report.statsd.port": "8125",
#      "report.statsd.prefix": "kafka-monitor",
#      "report.interval.sec": 1,
#      "report.metrics.list": [
#      "kmf:type=kafka-monitor:offline-runnable-count",
#      "kmf.services:type=produce-service,name=*:produce-availability-avg",
#      "kmf.services:type=consume-service,name=*:consume-availability-avg"
#     ]
#  }

#  Example kafka-service to report metrics
#  "reporter-kafka-service": {
#    "class.name": "com.linkedin.kmf.services.KafkaMetricsReporterService",
#    "report.interval.sec": 3,
#    "zookeeper.connect": "localhost:2181",
#    "bootstrap.servers": "localhost:9092",
#    "topic": "kafka-monitor-topic-metrics",
#    "report.kafka.topic.replication.factor": 1,
#    "report.metrics.list": [
#      "kmf.services:type=produce-service,name=*:produce-availability-avg",
#      "kmf.services:type=consume-service,name=*:consume-availability-avg",
#      "kmf.services:type=produce-service,name=*:records-produced-total",
#      "kmf.services:type=consume-service,name=*:records-consumed-total",
#      "kmf.services:type=consume-service,name=*:records-lost-total",
#      "kmf.services:type=consume-service,name=*:records-duplicated-total",
#      "kmf.services:type=consume-service,name=*:records-delay-ms-avg",
#      "kmf.services:type=produce-service,name=*:records-produced-rate",
#      "kmf.services:type=produce-service,name=*:produce-error-rate",
#      "kmf.services:type=consume-service,name=*:consume-error-rate"
#    ]
#  }

#  Example signalfx-service to report metrics
# "signalfx-service": {
#   "class.name": "com.linkedin.kmf.services.SignalFxMetricsReporterService",
#   "report.interval.sec": 1,
#   "report.metric.dimensions": {
#   },
#   "report.signalfx.url": "",
#   "report.signalfx.token" : ""
# }

}

How can I connect to my SASL_SSL kafka cluster? What configuration am I missing?

smccauliff commented 4 years ago

Can you post the stack trace?

KhoiDinh commented 4 years ago

stack trace of what?

smccauliff commented 4 years ago

If you look in here, https://github.com/linkedin/kafka-monitor/blob/master/src/main/java/com/linkedin/kmf/services/ProduceService.java , there are some places where stack traces will be logged. Having one of these might be useful for debugging this issue.

KhoiDinh commented 4 years ago

Here is the log for kafka-client.log:

[2019-12-09 16:05:15,609] WARN The configuration 'topic-management.preferred.leader.election.check.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'topic-management.replicationFactor' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'consume.consumer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'topic-management.rebalance.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'topic-management.partitionsToBrokersRatio' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'produce.record.delay.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'produce.producer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,609] WARN The configuration 'topic-management.topicFactory.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,610] WARN The configuration 'topic' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,610] WARN The configuration 'consume.latency.sla.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,610] WARN The configuration 'class.name' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,610] WARN The configuration 'topic-management.topic.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,610] WARN The configuration 'topic-management.topicCreationEnabled' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.preferred.leader.election.check.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.replicationFactor' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'consume.consumer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.rebalance.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.partitionsToBrokersRatio' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'produce.record.delay.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'produce.producer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.topicFactory.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'consume.latency.sla.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'class.name' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.topic.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,626] WARN The configuration 'topic-management.topicCreationEnabled' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-09 16:05:15,882] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:539)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1152)
    at java.lang.Thread.run(Thread.java:748)
[2019-12-09 16:05:15,882] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-2': (org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:539)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1152)
    at java.lang.Thread.run(Thread.java:748)
[2019-12-09 16:05:15,910] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)

andrewchoi5 commented 4 years ago

Hello, thanks for your inquiry.

Have you checked that the ConsumerConfig values were properly applied? Also, could you please verify that you prefixed the configs with the proper producer. or consumer. in the properties file? For example: consumer.security.protocol=SASL_SSL
See also https://stackoverflow.com/questions/52722987/kafka-connect-out-of-java-heap-space-after-enabling-ssl

andrewchoi5 commented 4 years ago

https://issues.apache.org/jira/browse/KAFKA-4090

KhoiDinh commented 4 years ago

Tried that and it didn't work. I then realized that when I run a command like: kafka-topics.sh --describe --bootstrap-server swe-analyticsdb-prod2:9093 --topic my-topic, I get the same SSL handshake failure as when I start Kafka Monitor. What do I need to set so that I can run the command above (assuming that this could be my problem)?

EDIT: Found out that passing in a properties file with the command works. How can I do that but for kafka-monitor?

andrewchoi5 commented 4 years ago

The command I execute for Kafka Monitor end to end test, which works for me, is

./gradlew clean build && ./gradlew jar && ./bin/end-to-end-test.sh --broker-list localhost:PORT_NUMBER --zookeeper localhost:PORT_NUMBER --topic TOPIC_NAME

andrewchoi5 commented 4 years ago

./bin/kafka-monitor-start.sh config/kafka-monitor.properties
For multi-cluster monitoring, use
./bin/kafka-monitor-start.sh config/multi-cluster-monitor.properties

KhoiDinh commented 4 years ago

I ran the ./bin/kafka-monitor-start.sh config/kafka-monitor.properties command and that didn't work. I also tried the end to end test and it gave me this:

./bin/end-to-end-test.sh --broker-list swe-analyticsdb-prod2:9093 --zookeeper swe-analyticsdb-prod2:2181 --topic my-topic
[2019-12-10 07:53:51,055] INFO MultiClusterTopicManagementServiceConfig values: topic = my-topic topic-management.preferred.leader.election.check.interval.ms = 300000 topic-management.rebalance.interval.ms = 600000 (com.linkedin.kmf.services.configs.MultiClusterTopicManagementServiceConfig)
[2019-12-10 07:53:51,109] INFO TopicManagementServiceConfig values: topic = my-topic topic-management.minPartitionNum = 1 topic-management.partitionsToBrokersRatio = 1.0 topic-management.replicationFactor = 1 topic-management.topicCreationEnabled = true topic-management.topicFactory.class.name = com.linkedin.kmf.topicfactory.DefaultTopicFactory zookeeper.connect = swe-analyticsdb-prod2:2181 (com.linkedin.kmf.services.configs.TopicManagementServiceConfig)
[2019-12-10 07:53:51,189] INFO ProduceServiceConfig values: bootstrap.servers = swe-analyticsdb-prod2:9093 produce.latency.percentile.granularity.ms = 1 produce.latency.percentile.max.ms = 5000 produce.partitioner.class = class com.linkedin.kmf.partitioner.NewKMPartitioner produce.producer.class = com.linkedin.kmf.producer.NewProducer produce.producer.id = kmf-producer produce.record.delay.ms = 100 produce.record.size.byte = 100 produce.sync = true produce.thread.num = 5 produce.treat.zero.throughput.as.unavailable = true topic = my-topic zookeeper.connect = swe-analyticsdb-prod2:2181 (com.linkedin.kmf.services.configs.ProduceServiceConfig)
[2019-12-10 07:53:51,230] INFO single-cluster-monitor/ProduceService is initialized. (com.linkedin.kmf.services.ProduceService)
[2019-12-10 07:53:51,235] INFO ConsumeServiceConfig values: bootstrap.servers = swe-analyticsdb-prod2:9093 consume.consumer.class = com.linkedin.kmf.consumer.NewConsumer consume.latency.percentile.granularity.ms = 1 consume.latency.percentile.max.ms = 5000 consume.latency.sla.ms = 20000 topic = my-topic zookeeper.connect = swe-analyticsdb-prod2:2181 (com.linkedin.kmf.services.configs.ConsumeServiceConfig)
[2019-12-10 07:53:51,283] INFO single-cluster-monitor/MultiClusterTopicManagementService started. (com.linkedin.kmf.services.MultiClusterTopicManagementService)
[2019-12-10 07:53:51,284] INFO single-cluster-monitor/SingleClusterMonitor started. (com.linkedin.kmf.apps.SingleClusterMonitor)
[2019-12-10 07:53:51,285] INFO DefaultMetricsReporterServiceConfig values: report.interval.sec = 1 report.metrics.list = [kmf.services:type=produce-service,name=*:produce-availability-avg, kmf.services:type=consume-service,name=*:consume-availability-avg, kmf.services:type=produce-service,name=*:records-produced-total, kmf.services:type=consume-service,name=*:records-consumed-total, kmf.services:type=consume-service,name=*:records-lost-total, kmf.services:type=consume-service,name=*:records-lost-rate, kmf.services:type=consume-service,name=*:records-duplicated-total, kmf.services:type=consume-service,name=*:records-delay-ms-avg, kmf.services:type=produce-service,name=*:records-produced-rate, kmf.services:type=produce-service,name=*:produce-error-rate, kmf.services:type=consume-service,name=*:consume-error-rate] (com.linkedin.kmf.services.configs.DefaultMetricsReporterServiceConfig)
[2019-12-10 07:53:51,285] INFO end-to-end/DefaultMetricsReporterService started. (com.linkedin.kmf.services.DefaultMetricsReporterService)
I> No access restrictor found, access to any MBean is allowed
[2019-12-10 07:53:51,509] INFO end-to-end/JolokiaService started at port 8778 (com.linkedin.kmf.services.JettyService)
[2019-12-10 07:53:51,511] INFO JettyServiceConfig values: jetty.port = 8000 (com.linkedin.kmf.services.configs.JettyServiceConfig)
[2019-12-10 07:53:51,545] INFO jetty-8.1.19.v20160209 (org.eclipse.jetty.server.Server)
[2019-12-10 07:53:51,564] INFO Started SelectChannelConnector@0.0.0.0:8000 (org.eclipse.jetty.server.AbstractConnector)
[2019-12-10 07:53:51,564] INFO end-to-end/JettyService started at port 8000 (com.linkedin.kmf.services.JettyService)
[2019-12-10 07:53:51,564] ERROR Some services have stopped. (com.linkedin.kmf.apps.SingleClusterMonitor)

This is what my Kafka server log looked like after I ran the command:

INFO [SocketServer brokerId=1] Failed authentication with swe-analyticsdb-prod2.internal.synopsys.com/ (SSL handshake failed) (org.apache.kafka.common.network.Selector)

From kafka-client.log:

[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.preferred.leader.election.check.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.replicationFactor' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'consume.consumer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.rebalance.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.partitionsToBrokersRatio' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'produce.record.delay.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'produce.producer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.topicFactory.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'consume.latency.sla.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'class.name' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.topic.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,712] WARN The configuration 'topic-management.topicCreationEnabled' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.preferred.leader.election.check.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.replicationFactor' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'consume.consumer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.rebalance.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.partitionsToBrokersRatio' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'produce.record.delay.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'produce.producer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.topicFactory.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'consume.latency.sla.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'class.name' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.topic.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,730] WARN The configuration 'topic-management.topicCreationEnabled' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:34,860] WARN The configuration 'ssl.client.auth' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
[2019-12-10 08:43:35,171] WARN The configuration 'ssl.client.auth' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
[2019-12-10 08:43:35,171] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic-management.preferred.leader.election.check.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic-management.replicationFactor' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'consume.consumer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic-management.rebalance.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic-management.partitionsToBrokersRatio' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'produce.record.delay.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'produce.producer.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'zookeeper.connect' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic-management.topicFactory.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'consume.latency.sla.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'class.name' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,176] WARN The configuration 'topic-management.topic.props' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-10 08:43:35,177] WARN The configuration 'topic-management.topicCreationEnabled' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)

smccauliff commented 4 years ago

---> java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) [stack frames omitted] org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1152)

The OOM: The Kafka protocol expects a 4-byte length field at the beginning of every request or response. When the remote endpoint is speaking TLS and the local client is speaking plaintext, the plaintext side reads the first bytes of the TLS record as that length field. Converted to an integer, those bytes yield a very large value. The client attempts to allocate a buffer of that size and gets an OutOfMemoryError.
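To make the arithmetic concrete, here is a minimal sketch of that misreading. It assumes the broker's first reply is a TLS alert record (content type 0x15, version bytes 0x03 0x03); the actual bytes depend on the broker and TLS version. The class and method names are hypothetical and are not Kafka code.

```java
import java.nio.ByteBuffer;

// Illustrative only: class and method names are hypothetical, not part of Kafka.
class TlsAsKafkaLength {
    // Kafka frames start with a 4-byte big-endian size. Read the first four
    // bytes of a buffer the way a plaintext Kafka client would.
    static int asKafkaLength(byte[] firstBytes) {
        return ByteBuffer.wrap(firstBytes, 0, 4).getInt(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        // Assumed reply: TLS record header with content type 0x15 (alert),
        // version 0x03 0x03 (TLS 1.2), followed by the two-byte record length.
        byte[] tlsAlert = {0x15, 0x03, 0x03, 0x00, 0x02};
        int size = asKafkaLength(tlsAlert);
        System.out.println(size + " bytes (~" + size / (1024 * 1024) + " MiB)");
    }
}
```

Under that assumption the client would try to allocate 352518912 bytes (about 336 MiB) for a single response, which can exceed the available heap.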

KafkaAdminClient: Another stack frame in the trace shows that this comes from the KafkaAdminClient.

Conclusion: The AdminClient is evidently not configured for SSL/TLS.

smccauliff commented 4 years ago

We recently switched from ZKUtils to the KafkaAdminClient: KIP-500 means Kafka will stop using ZooKeeper in the future, and ZKUtils is deprecated.

KhoiDinh commented 4 years ago

How do I configure the AdminClient for the Kafka instance? I'm using Kafka 2.2.0. I have an AdminClient config file and can pass it on the command line when running console operations, such as "kafka-topics.sh --describe --bootstrap-server swe-analyticsdb-prod2:9093 --topic my-topic --command-config <properties file>", and it works. How can I do the same when starting Kafka Monitor?

I have specified the JAAS config in the kafka-monitor.properties file under consume.consumer.props, but according to the client log, consume.consumer.props isn't a known config.

andrewchoi5 commented 4 years ago
  "single-cluster-monitor": {
    "class.name": "com.linkedin.kmf.apps.SingleClusterMonitor",
    "topic": "my-topic",
    "zookeeper.connect": "10.15.164.233:2181",
    "bootstrap.servers": "10.15.164.233:9093",
    "request.timeout.ms": 9000,
    "produce.record.delay.ms": 100,
    "topic-management.topicCreationEnabled": true,
    "topic-management.replicationFactor" : 1,
    "topic-management.partitionsToBrokersRatio" : 2.0,
    "topic-management.rebalance.interval.ms" : 600000,
    "topic-management.preferred.leader.election.check.interval.ms" : 300000,
    "topic-management.topicFactory.props": {
    },
    "topic-management.topic.props": {
      "retention.ms": "3600000"
    },
    "produce.producer.props": {
      "client.id": "kmf-client-id",
      "security.protocol": "SASL_SSL",
      "sasl.mechanism": "PLAIN",
      "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"kafka-admin\" password=\"kafka-password\";",
      "ssl.truststore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.truststore.jks",
      "ssl.truststore.password": "password",
      "ssl.keystore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.keystore.jks",
      "ssl.keystore.password": "password"
    },

    "consume.latency.sla.ms": "20000",
    "consume.consumer.props": {
      "security.protocol": "SASL_SSL",
      "sasl.mechanism": "PLAIN",
      "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"kafka-admin\" password=\"kafka-password\";",
      "ssl.truststore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.truststore.jks",
      "ssl.truststore.password": "password",
      "ssl.keystore.location": "/remote/sde108/kafka/kafka/SSL2/client/client.keystore.jks",
      "ssl.keystore.password": "password"
    }
  }

Examining your config file, it looks as though you configured produce.producer.props and consume.consumer.props but haven't configured the AdminClient. The Kafka Monitor AdminClient must use the same security protocol as your Kafka broker(s), which here is SASL_SSL and not PLAINTEXT.

AdminClient Configurations: https://docs.confluent.io/current/installation/configuration/admin-configs.html

KhoiDinh commented 4 years ago

Could you give me an example of how to configure the AdminClient inside the config file? I'm not sure what I am supposed to do.

andrewchoi5 commented 4 years ago

> could you give me an example of how to configure the adminclient inside the config file? I'm not sure what I am supposed to do

We have an open, pending Pull Request that configures SSL for you.
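In the meantime, here is a rough sketch of the settings an AdminClient needs for a SASL_SSL listener, mirroring the keys already used in produce.producer.props above. The helper class, its parameters, and the placeholder values are hypothetical. How Kafka Monitor forwards such settings to its AdminClient is not shown in this thread, so treat this as an illustration of the client-side properties only.

```java
import java.util.Properties;

// Hypothetical helper: the class name, parameters, and values are placeholders.
class AdminClientSslProps {
    static Properties build(String bootstrap, String user, String password,
                            String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrap);
        // Must match the broker listener; PLAINTEXT against a TLS port fails as described above.
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "PLAIN");
        props.setProperty("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"" + user + "\" password=\"" + password + "\";");
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker-host:9093", "some-user", "some-password",
                                 "/path/to/client.truststore.jks", "truststore-password");
        // With kafka-clients on the classpath, these properties could be passed to
        // org.apache.kafka.clients.admin.AdminClient.create(props).
        System.out.println(props.getProperty("security.protocol"));
    }
}
```

These are the same keys a --command-config properties file for kafka-topics.sh would carry, so a file that already works there is a reasonable starting point.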