@rahulchak Thanks for using Kafka Lag Exporter. Can you please confirm which version you're using and attach a snippet of the log with the exception stack trace?
Thank you for your quick response. I believe the version is 0.5.5, and I have attached the log here. I am not sure how to get the exception stack trace, though. Below are the contents of my conf file:
Conf file:
kafka-lag-exporter {
  port = 9080
  kafka-client-timeout = 20
  clusters = [
    {
      name = "stormcloud-devtest"
      bootstrap-brokers = "test.com:9092"
      ssl.keystore.location = "/var/ssl/private/star.datafabric.gcp.west.com.pkcs12"
      ssl.key.password = ""
      ssl.keystore.password = ""
      ssl.truststore.location = "/var/ssl/private/truststore.pkcs12"
      ssl.truststore.password = ""
      sasl.mechanism = "PLAIN"
      sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username='' password='***';"
      labels = {
        location = "ny"
        zone = "us-west1"
      }
      admin-client-properties = {
        client.id = "admin-client-id"
        ssl.keystore.location = "/var/ssl/private/star.datafabric.gcp.west.com.pkcs12"
        ssl.key.password = "*******"
        ssl.keystore.password = "*******"
        ssl.truststore.location = "/var/ssl/private/truststore.pkcs12"
        ssl.truststore.password = "*******"
        sasl.mechanism = "PLAIN"
        sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username='*******' password='*******';"
      }
    }
  ]
}
Log: kafkalogexporter_log.txt
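One thing worth noting in the conf above: it sets sasl.mechanism and the SSL stores but never sets security.protocol, and Kafka clients default that to PLAINTEXT, in which case the SASL and SSL settings are never used. Also, kafka-client-timeout = 20 has no time unit; the later examples in this thread write it as a duration, e.g. 30 seconds. Below is a minimal sketch of a SASL_SSL cluster entry, passing the client settings through the consumer-properties and admin-client-properties blocks as the later examples in this thread do; the broker address, paths, and credentials are placeholders:

clusters = [
  {
    name = "stormcloud-devtest"
    bootstrap-brokers = "test.com:9092"
    consumer-properties = {
      security.protocol = "SASL_SSL"
      sasl.mechanism = "PLAIN"
      sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username='*******' password='*******';"
      ssl.truststore.location = "/var/ssl/private/truststore.pkcs12"
      ssl.truststore.password = "*******"
    }
    admin-client-properties = {
      # same security settings; the admin client is configured separately
      security.protocol = "SASL_SSL"
      sasl.mechanism = "PLAIN"
      sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username='*******' password='*******';"
      ssl.truststore.location = "/var/ssl/private/truststore.pkcs12"
      ssl.truststore.password = "*******"
    }
  }
]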
Having a similar error, but connecting to CloudKarafka from our EKS through a VPC:
Version: 0.5.5
Stacktrace:
Dec 13 14:08:18 kafka-lag-exporter-768d88df9f-hhzf2 kafka-lag-exporter INFO c.l.k.ConsumerGroupCollector$ akka://kafka-lag-exporter/user/consumer-group-collector-app - Collecting offsets
Dec 13 14:08:28 kafka-lag-exporter-768d88df9f-hhzf2 kafka-lag-exporter INFO o.a.k.c.a.i.AdminMetadataManager - [AdminClient clientId=adminclient-598] Metadata update failed org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
Dec 13 14:08:28 kafka-lag-exporter-768d88df9f-hhzf2 kafka-lag-exporter ERROR c.l.k.ConsumerGroupCollector$ akka://kafka-lag-exporter/user/consumer-group-collector-app - Supervisor RestartSupervisor saw failure: A failure occurred while retrieving offsets. Shutting down. java.lang.Exception: A failure occurred while retrieving offsets. Shutting down.
at com.lightbend.kafkalagexporter.ConsumerGroupCollector$CollectorBehavior.$anonfun$collector$1(ConsumerGroupCollector.scala:188)
at akka.actor.typed.internal.BehaviorImpl$ReceiveBehavior.receive(BehaviorImpl.scala:37)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:437)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:393)
at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:52)
at akka.actor.typed.internal.RestartSupervisor.aroundReceive(Supervision.scala:248)
at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:79)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:437)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:393)
at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:121)
at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:102)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:612)
at akka.actor.ActorCell.invoke(ActorCell.scala:581)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
at akka.dispatch.Mailbox.run(Mailbox.scala:229)
at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.TimeoutException: null
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at com.lightbend.kafkalagexporter.KafkaClient$.$anonfun$kafkaFuture$1(KafkaClient.scala:50)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Dec 13 14:08:29 kafka-lag-exporter-768d88df9f-hhzf2 kafka-lag-exporter INFO c.l.k.ConsumerGroupCollector$ akka://kafka-lag-exporter/user/consumer-group-collector-app - Spawned ConsumerGroupCollector for cluster: app
Dec 13 14:08:29 kafka-lag-exporter-768d88df9f-hhzf2 kafka-lag-exporter INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [**********:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = kafkalagexporter
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 10000
retry.backoff.ms = 1000
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
This issue was PEBKAC, AKA my fault. It was a Kafka server IP mismatch between environments when I was upgrading with Helm to change cluster IPs. 🤦‍♂️
Same error with Kafka version 1.0.1.
@rahulchak @ccotar What version of Kafka (or Confluent Platform Kafka) clusters are you connecting to? I've only tested Kafka Lag Exporter with a limited number of broker versions, all of which are > 2.0.0.
@seglo my issue was my own fault; I somehow had the wrong cluster IP during a helm upgrade. My bad. 👍
@ccotar I'm glad you got it resolved. Thanks for following up.
@seglo I have the same issue with Kafka 2.0.0. My test has a single cluster configured with only the name and bootstrap-servers parameters. On the same workstation I can run a simple Confluent SDK based app against the same server name.
Hi @seglo, is there any update on this? I'm facing the same issue.
Here is the conf file that I am using:
kafka-lag-exporter {
reporters.prometheus.port = 8080
poll-interval = 30 seconds
lookup-table-size = 60
clusters = [
{
name = "confluent-cluster-1"
bootstrap-brokers = "confluent-cluster-endpoint:9092"
consumer-properties = {
ssl.endpoint.identification.algorithm = "https"
sasl.mechanism = "PLAIN"
retry.backoff.ms = "500"
sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"******\" password=\"*******\";"
security.protocol = "SASL_SSL"
client.id = "devops--consumerlag_exporter--all"
}
labels = {
location = "sgp"
zone = "ap-southeast-1"
}
}
]
kafka-client-timeout = 30 seconds
metric-whitelist = [".*"]
}
Log:
2020-08-27 10:58:06,059 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2020-08-27 10:58:06,161 INFO akka.actor.typed.Behavior akka://kafka-lag-exporter/user - Starting Kafka Lag Exporter with configuration:
Poll interval: 30 seconds
Lookup table size: 60
Prometheus metrics endpoint port: 8000
Admin client consumer group id: kafkalagexporter
Kafka client timeout: 30 seconds
Statically defined Clusters:
Cluster name: confluent-cluster-1
Cluster Kafka bootstrap brokers: confluent-cluster-endpoint:9092
Watchers:
Strimzi: false
2020-08-27 10:58:06,242 INFO akka.actor.typed.Behavior akka://kafka-lag-exporter/user - Cluster Added: KafkaCluster(confluent-cluster-1,confluent-cluster-endpoint:9092)
2020-08-27 10:58:06,253 INFO akka.actor.typed.Behavior akka://kafka-lag-exporter/user/consumer-group-collector-confluent-cluster-1 - Spawned ConsumerGroupCollector for cluster: confluent-cluster-1
2020-08-27 10:58:06,273 INFO o.a.k.c.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [confluent-cluster-endpoint:9092]
client.dns.lookup = default
client.id =
connections.max.idle.ms = 300000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 1000
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
2020-08-27 10:58:06,352 INFO o.a.kafka.common.utils.AppInfoParser - Kafka version : 2.1.0
2020-08-27 10:58:06,352 INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId : eec43959745f444f
2020-08-27 10:58:36,357 INFO o.a.k.c.a.i.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
2020-08-27 10:58:36,359 INFO o.a.k.c.a.i.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited.
2020-08-27 10:58:36,368 INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [confluent-cluster-endpoint:9092]
check.crcs = true
client.dns.lookup = default
client.id =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = kafkalagexporter
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 1000
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
2020-08-27 10:58:36,401 INFO o.a.kafka.common.utils.AppInfoParser - Kafka version : 2.1.0
2020-08-27 10:58:36,401 INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId : eec43959745f444f
2020-08-27 10:58:36,411 ERROR akka.actor.typed.Behavior akka://kafka-lag-exporter/user/consumer-group-collector-confluent-cluster-1 - Supervisor RestartSupervisor saw failure: A failure occurred while retrieving offsets. Shutting down. java.lang.Exception: A failure occurred while retrieving offsets. Shutting down.
at com.lightbend.kafkalagexporter.ConsumerGroupCollector$.$anonfun$collector$1(ConsumerGroupCollector.scala:125)
at akka.actor.typed.internal.BehaviorImpl$ReceiveBehavior.receive(BehaviorImpl.scala:34)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:421)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:394)
at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:50)
at akka.actor.typed.internal.RestartSupervisor.aroundReceive(Supervision.scala:229)
at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:74)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:421)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:394)
at akka.actor.typed.internal.adapter.ActorAdapter.akka$actor$typed$internal$adapter$ActorAdapter$$handleMessage(ActorAdapter.scala:82)
at akka.actor.typed.internal.adapter.ActorAdapter$$anonfun$running$1.applyOrElse(ActorAdapter.scala:78)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:39)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.KafkaException: Failed to find brokers to send ListGroups
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:274)
at com.lightbend.kafkalagexporter.KafkaClient$.$anonfun$kafkaFuture$1(KafkaClient.scala:44)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.KafkaException: Failed to find brokers to send ListGroups
at org.apache.kafka.clients.admin.KafkaAdminClient$22.handleFailure(KafkaAdminClient.java:2610)
at org.apache.kafka.clients.admin.KafkaAdminClient$Call.fail(KafkaAdminClient.java:614)
at org.apache.kafka.clients.admin.KafkaAdminClient$TimeoutProcessor.handleTimeouts(KafkaAdminClient.java:730)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.timeoutPendingCalls(KafkaAdminClient.java:798)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1092)
... 1 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
2020-08-27 10:58:37,498 INFO akka.actor.typed.Behavior akka://kafka-lag-exporter/user/consumer-group-collector-confluent-cluster-1 - Spawned ConsumerGroupCollector for cluster: confluent-cluster-1
2020-08-27 10:58:37,505 INFO o.a.k.c.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [confluent-cluster-endpoint:9092]
client.dns.lookup = default
client.id =
connections.max.idle.ms = 300000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 1000
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
2020-08-27 10:58:37,509 INFO o.a.kafka.common.utils.AppInfoParser - Kafka version : 2.1.0
2020-08-27 10:58:37,509 INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId : eec43959745f444f
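Note that in the AdminClientConfig dumps above, security.protocol = PLAINTEXT and sasl.jaas.config = null even though the cluster's consumer-properties specify SASL_SSL, and the failure ("Failed to find brokers to send ListGroups") is raised by the admin client. A plausible fix, assuming the admin client only picks up settings from the admin-client-properties block, is to mirror the security settings there (placeholders as in the original config):

admin-client-properties = {
  security.protocol = "SASL_SSL"
  ssl.endpoint.identification.algorithm = "https"
  sasl.mechanism = "PLAIN"
  sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"******\" password=\"*******\";"
}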
Same here with Confluent Cloud. We opened #142 almost a month ago. Somehow it went back to working, but now it isn't working again. In our case it happens in one Confluent cluster; with the other one, kafka-lag-exporter works fine... Thank you
@afiffing The title of this issue is the generic exception message used when collecting offsets fails in Kafka Lag Exporter. It can fail for many different reasons. In your case the logs indicate the underlying exception is an org.apache.kafka.common.KafkaException: Failed to find brokers to send ListGroups, which may be related to a Kafka broker listener misconfiguration.
I'm closing this issue due to its generality. If you encounter this exception, look at the inner exception included in the stack trace to see the root cause.
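For example, the Caused by chain in the stack trace earlier in this thread narrows the generic failure down step by step (quoted verbatim from the log above):

Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.KafkaException: Failed to find brokers to send ListGroups
...
Caused by: org.apache.kafka.common.KafkaException: Failed to find brokers to send ListGroups
...
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.

The last Caused by entry is the actual root cause; here the admin client never received a node assignment, which points at connectivity or broker listener configuration rather than at the exporter itself.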
When I exec into the pod, I see application.conf as this:
kafka-lag-exporter {
  port = 8000
  poll-interval = 30 seconds
  lookup-table-size = 60
  client-group-id = "kafkalagexporter"
  kafka-client-timeout = 10 seconds
  clusters = [
    {
      name = "lkc-0xr722"
      bootstrap-brokers = "xxx"
      consumer-properties = {
        sasl = "map[jaas:map[config:org.apache.kafka.common.security.plain.PlainLoginModule required username='LDMGW4BF4LOA5Z4I' password='xxx';] mechanism:PLAIN]"
        security = "map[protocol:SASL_SSL]"
        ssl = "map[endpoint:map[identification:map[algorithm:https]]]"
      }
      admin-client-properties = {
        sasl = "map[jaas:map[config:org.apache.kafka.common.security.plain.PlainLoginModule required username='LDMGW4BF4LOA5Z4I' password='xxx';] mechanism:PLAIN]"
        security = "map[protocol:SASL_SSL]"
        ssl = "map[endpoint:map[identification:map[algorithm:https]]]"
      }
      labels = { }
    }
  ]
  reporters.prometheus.port = 8000
  sinks = ["PrometheusEndpointSink"]
  watchers = {
    strimzi = "false"
  }
  metric-whitelist = [ ".*" ]
}

akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
}
Does this look normal? A normal application.conf should look like:
name = "lkc-0xr722"
bootstrap-brokers = "xxx"
security.protocol = "SASL_SSL"
sasl.mechanism = "PLAIN"
sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username='LDMGW4BF4LOA5Z4I' password='xxx';"
...
...
...
I was able to fix this issue by fixing the configmap.
We are using a secure Confluent Kafka cluster, and I provided the admin-client-properties section in application.conf with the SSL keystore and truststore file locations and passwords. However, when it starts, all the SSL parameters are null; it seems like it is not reading the admin-client-properties section.
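For comparison with the resolved reports above, a minimal sketch of what the admin-client-properties block could look like for a keystore/truststore setup; paths are illustrative and passwords are placeholders. If the rendered application.conf inside the container does not show these keys verbatim under admin-client-properties (as with the Helm-rendered map[...] values earlier in this thread), the admin client falls back to the null defaults visible in the AdminClientConfig dumps:

admin-client-properties = {
  security.protocol = "SSL"   # or "SASL_SSL", depending on the broker listener
  ssl.keystore.location = "/path/to/keystore.pkcs12"
  ssl.keystore.password = "*******"
  ssl.key.password = "*******"
  ssl.truststore.location = "/path/to/truststore.pkcs12"
  ssl.truststore.password = "*******"
}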