Closed matanbaruch closed 4 years ago
The AdminClient timed out when listing groups, but it looks like it was beginning to recover near the end of the log. Is this a consistent issue, or do you only see it occasionally?
2020-10-01 11:49:41,946 ERROR c.l.k.ConsumerGroupCollector$ akka://kafka-lag-exporter/user/consumer-group-collector-KafkaProdOregon - Supervisor RestartSupervisor saw failure: A failure occurred while retrieving offsets. Shutting down.
java.lang.Exception: A failure occurred while retrieving offsets. Shutting down.
at com.lightbend.kafkalagexporter.ConsumerGroupCollector$CollectorBehavior.$anonfun$collector$1(ConsumerGroupCollector.scala:214)
at akka.actor.typed.internal.BehaviorImpl$ReceiveBehavior.receive(BehaviorImpl.scala:136)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57)
at akka.actor.typed.internal.RestartSupervisor.aroundReceive(Supervision.scala:263)
at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:85)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:129)
at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:106)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:577)
at akka.actor.ActorCell.invoke(ActorCell.scala:547)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1601552980162) timed out at 1601552980163 after 1 attempt(s)
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at com.lightbend.kafkalagexporter.KafkaClient$.$anonfun$kafkaFuture$1(KafkaClient.scala:50)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1601552980162) timed out at 1601552980163 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
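As a quick sanity check on the timeout lines in the trace above, the `deadlineMs` value can be decoded. This is a minimal Java sketch that assumes `deadlineMs` is a Unix epoch timestamp in milliseconds; `DeadlineDecoder` and its methods are hypothetical helper names, not part of Kafka or Kafka Lag Exporter.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for reading the numbers in the TimeoutException message.
final class DeadlineDecoder {

    // Assumption: deadlineMs is milliseconds since the Unix epoch (UTC).
    static Instant toInstant(long epochMs) {
        return Instant.ofEpochMilli(epochMs);
    }

    // How far past its deadline the call was when it was failed.
    static Duration overrun(long deadlineMs, long timedOutAtMs) {
        return Duration.ofMillis(timedOutAtMs - deadlineMs);
    }

    public static void main(String[] args) {
        long deadlineMs = 1601552980162L;   // from "deadlineMs=1601552980162"
        long timedOutAtMs = 1601552980163L; // from "timed out at 1601552980163"

        System.out.println(toInstant(deadlineMs)); // prints 2020-10-01T11:49:40.162Z
        System.out.println(overrun(deadlineMs, timedOutAtMs).toMillis() + " ms past deadline"); // prints 1 ms past deadline
    }
}
```

If the log timestamps are in UTC, the decoded deadline falls just before the 11:49:41,946 ERROR line. The call was failed 1 ms past its deadline on its first attempt, which fits the innermost cause, `Timed out waiting to send the call`: the request appears never to have been sent to a broker at all.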
This is a consistent issue. I can't even get the metrics. The Prometheus endpoint only serves JVM metrics.
What I meant was: does it ever produce Kafka Lag Exporter metrics, or is it never able to connect to the Kafka cluster? If it always times out, then I would suggest troubleshooting that connectivity issue.
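One rough first step when troubleshooting connectivity is to confirm that a plain TCP connection to a broker can be opened from the machine running the exporter. The sketch below is only that; `BrokerReachability` is a hypothetical helper, and the default host and port are placeholders rather than values from this issue. A successful TCP connect does not prove that TLS, SASL, or ACL settings are correct.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks TCP reachability only, not the Kafka protocol or security settings.
final class BrokerReachability {

    // Returns true if a TCP connection to host:port opens within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, connect timeout, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder address; pass one of your own bootstrap brokers as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

If this succeeds for every broker but the AdminClient still times out, the cause is more likely in the client's security or listener configuration than in the network path.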
It never produces Lag Exporter metrics. There is no connection issue; I'm running different exporters from the same machine and all of them work fine.
Anyone?
Can you include more logs? Do they ever indicate that the consumer or admin client even successfully connected to your cluster?
The log file is included in the main post. I raised the log level to debug.
It looks like they never successfully connected to the cluster. The cluster works with non ACL.
Unfortunately, I don't have much experience configuring ACLs with Kafka clients, but others have used Kafka Lag Exporter successfully in secured environments.
Based on some cursory Google searches for `Call(callName=findCoordinator, deadlineMs=[timeout]) timed out`, it seems the problem is generally due to client configuration errors, or possibly a misconfiguration in the brokers' advertised listeners. The latter is probably not the case for you, since you say other clients can connect to the cluster fine. I would carefully compare the config of those other clients with what you're providing to Kafka Lag Exporter to see where the difference might be.
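The comparison suggested here can be done by eye, but a small diff makes a missing key harder to overlook. The following is a minimal Java sketch: `ClientConfigDiff` is a hypothetical helper, and the property values are made-up placeholders, not taken from this issue. The keys shown are standard Kafka client settings.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: lists every client property whose value differs between two configs.
final class ClientConfigDiff {

    // Returns key -> "working value | exporter value" for each differing key, sorted by key.
    static Map<String, String> diff(Map<String, String> working, Map<String, String> exporter) {
        Set<String> keys = new TreeSet<>(working.keySet());
        keys.addAll(exporter.keySet());

        Map<String, String> out = new TreeMap<>();
        for (String key : keys) {
            String a = working.get(key);
            String b = exporter.get(key);
            if (!Objects.equals(a, b)) {
                out.put(key, (a == null ? "<unset>" : a) + " | " + (b == null ? "<unset>" : b));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Map<String, String> workingClient = Map.of(
                "bootstrap.servers", "broker-1.example:9094",
                "security.protocol", "SSL");
        Map<String, String> lagExporter = Map.of(
                "bootstrap.servers", "broker-1.example:9094");

        diff(workingClient, lagExporter)
                .forEach((key, values) -> System.out.println(key + ": " + values));
        // prints security.protocol: SSL | <unset>
    }
}
```

Keys that appear on only one side, such as a security setting present in a working client but absent from the exporter's properties, are usually the first thing to check.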
Kafka version: 2.4.1.1 (AWS MSK)
kafka-lag-exporter version: 0.6.4
Debug enabled
Attached: logs.txt
application.conf