yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters
Apache License 2.0

Yikes! Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms] #148

Open henry20100102 opened 8 years ago

henry20100102 commented 8 years ago

The following error appears in the log file when I open the topic link or other links. Could you help me? Thanks.


[error] k.m.ApiError - error : Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[com.typesafe.akka.akka-actor_2.11-2.3.10.jar:na]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[com.typesafe.akka.akka-actor_2.11-2.3.10.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[org.scala-lang.scala-library-2.11.7.jar:na]
[error] k.m.ApiError - error : Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[com.typesafe.akka.akka-actor_2.11-2.3.10.jar:na]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[com.typesafe.akka.akka-actor_2.11-2.3.10.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[org.scala-lang.scala-library-2.11.7.jar:na]

josephfrancis commented 8 years ago

I wonder whether your issue is similar to mine: https://github.com/yahoo/kafka-manager/issues/143

I fixed it by enabling Active Offset Cache, which is a radio-button option shown when modifying a cluster.

henry20100102 commented 8 years ago

Thanks. Enabling Active Offset Cache fixed it.

Jonathan-Wei commented 8 years ago

Hello, how do I enable Active Offset Cache?

wyzssw commented 8 years ago

@josephfrancis, how do I enable Active Offset Cache? Please help me.

josephfrancis commented 8 years ago

[Screenshot attached: screen shot 2015-11-09 at 15:22:44]

LeePorte commented 8 years ago

I'm seeing the same issue even with Active offset cache enabled. I have the same config as @josephfrancis.

Any thoughts?

Jonathan-Wei commented 8 years ago

@josephfrancis, can Kafka Manager be used with an existing Kafka cluster?

LeePorte commented 8 years ago

@Jonathan-Wei As I see it, yes. It can only be used with an existing cluster, as it lacks the options to create a new one.

Unless I misunderstood the question.

Jonathan-Wei commented 8 years ago

Thank you! @LeePorte I can use it now!

wyzssw commented 8 years ago

Thanks @josephfrancis

irasit commented 8 years ago

Not working for me. I still see the same error when I click the topic list or a consumer, even with Active Offset Cache enabled.

[error] k.m.ApiError$ - error : Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]

wookasz commented 8 years ago

Seeing this as well.

montana-ua commented 8 years ago

I have the same error: Yikes! Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]

Kafka Manager - 1.3.0.4
Apache Kafka 0.8.2.1, 0.8.2.2
Zookeeper 3.4.6, 3.4.8
OS - RHEL 6.7
Startup string - ./kafka-manager -Dconfig.file="/opt/kafka-manager/conf/application.conf" -Dhttp.port=9001 -Dapplication.home="/opt/kafka-manager/" -java-home /opt/java/jdk8

My issue is very strange. I have 3 hosts (MBP OS X 10.11.3, VM1 RHEL 6.7, VM2 RHEL 6.7), and all of them were deployed from the same kafka-manager-1.3.0.4.zip file. All instances of Kafka Manager are connected to the same ZooKeeper instance. Both RHEL 6.7 VMs were deployed from the same VM template and differ only in their IPv4 address. I'm facing the issue only on VM1 RHEL 6.7; the two others (OS X and VM2 RHEL 6.7) work without any errors.

Please help me to define steps to debug the issue.

Exception:
[error] k.m.ApiError$ - error : Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager/kdc_cluster_ss1/kafka-state)]] after [2000 ms]
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager/kdc_cluster_ss1/kafka-state)]] after [2000 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]

montana-ua commented 8 years ago

I cloned VM2 RHEL 6.7 and reconfigured the clone the same way as VM1 RHEL 6.7. After that, Kafka Manager works without any exceptions.

vladimir4862 commented 8 years ago

The same problem persists for me, and none of the above helps to solve it. The error occurs when I try to view the consumers. I launched it on Vagrant with the apache-kafka quickstart: vagrant version=1.7.2, vagrant box=bento/centos-7.1, java=1.0.8_71.

Can anyone tell me what I'm doing wrong?

dhoppe commented 8 years ago

I have an identical setup for preproduction and production. In preproduction everything works as expected, but when I use the link for Broker List or Topic List in production, the following error message shows up.

2016-03-04 08:24:54,917 - [ERROR] - from kafka.manager.ApiError$ in pool-1-thread-1
error : Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[org.scala-lang.scala-library-2.11.7.jar:na]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72-internal]

The only difference is that the link for Broker List shows the brokers without the metrics for bytes, messages, etc., while the link for Topic List shows an error message.

Yikes! Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]

Is it possible that these timeouts are related to the size of the Kafka topics? This could be an explanation why this is working for preproduction, but not for production.

I am using the following software versions:
Kafka: 0.8.2.2
Kafka-Manager: 1.3.0.4
ZooKeeper: 3.4.5

dhoppe commented 8 years ago

This is weird. Forget what I said. I just moved Kafka-Manager to another EC2 instance and now everything is working as expected.

sdhzlzhk commented 8 years ago

+1

lauteb commented 8 years ago

Could it be that kafka-manager.zkhosts= is not set correctly in conf/application.conf?

montana-ua commented 8 years ago

It was correct in my case.

jianchen2580 commented 8 years ago

+1, v 1.3.0.4

ayiis commented 8 years ago

I got a similar problem.

It seems that once an "ask timeout" occurs for a cluster, that cluster always returns "ask timeout" (other clusters remain available) unless I restart KM.

Simple way to reproduce an "ask timeout":

  1. open KMhost/clusters/{cluster}/consumers/{group_id}/topic/{topic_name}/type/KF
  2. press F5 (refresh) and hold F5 for a few seconds (refresh 100+ times)

Then an "ask timeout" appears, and the cluster never becomes available again unless I restart KM.

Am I missing something?
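The reproduction steps above can be approximated with a short script instead of holding F5. This is only a sketch: the path template is the one quoted in step 1, the base URL and the cluster, group, and topic names are placeholders you would supply, and a sequential loop does not abort in-flight requests the way a held F5 does, so it may not trigger the problem in exactly the same way. It counts responses whose body contains the "Ask timed out" text from the error page.

```python
import http.client
import urllib.error
import urllib.request

# Path template quoted from step 1 above; the base URL is whatever serves KM.
PATH_TEMPLATE = "/clusters/{cluster}/consumers/{group_id}/topic/{topic_name}/type/KF"


def build_url(base, cluster, group_id, topic_name):
    """Fill in the consumer/topic page URL for one cluster, group and topic."""
    path = PATH_TEMPLATE.format(cluster=cluster, group_id=group_id, topic_name=topic_name)
    return base.rstrip("/") + path


def classify(body):
    """Label a response body by whether it carries the ask-timeout message."""
    return "ask_timeout" if b"Ask timed out" in body else "ok"


def hammer(url, times=100, timeout=10.0):
    """Request `url` `times` times in a row and count the outcomes."""
    counts = {"ok": 0, "ask_timeout": 0, "failed": 0}
    for _ in range(times):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read()
        except urllib.error.HTTPError as err:
            body = err.read()  # error pages still have a body worth inspecting
        except (OSError, http.client.HTTPException):
            counts["failed"] += 1  # connection refused, socket timeout, etc.
            continue
        counts[classify(body)] += 1
    return counts
```

For example, `hammer(build_url("http://localhost:9000", "my-cluster", "my-group", "my-topic"))` would send 100 requests to a hypothetical local instance and return the three counters.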

csghuser commented 8 years ago

+1, similar issue here. v1.3.0.8-1

It works fine for a while, but eventually it seems to hit this problem and never recover. I have to restart the service for it to start working again, and then it works fine right away.

[error] k.m.ApiError$ - error : Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://kafka-manager-system/), Path(/user/kafka-manager)]] after [5000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[org.scala-lang.scala-library-2.11.7.jar:na]
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[org.scala-lang.scala-library-2.11.7.jar:na]
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[org.scala-lang.scala-library-2.11.7.jar:na]
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
        at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) ~[com.typesafe.akka.akka-actor_2.11-2.3.14.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

zhjt80 commented 8 years ago

+1, reproduced in 1.3.0.8. It seems to hit this problem and never recover. I have to restart the service for it to start working again, and then it works fine right away.

DavidLiuXh commented 8 years ago

My suggestion is to check the network conditions between Kafka Manager and the ZK and Kafka clusters.

falmp commented 8 years ago

Yeah, same here. I have Active Offset Cache enabled and sometimes I get this timeout issue, which is only restored after a restart of the service.

shrikantpatel commented 8 years ago

I used to get the same error because I did not have the ZooKeeper host configured correctly. There are two ways to set it. Either on the command line:

kafka-manager -Dkafka-manager.zkhosts="localhost:2181" -Dhttp.port=9999

or in the application.conf property file:

kafka-manager.zkhosts="kafka-manager-zookeeper:2181" # this is the default value; change it to point to your ZK instance.
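Before starting Kafka Manager, it can help to confirm that every host in the zkhosts value actually accepts TCP connections from the machine Kafka Manager runs on. The following is a minimal sketch, not part of Kafka Manager; the function names are made up for illustration, and it assumes a plain comma-separated host:port list without a chroot suffix or IPv6 literals.

```python
import socket


def parse_zkhosts(zkhosts, default_port=2181):
    """Split a "host1:2181,host2:2181" string into (host, port) pairs."""
    pairs = []
    for entry in zkhosts.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            pairs.append((host, int(port)))
        else:
            # No explicit port given: fall back to the usual ZooKeeper port.
            pairs.append((entry, default_port))
    return pairs


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_zkhosts(zkhosts):
    """Map each "host:port" in `zkhosts` to whether it is reachable."""
    return {f"{h}:{p}": is_reachable(h, p) for h, p in parse_zkhosts(zkhosts)}
```

Calling `check_zkhosts("localhost:2181")` returns a dictionary such as `{"localhost:2181": True}` when the port is open. A successful TCP connection only shows that something is listening; it does not prove the ZooKeeper ensemble is healthy.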

csghuser commented 8 years ago

Still getting this. It works fine for days, maybe even a week, but then you hit it and it never recovers.

AIchovy commented 7 years ago

I got the same error. I increased the offsetCacheThreadPoolSize parameter from 2 to 5, and that fixed it.

Laxman-SM commented 7 years ago

@mosesyou You are correct: after increasing the offsetCacheThreadPoolSize parameter from 2 to 5, this issue is fixed. +1
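One way to see why a larger pool could help is a toy queueing model. This is an illustration only, not Kafka Manager's code: it assumes every request needs the same amount of blocking work, that all requests arrive at once, and that the pool runs them in arrival order. The 5000 ms figure comes from the error message; the other numbers are invented.

```python
def completed_within_timeout(requests, workers, task_seconds, timeout_seconds):
    """Count simultaneous requests that finish before the timeout.

    Toy model: each request blocks a worker for `task_seconds`, and at most
    `workers` requests run at the same time, in arrival order.
    """
    done = 0
    for i in range(requests):
        batch = i // workers                   # full batches that run before this request
        finish = (batch + 1) * task_seconds    # time at which this request completes
        if finish <= timeout_seconds:
            done += 1
    return done


# Ten requests of 1.5 s each against a 5 s ask timeout.
print(completed_within_timeout(10, workers=2, task_seconds=1.5, timeout_seconds=5.0))  # 6
print(completed_within_timeout(10, workers=5, task_seconds=1.5, timeout_seconds=5.0))  # 10
```

In this model a pool of 2 lets four of the ten requests run past the timeout, while a pool of 5 finishes all of them in time. A larger pool only delays the problem if the blocking calls themselves never return.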

dovka commented 7 years ago

I get the 5000 ms timeout pretty frequently, and it never recovers unless the Kafka Manager process is restarted.

In the ZooKeeper logs I see an error whose timestamp corresponds to the first timeout:

2017-01-16 17:35:51,923 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@649] - Got user-level KeeperException when processing sessionid:0x59a827ff1d0000 type:create cxid:0x18 zxid:0x10000008f txntype:-1 reqpath:n/a Error Path:/kafka-manager/mutex/locks Error:KeeperErrorCode = NoNode for /kafka-manager/mutex/locks

I understand that the path /kafka-manager/mutex/locks doesn't exist in ZooKeeper. Is that normal? Does Kafka Manager handle that, or is it a symptom of the issue?

Update: I upgraded to the latest version, 1.3.1.8 (was 1.3.0.4).

This DID NOT solve the timeout issue.

DavidLiuXh commented 7 years ago

KafkaStateActor.scala uses a Future[PartitionOffsetsCapture] to periodically send an OffsetRequest and fetch the latest offsets from the brokers, but that request can time out, which leads to this bug. To fix it, set a timeout parameter on the 'val partitionOffsets: Option[PartitionOffsetsCapture] = Await.ready()' call in getTopicPartitionIdentity in ActorModel.scala. Note that this Await.ready is used three times in ActorModel.scala.
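The general idea of bounding the wait on a future can be sketched in Python as follows. This is an analogue only, not the Scala code from ActorModel.scala; `fetch_offsets` and `offsets_or_none` are hypothetical stand-ins, and the returned data is made up.

```python
import concurrent.futures
import time


def fetch_offsets(delay_seconds):
    """Hypothetical stand-in for a broker offset request that may be slow."""
    time.sleep(delay_seconds)
    return {"partition-0": 42}


def offsets_or_none(executor, delay_seconds, timeout_seconds):
    """Wait for the offsets for at most `timeout_seconds`; return None if they are late."""
    future = executor.submit(fetch_offsets, delay_seconds)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Give up on this result so the caller can answer without offsets.
        return None
```

With `concurrent.futures.ThreadPoolExecutor(max_workers=2)` as the executor, a fast request returns the offsets and a slow one returns `None` instead of stalling the caller. The timeout only bounds the caller's wait: the worker thread keeps running until the slow call itself returns.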

Laxman-SM commented 7 years ago

I am using both 1.3.2.1 and kafka-manager-1.3.3.1, compiled from the latest master branch. Kafka 0.10 is now supported in kafka-manager-1.3.3.1.

meinac commented 7 years ago

I ran into the same issue, and none of the workarounds above worked for me. The strange thing is that the same compiled code works on OS X but not on Ubuntu. I've double/triple/quadruple... checked the configuration ¯\_(ツ)_/¯. What could be the reason for this problem?

DavidLiuXh commented 7 years ago

@meinac Please show your log.

meinac commented 7 years ago

I disabled the Poll Consumer Information and Enable Active OffsetCache options, and it worked.

patelh commented 7 years ago

Please try 1.3.3.4; I fixed the blocking call for producer offset polling that was causing the timeout.

SiddheshDhoke commented 7 years ago

Restarting kafka-manager resolved my issue. The instructions below, from the link at the end, will help.

Starting the service

$ bin/kafka-manager

By default, it will choose port 9000. This is overridable, as is the location of the configuration file. For example:

$ bin/kafka-manager -Dconfig.file=/path/to/application.conf -Dhttp.port=8080

Again, if java is not in your path, or you need to run against a different version of java, add the -java-home option as follows:

$ bin/kafka-manager -java-home /usr/local/oracle-java-8

Reference URL: https://github.com/yahoo/kafka-manager

haitaoyao commented 7 years ago

+1. And a restart does not always work. I still have no clue how to configure it to avoid this error.

chiwah-keen commented 7 years ago

I got the same error. I increased the offsetCacheThreadPoolSize parameter from 8 to 16, and that fixed it.

salah-cher commented 6 years ago

I had the same issue. I restarted many times and tried all the configuration changes above. Then I just changed the default port from 9000 by running with this option:

$ bin/kafka-manager -Dconfig.file=/path/to/application.conf -Dhttp.port=8080

It works now.

marcosArruda commented 6 years ago

All of those solutions imply restarting Kafka Manager or its thread pools. That is not a solution. We run the latest version here with Kafka 0.11.0.0, and the problem still occurs. Maybe increasing the timeout from 5000 ms to 10000 ms would solve this. The size of the Kafka cluster and the number of topics matter for whether this problem occurs.

I don't know Scala very well. If I did, I would already have started fixing this.

maradwan commented 6 years ago

Also make sure the JMX port is reachable from kafka-manager; otherwise it will not work.

DavidLiuXh commented 6 years ago

Please try the pull request https://github.com/yahoo/kafka-manager/pull/456

bijugs commented 6 years ago

We have built kafka-manager with #456 and are still seeing the timeout when trying to connect to the ZK quorum of an existing Kafka cluster with a large number of topics/partitions. Any other pointers to resolve this issue?

DavidLiuXh commented 6 years ago

@bijugs Getting topic offsets is not related to ZK. Can you provide more details?

bijugs commented 6 years ago

@DavidLiuXh .. here are the details of the issue we are facing. Let me know if you see any issues with the set-up or suggestions to fix.

DavidLiuXh commented 6 years ago

@bijugs Try using the IP address instead of the ZK hostname, and monitor your ZK.

jinleileiking commented 6 years ago

Restarting kafka-manager solved the problem.

myusuf3 commented 6 years ago

I am still at a loss as to what the issue is. I have never seen any consumer data here, ever. I have tried all combinations of settings.