strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0
4.86k stars 1.3k forks source link

TO has a blocked thread [vert.x-eventloop-thread-0,5,main] after deploy #1030

Closed serrss closed 5 years ago

serrss commented 6 years ago

TO has WARNING about a blocked thread in log for 30 seconds after deploying.

[2018-10-25 12:51:25,338] INFO  <ZooKeeper   :442> [oop-thread-0] Initiating client connection, connectString=localhost:2181 sessionTimeout=20000 watcher=io.strimzi.operator.topic.zk.ZkImpl$$Lambda$12/266121811@453b4655
--
  | [2018-10-25 12:51:25,374] INFO  <ClientCnxn  :1029> [alhost:2181)] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
  | [2018-10-25 12:51:25,375] INFO  <ClientCnxn  :1166> [alhost:2181)] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
  | 2018-10-25T12:51:25.491+0000: [Full GC (Metadata GC Threshold) 2018-10-25T12:51:25.491+0000: [Tenured: 0K->12533K(339968K), 0.0430188 secs] 73608K->12533K(492992K), [Metaspace: 20741K->20741K(1069056K)], 0.0430849 secs] [Times: user=0.04 sys=0.00, real=0.05 secs]
  | [2018-10-25 12:51:26,477] INFO  <ClientCnxn  :1029> [alhost:2181)] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
  | [2018-10-25 12:51:26,478] INFO  <ClientCnxn  :1166> [alhost:2181)] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
  | [2018-10-25 12:51:27,578] INFO  <ClientCnxn  :1029> [alhost:2181)] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
  | [2018-10-25 12:51:27,579] INFO  <ClientCnxn  :1166> [alhost:2181)] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
  | Oct 25, 2018 12:51:27 PM io.vertx.core.impl.BlockedThreadChecker
  | WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2882 ms, time limit is 2000
  | [2018-10-25 12:51:28,679] INFO  <ClientCnxn  :1029> [alhost:2181)] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
  | [2018-10-25 12:51:28,680] INFO  <ClientCnxn  :1166> [alhost:2181)] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
  | Oct 25, 2018 12:51:28 PM io.vertx.core.impl.BlockedThreadChecker
  | WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3882 ms, time limit is 2000
  | [2018-10-25 12:51:29,782] INFO  <ClientCnxn  :1029> [alhost:2181)] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
  | [2018-10-25 12:51:29,782] INFO  <ClientCnxn  :1166> [alhost:2181)] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
  | Oct 25, 2018 12:51:29 PM io.vertx.core.impl.BlockedThreadChecker
  | WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4883 ms, time limit is 2000
  | [2018-10-25 12:51:30,883] INFO  <ClientCnxn  :1029> [alhost:2181)] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
  | [2018-10-25 12:51:30,883] INFO  <ClientCnxn  :1166> [alhost:2181)] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
  | Oct 25, 2018 12:51:30 PM io.vertx.core.impl.BlockedThreadChecker
  | WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5889 ms, time limit is 2000
  | io.vertx.core.VertxException: Thread blocked
  | at sun.misc.Unsafe.park(Native Method)
  | at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  | at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
  | at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
  | at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
  | at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
  | at io.strimzi.operator.topic.zk.ZkImpl.<init>(ZkImpl.java:101)
  | at io.strimzi.operator.topic.zk.Zk.create(Zk.java:22)
  | at io.strimzi.operator.topic.Session.start(Session.java:139)
  | at io.vertx.core.AbstractVerticle.start(AbstractVerticle.java:106)
  | at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$8(DeploymentManager.java:491)
  | at io.vertx.core.impl.DeploymentManager$$Lambda$9/391630194.handle(Unknown Source)
  | at io.vertx.core.impl.ContextImpl.lambda$wrapTask$2(ContextImpl.java:339)
  | at io.vertx.core.impl.ContextImpl$$Lambda$10/989447607.run(Unknown Source)
  | at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
  | at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
  | at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
  | at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
  | at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  | at java.lang.Thread.run(Thread.java:748)
  |  
  | Oct 25, 2018 12:51:31 PM io.vertx.core.impl.BlockedThreadChecker
chris922 commented 6 years ago

My strimzi-topic-operator pod is unavailable and the only error I am able to see in my logs is the same Thread blocked error that is already mentioned in this issue. I am using strimzi 0.8.2

If you need more information please let me know what you need.

scholzj commented 6 years ago

@chris922 Could you please share your log as well?

chris922 commented 6 years ago

Yes for sure, the stacktrace itself and even the threadname is the same.

Unfortunately my TO pod will not start and this exception is the only one that I am able to see in the logs. It will be logged every second.

+ JAR=/topic-operator-0.8.2.jar
+ shift
+ . /bin/dynamic_resources.sh
++ get_heap_size
+++ cat /sys/fs/cgroup/memory/memory.limit_in_bytes
++ CONTAINER_MEMORY_IN_BYTES=100663296
++ DEFAULT_MEMORY_CEILING=1152921504606846975
++ '[' 100663296 -lt 1152921504606846975 ']'
++ '[' -z ']'
++ CONTAINER_HEAP_PERCENT=0.50
++ CONTAINER_MEMORY_IN_MB=96
+++ echo '96 0.50'
+++ awk '{ printf "%d", $1 * $2 }'
++ CONTAINER_HEAP_MAX=48
++ echo 48
+ MAX_HEAP=48
+ '[' -n 48 ']'
+ JAVA_OPTS='-Xms48m -Xmx48m '
+ export MALLOC_ARENA_MAX=2
+ MALLOC_ARENA_MAX=2
+ JAVA_OPTS='-Xms48m -Xmx48m -Dvertx.cacheDirBase=/tmp -Djava.security.egd=file:/dev/./urandom'
+ JAVA_OPTS='-Xms48m -Xmx48m -Dvertx.cacheDirBase=/tmp -Djava.security.egd=file:/dev/./urandom -XX:NativeMemoryTracking=summary -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps'
+ exec java -Xms48m -Xmx48m -Dvertx.cacheDirBase=/tmp -Djava.security.egd=file:/dev/./urandom -XX:NativeMemoryTracking=summary -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -jar /topic-operator-0.8.2.jar -Xms48m -Xmx48m -Dvertx.cacheDirBase=/tmp -Djava.security.egd=file:/dev/./urandom -XX:NativeMemoryTracking=summary -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
2018-11-06T23:04:19.033+0000: [GC (Allocation Failure) 2018-11-06T23:04:19.033+0000: [DefNew: 13184K->1599K(14784K), 0.1014515 secs] 13184K->3620K(47552K), 0.1014956 secs] [Times: user=0.01 sys=0.01, real=0.10 secs]
2018-11-06T23:04:24.436+0000: [GC (Allocation Failure) 2018-11-06T23:04:24.436+0000: [DefNew: 14783K->1600K(14784K), 0.1954543 secs] 16804K->6177K(47552K), 0.1954991 secs] [Times: user=0.01 sys=0.00, real=0.19 secs]
[2018-11-06 23:04:28,035] INFO <Session :58> [main ] Using config:
STRIMZI_TRUSTSTORE_LOCATION:
STRIMZI_RESOURCE_LABELS: strimzi.io/kind=topic
STRIMZI_KAFKA_BOOTSTRAP_SERVERS: homeautomation-kafka-bootstrap:9092
STRIMZI_NAMESPACE: kafka-homeautomation
STRIMZI_ZOOKEEPER_SESSION_TIMEOUT_MS: 20000
STRIMZI_FULL_RECONCILIATION_INTERVAL_MS: 900000
STRIMZI_ZOOKEEPER_CONNECT: homeautomation-zookeeper-client:2181
STRIMZI_TLS_ENABLED: false
STRIMZI_KEYSTORE_PASSWORD:
STRIMZI_TOPIC_METADATA_MAX_ATTEMPTS: 6
STRIMZI_REASSIGN_VERIFY_INTERVAL_MS: 120000
STRIMZI_KEYSTORE_LOCATION:
STRIMZI_TRUSTSTORE_PASSWORD:
STRIMZI_REASSIGN_THROTTLE: 9223372036854775807
[2018-11-06 23:04:28,431] INFO <Session :115> [oop-thread-0] Starting
[2018-11-06 23:04:28,632] INFO <ClientConfig:279> [oop-thread-0] AdminClientConfig values:
bootstrap.servers = [homeautomation-kafka-bootstrap:9092]
client.id =
connections.max.idle.ms = 300000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 120000
retries = 5
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
[2018-11-06 23:04:29,230] INFO <ppInfoParser:109> [oop-thread-0] Kafka version : 2.0.0
[2018-11-06 23:04:29,230] INFO <ppInfoParser:110> [oop-thread-0] Kafka commitId : 3402a8361b734732
2018-11-06T23:04:29.233+0000: [GC (Allocation Failure) 2018-11-06T23:04:29.233+0000: [DefNew: 14776K->1600K(14784K), 0.1049475 secs] 19353K->9763K(47552K), 0.1049911 secs] [Times: user=0.00 sys=0.00, real=0.11 secs]
[2018-11-06 23:04:29,632] INFO <ZooKeeper :100> [oop-thread-0] Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT
[2018-11-06 23:04:29,632] INFO <ZooKeeper :100> [oop-thread-0] Client environment:host.name=strimzi-topic-operator-5d9f9f485d-vlkbk
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.version=1.8.0_191
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.vendor=Oracle Corporation
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.class.path=/topic-operator-0.8.2.jar
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.io.tmpdir=/tmp
[2018-11-06 23:04:29,633] INFO <ZooKeeper :100> [oop-thread-0] Client environment:java.compiler=<NA>
[2018-11-06 23:04:29,634] INFO <ZooKeeper :100> [oop-thread-0] Client environment:os.name=Linux
[2018-11-06 23:04:29,634] INFO <ZooKeeper :100> [oop-thread-0] Client environment:os.arch=amd64
[2018-11-06 23:04:29,634] INFO <ZooKeeper :100> [oop-thread-0] Client environment:os.version=4.4.0-138-generic
[2018-11-06 23:04:29,634] INFO <ZooKeeper :100> [oop-thread-0] Client environment:user.name=root
[2018-11-06 23:04:29,634] INFO <ZooKeeper :100> [oop-thread-0] Client environment:user.home=/root
[2018-11-06 23:04:29,634] INFO <ZooKeeper :100> [oop-thread-0] Client environment:user.dir=/
[2018-11-06 23:04:29,636] INFO <ZooKeeper :442> [oop-thread-0] Initiating client connection, connectString=homeautomation-zookeeper-client:2181 sessionTimeout=20000 watcher=io.strimzi.operator.topic.zk.ZkImpl$$Lambda$12/1697506741@2afd7879
[2018-11-06 23:04:30,033] INFO <ClientCnxn :1029> [client:2181)] Opening socket connection to server homeautomation-zookeeper-client/10.43.157.110:2181. Will not attempt to authenticate using SASL (unknown error)
Nov 06, 2018 11:04:30 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2099 ms, time limit is 2000
Nov 06, 2018 11:04:31 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3099 ms, time limit is 2000
Nov 06, 2018 11:04:32 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4099 ms, time limit is 2000
Nov 06, 2018 11:04:33 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5100 ms, time limit is 2000
io.vertx.core.VertxException: Thread blocked
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at io.strimzi.operator.topic.zk.ZkImpl.<init>(ZkImpl.java:101)
at io.strimzi.operator.topic.zk.Zk.create(Zk.java:22)
at io.strimzi.operator.topic.Session.start(Session.java:139)
at io.vertx.core.AbstractVerticle.start(AbstractVerticle.java:106)
at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$8(DeploymentManager.java:491)
at io.vertx.core.impl.DeploymentManager$$Lambda$9/391630194.handle(Unknown Source)
at io.vertx.core.impl.ContextImpl.lambda$wrapTask$2(ContextImpl.java:339)
at io.vertx.core.impl.ContextImpl$$Lambda$10/989447607.run(Unknown Source)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
scholzj commented 6 years ago

Judging from the Zookeeper address, you installed the standalone version of the Topic Operator (i.e. kubectl apply -f install/topic-operator or something similar).

How did you deployed the Kafka cluster? The standalone version of the Topic Operator is supposed to be used if you have your own version of Kafka. When you use Strimzi to deploy Kafka as well, you should deploy the Topic Operator as part of the cluster. To do that, you have to use the following in the custom resource configuring Kafka:

apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
  zookeeper:
    # ...
  entityOperator:
    topicOperator: {}
    userOperator: {}

With that, the Topic Operator will be deployed together with Kafka and automatically setup things like TLS authentication between the Topic Operator and Zookeeper and Kafka etc. These things are probably why it doesn't work for you right now.

chris922 commented 6 years ago

I installed Kafka using the cluster operator from strimzi, so it should be exactly the way you described. The cluster operator deployed then the required pods, e. g. kafka and zookeeper. This worked out really well, also the KafkaConnect resource was deployed and is accessible. Everything in the same namespace.

My k8s cluster in general is installed on VMs using Rancher 2.1.1.

In my Kafka cluster configuration I also have this configuration, maybe it's relevant as you mentioned TLS:

listeners:
  plain: {}
  tls: {}

The configuration I used to deploy everything is nearly the same like provided in the strimzi download (in general I just changed names). I used the getting-started guide in the strimzi documentation.

I noticed that I can change the strimzi loglevel, I set it to TRACE. As it might be helpful here is an extract, starting with opening the connection and ending with the exception.

[2018-11-07 13:44:30,910] INFO <ClientCnxn :1029> [client:2181)] Opening socket connection to server homeautomation-zookeeper-client/10.43.157.110:2181. Will not attempt to authenticate using SASL (unknown error)
Nov 07, 2018 1:44:31 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2501 ms, time limit is 2000
[2018-11-07 13:44:31,708] DEBUG <etworkClient:824> [dminclient-1] [AdminClient clientId=adminclient-1] Completed connection to node -1. Fetching API versions.
[2018-11-07 13:44:31,708] DEBUG <etworkClient:838> [dminclient-1] [AdminClient clientId=adminclient-1] Initiating API versions fetch from node -1.
[2018-11-07 13:44:31,709] TRACE <etworkClient:442> [dminclient-1] [AdminClient clientId=adminclient-1] No version information found when sending API_VERSIONS with correlation id 0 to node -1. Assuming version 2.
[2018-11-07 13:44:31,710] TRACE <etworkClient:469> [dminclient-1] [AdminClient clientId=adminclient-1] Sending API_VERSIONS {} with correlation id 0 to node -1
[2018-11-07 13:44:31,807] TRACE <aAdminClient:1117> [dminclient-1] [AdminClient clientId=adminclient-1] KafkaClient#poll retrieved 0 response(s)
[2018-11-07 13:44:31,808] TRACE <aAdminClient:832> [dminclient-1] [AdminClient clientId=adminclient-1] Trying to choose nodes for [] at 1541598271808
[2018-11-07 13:44:31,809] TRACE <aAdminClient:892> [dminclient-1] [AdminClient clientId=adminclient-1] Client is not ready to send to homeautomation-kafka-bootstrap:9092 (id: -1 rack: null). Must delay 9223372036854775807 ms
[2018-11-07 13:44:31,809] TRACE <aAdminClient:1115> [dminclient-1] [AdminClient clientId=adminclient-1] Entering KafkaClient#poll(timeout=118000)
[2018-11-07 13:44:31,811] TRACE <aAdminClient:1117> [dminclient-1] [AdminClient clientId=adminclient-1] KafkaClient#poll retrieved 0 response(s)
[2018-11-07 13:44:31,811] TRACE <aAdminClient:832> [dminclient-1] [AdminClient clientId=adminclient-1] Trying to choose nodes for [] at 1541598271811
[2018-11-07 13:44:31,812] TRACE <aAdminClient:892> [dminclient-1] [AdminClient clientId=adminclient-1] Client is not ready to send to homeautomation-kafka-bootstrap:9092 (id: -1 rack: null). Must delay 9223372036854775807 ms
[2018-11-07 13:44:31,812] TRACE <aAdminClient:1115> [dminclient-1] [AdminClient clientId=adminclient-1] Entering KafkaClient#poll(timeout=117997)
[2018-11-07 13:44:31,906] TRACE <etworkClient:760> [dminclient-1] [AdminClient clientId=adminclient-1] Completed receive from node -1 for API_VERSIONS with correlation id 0, received {error_code=0,api_versions=[{api_key=0,min_version=0,max_version=6},{api_key=1,min_version=0,max_version=8},{api_key=2,min_version=0,max_version=3},{api_key=3,min_version=0,max_version=6},{api_key=4,min_version=0,max_version=1},{api_key=5,min_version=0,max_version=0},{api_key=6,min_version=0,max_version=4},{api_key=7,min_version=0,max_version=1},{api_key=8,min_version=0,max_version=4},{api_key=9,min_version=0,max_version=4},{api_key=10,min_version=0,max_version=2},{api_key=11,min_version=0,max_version=3},{api_key=12,min_version=0,max_version=2},{api_key=13,min_version=0,max_version=2},{api_key=14,min_version=0,max_version=2},{api_key=15,min_version=0,max_version=2},{api_key=16,min_version=0,max_version=2},{api_key=17,min_version=0,max_version=1},{api_key=18,min_version=0,max_version=2},{api_key=19,min_version=0,max_version=3},{api_key=20,min_version=0,max_version=2},{api_key=21,min_version=0,max_version=1},{api_key=22,min_version=0,max_version=1},{api_key=23,min_version=0,max_version=1},{api_key=24,min_version=0,max_version=1},{api_key=25,min_version=0,max_version=1},{api_key=26,min_version=0,max_version=1},{api_key=27,min_version=0,max_version=0},{api_key=28,min_version=0,max_version=1},{api_key=29,min_version=0,max_version=1},{api_key=30,min_version=0,max_version=1},{api_key=31,min_version=0,max_version=1},{api_key=32,min_version=0,max_version=2},{api_key=33,min_version=0,max_version=1},{api_key=34,min_version=0,max_version=1},{api_key=35,min_version=0,max_version=1},{api_key=36,min_version=0,max_version=0},{api_key=37,min_version=0,max_version=1},{api_key=38,min_version=0,max_version=1},{api_key=39,min_version=0,max_version=1},{api_key=40,min_version=0,max_version=1},{api_key=41,min_version=0,max_version=1},{api_key=42,min_version=0,max_version=1}],throttle_time_ms=0}
[2018-11-07 13:44:31,909] DEBUG <etworkClient:792> [dminclient-1] [AdminClient clientId=adminclient-1] Recorded API versions for node -1: (Produce(0): 0 to 6 [usable: 6], Fetch(1): 0 to 8 [usable: 8], ListOffsets(2): 0 to 3 [usable: 3], Metadata(3): 0 to 6 [usable: 6], LeaderAndIsr(4): 0 to 1 [usable: 1], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 4 [usable: 4], ControlledShutdown(7): 0 to 1 [usable: 1], OffsetCommit(8): 0 to 4 [usable: 4], OffsetFetch(9): 0 to 4 [usable: 4], FindCoordinator(10): 0 to 2 [usable: 2], JoinGroup(11): 0 to 3 [usable: 3], Heartbeat(12): 0 to 2 [usable: 2], LeaveGroup(13): 0 to 2 [usable: 2], SyncGroup(14): 0 to 2 [usable: 2], DescribeGroups(15): 0 to 2 [usable: 2], ListGroups(16): 0 to 2 [usable: 2], SaslHandshake(17): 0 to 1 [usable: 1], ApiVersions(18): 0 to 2 [usable: 2], CreateTopics(19): 0 to 3 [usable: 3], DeleteTopics(20): 0 to 2 [usable: 2], DeleteRecords(21): 0 to 1 [usable: 1], InitProducerId(22): 0 to 1 [usable: 1], OffsetForLeaderEpoch(23): 0 to 1 [usable: 1], AddPartitionsToTxn(24): 0 to 1 [usable: 1], AddOffsetsToTxn(25): 0 to 1 [usable: 1], EndTxn(26): 0 to 1 [usable: 1], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 to 1 [usable: 1], DescribeAcls(29): 0 to 1 [usable: 1], CreateAcls(30): 0 to 1 [usable: 1], DeleteAcls(31): 0 to 1 [usable: 1], DescribeConfigs(32): 0 to 2 [usable: 2], AlterConfigs(33): 0 to 1 [usable: 1], AlterReplicaLogDirs(34): 0 to 1 [usable: 1], DescribeLogDirs(35): 0 to 1 [usable: 1], SaslAuthenticate(36): 0 [usable: 0], CreatePartitions(37): 0 to 1 [usable: 1], CreateDelegationToken(38): 0 to 1 [usable: 1], RenewDelegationToken(39): 0 to 1 [usable: 1], ExpireDelegationToken(40): 0 to 1 [usable: 1], DescribeDelegationToken(41): 0 to 1 [usable: 1], DeleteGroups(42): 0 to 1 [usable: 1])
[2018-11-07 13:44:32,006] TRACE <aAdminClient:1117> [dminclient-1] [AdminClient clientId=adminclient-1] KafkaClient#poll retrieved 0 response(s)
[2018-11-07 13:44:32,006] TRACE <aAdminClient:832> [dminclient-1] [AdminClient clientId=adminclient-1] Trying to choose nodes for [] at 1541598272006
[2018-11-07 13:44:32,006] TRACE <aAdminClient:906> [dminclient-1] [AdminClient clientId=adminclient-1] Sending (type=MetadataRequest, topics=) to homeautomation-kafka-bootstrap:9092 (id: -1 rack: null). correlationId=1
[2018-11-07 13:44:32,007] TRACE <etworkClient:469> [dminclient-1] [AdminClient clientId=adminclient-1] Sending METADATA {topics=[],allow_auto_topic_creation=true} with correlation id 1 to node -1
[2018-11-07 13:44:32,007] TRACE <aAdminClient:1115> [dminclient-1] [AdminClient clientId=adminclient-1] Entering KafkaClient#poll(timeout=117802)
[2018-11-07 13:44:32,007] TRACE <aAdminClient:1117> [dminclient-1] [AdminClient clientId=adminclient-1] KafkaClient#poll retrieved 0 response(s)
[2018-11-07 13:44:32,007] TRACE <aAdminClient:832> [dminclient-1] [AdminClient clientId=adminclient-1] Trying to choose nodes for [] at 1541598272007
[2018-11-07 13:44:32,008] TRACE <aAdminClient:1115> [dminclient-1] [AdminClient clientId=adminclient-1] Entering KafkaClient#poll(timeout=117801)
[2018-11-07 13:44:32,008] TRACE <etworkClient:760> [dminclient-1] [AdminClient clientId=adminclient-1] Completed receive from node -1 for METADATA with correlation id 1, received {throttle_time_ms=0,brokers=[{node_id=1,host=homeautomation-kafka-1.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local,port=9092,rack=},{node_id=0,host=homeautomation-kafka-0.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local,port=9092,rack=}],cluster_id=9GVEntk-RIqlLY7OZHyjCw,controller_id=1,topic_metadata=[]}
Nov 07, 2018 1:44:32 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3501 ms, time limit is 2000
[2018-11-07 13:44:32,009] TRACE <aAdminClient:1117> [dminclient-1] [AdminClient clientId=adminclient-1] KafkaClient#poll retrieved 1 response(s)
[2018-11-07 13:44:32,009] DEBUG <adataManager:250> [dminclient-1] [AdminClient clientId=adminclient-1] Updating cluster metadata to Cluster(id = 9GVEntk-RIqlLY7OZHyjCw, nodes = [homeautomation-kafka-1.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local:9092 (id: 1 rack: ), homeautomation-kafka-0.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local:9092 (id: 0 rack: )], partitions = [], controller = homeautomation-kafka-1.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local:9092 (id: 1 rack: ))
[2018-11-07 13:44:32,010] TRACE <aAdminClient:998> [dminclient-1] [AdminClient clientId=adminclient-1] Call(callName=fetchMetadata, deadlineMs=1541598389808) got response {throttle_time_ms=0,brokers=[{node_id=0,host=homeautomation-kafka-0.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local,port=9092,rack=},{node_id=1,host=homeautomation-kafka-1.homeautomation-kafka-brokers.kafka-homeautomation.svc.cluster.local,port=9092,rack=}],cluster_id=9GVEntk-RIqlLY7OZHyjCw,controller_id=1,topic_metadata=[]}
[2018-11-07 13:44:32,010] TRACE <aAdminClient:832> [dminclient-1] [AdminClient clientId=adminclient-1] Trying to choose nodes for [] at 1541598272009
[2018-11-07 13:44:32,010] TRACE <aAdminClient:1115> [dminclient-1] [AdminClient clientId=adminclient-1] Entering KafkaClient#poll(timeout=300000)
Nov 07, 2018 1:44:33 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4501 ms, time limit is 2000
Nov 07, 2018 1:44:34 PM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5501 ms, time limit is 2000
io.vertx.core.VertxException: Thread blocked
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
...
wmorgan6796 commented 6 years ago

I think this is related to issue 1050 as I'm seeing similar issues in my cluster regarding the topic operator