strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Kindly Advise - Kafka/ZooKeeper communication issue #3256

Closed: pashafirdous closed this issue 4 years ago

pashafirdous commented 4 years ago

Hello all,

Please advise.

We have observed the issue shown in the logs below, in which ZooKeeper is not running.

Please advise what the root cause is.

Zookeeper Logs:

2020-06-24 08:10:54,497 INFO Processing ruok command from /127.0.0.1:47298 (org.apache.zookeeper.server.NIOServerCnxn) [NIOWorkerThread-1]
2020-06-24 08:10:54,497 DEBUG Closed socket connection for client /127.0.0.1:47298 (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn) [NIOWorkerThread-1]
2020-06-24 08:10:55,204 DEBUG Accepted socket connection from /127.0.0.1:47308 (org.apache.zookeeper.server.NIOServerCnxnFactory) [NIOServerCxnFactory.AcceptThread:/127.0.0.1:21810]
2020-06-24 06:46:47,189 WARN Exception causing close of session 0x0: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn) [NIOWorkerThread-2]
2020-06-24 06:46:47,189 DEBUG IOException stack trace (org.apache.zookeeper.server.NIOServerCnxn) [NIOWorkerThread-2]
java.io.IOException: ZooKeeperServer not running
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:556)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:530)
    at org.apache.zookeeper.server.Worker^C
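The ZooKeeper log shows the "ruok" four-letter-word health check being processed while client sessions are rejected with "ZooKeeperServer not running". That combination fits ZooKeeper's documented behaviour: an "imok" reply only means the server process is alive and bound to its client port, not that it has joined a quorum and can serve sessions. A minimal Python sketch of such a probe follows. It is an illustration, not Strimzi's own health check; the host and port you pass are assumptions (the log above shows the accept thread bound to 127.0.0.1:21810 inside the pod), and on ZooKeeper 3.5+ it assumes "ruok" is allowed by 4lw.commands.whitelist (the log suggests it is here).

```python
import socket


def zk_ruok(host: str, port: int, timeout: float = 5.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter word; return True on an 'imok' reply.

    True only means the ZooKeeper process is alive and listening on this port.
    It does NOT prove the node has joined a quorum and can accept sessions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # ZooKeeper writes its reply and then closes the connection,
            # so keep reading until EOF (an empty recv()).
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks).strip() == b"imok"
    except OSError:
        # Refused, reset, or timed out (socket.timeout is an OSError): unhealthy.
        return False
```

For example, zk_ruok("127.0.0.1", 2181) (a hypothetical target) returning True while clients still see "ZooKeeperServer not running" or "Connection reset by peer" would point at quorum formation, for instance peers that cannot reach each other, rather than at the local ZooKeeper process.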

Kafka Logs:

2020-06-24 09:18:12,611 INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
2020-06-24 09:18:12,612 INFO Socket connection established, initiating session, client: /127.0.0.1:54222, server: localhost/127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
2020-06-24 09:18:12,617 WARN Session 0x100234664220001 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:75)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)

We connected to one of the Kafka pods and ran the ACL shell script, which talks to ZooKeeper to list the ACLs, but it fails with a timeout error:

./bin/kafka-acls.sh --authorizer-properties zookeeper.connect=127.0.0.1:2181 --list
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
[2020-06-24 11:28:47,826] WARN Client session timed out, have not heard from server in 6000ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
Error while executing ACL command: Timed out waiting for connection while in state: CONNECTING
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
    at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
    at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
    at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
    at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
    at kafka.security.authorizer.AclAuthorizer.configure(AclAuthorizer.scala:127)
    at kafka.security.auth.SimpleAclAuthorizer.configure(SimpleAclAuthorizer.scala:86)
    at kafka.admin.AclCommand$AuthorizerService.withAuthorizer(AclCommand.scala:208)
    at kafka.admin.AclCommand$AuthorizerService.listAcls(AclCommand.scala:245)
    at kafka.admin.AclCommand$.main(AclCommand.scala:82)
    at kafka.admin.AclCommand.main(AclCommand.scala)

Currently, you can reproduce the issue in the kafka-mirror namespace.

Command used:
kubectl logs -f mm-backup-cluster-zookeeper-0 -n kafka-mirror zookeeper

Services:
service/mm-backup-cluster-kafka-0
service/mm-backup-cluster-kafka-1
service/mm-backup-cluster-kafka-2
service/mm-backup-cluster-zookeeper-0
service/mm-backup-cluster-zookeeper-1
service/mm-backup-cluster-zookeeper-2

scholzj commented 4 years ago

That looks like your ZooKeeper is somehow broken, but I don't think there is any way to tell the reason from these logs. Do you have anything more in the ZooKeeper logs or in the TLS sidecar logs?

pashafirdous commented 4 years ago

Thanks for the assistance; this has been resolved. There was an issue with GKE firewall restrictions, which affected inter-pod communication.
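Since the root cause was firewall rules blocking inter-pod traffic, a plain TCP reachability sweep run from inside one pod against its peers can surface that kind of problem quickly. The Python sketch below is an illustration under stated assumptions, not part of Strimzi: the peer hostnames are hypothetical, and the ports are ZooKeeper's conventional defaults (2181 client, 2888 quorum, 3888 leader election), which may not match the ports a given Strimzi version actually exposes. It separates "refused" (host reachable, nothing listening) from "timeout" (no answer at all), because a firewall that silently drops packets typically produces the latter.

```python
import socket
from typing import Dict, Iterable, Tuple


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Actively refused: the host answered, but nothing accepted on the port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: typical of a firewall silently dropping packets.
        return "timeout"
    except OSError:
        # DNS failure, network unreachable, and similar errors.
        return "error"


def sweep(hosts: Iterable[str], ports: Iterable[int],
          timeout: float = 3.0) -> Dict[Tuple[str, int], str]:
    """Probe every host/port combination and return the status of each."""
    port_list = list(ports)  # materialise once so it can be reused for every host
    return {(h, p): probe(h, p, timeout) for h in hosts for p in port_list}


# Hypothetical peers: replace with the pod DNS names used in your own cluster.
ZK_PEERS = [f"zookeeper-{i}.zookeeper-nodes.example.svc" for i in range(3)]
# Conventional ZooKeeper defaults (assumption): client, quorum, leader election.
ZK_PORTS = [2181, 2888, 3888]
```

For example, sweep(ZK_PEERS, ZK_PORTS) run from inside one ZooKeeper pod: a "timeout" on 3888 towards a peer would point at dropped inter-pod traffic, whereas "refused" on 2888 is expected for any peer that is not the current leader, since only the leader listens on the quorum port.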