pravega / zookeeper-operator

Kubernetes Operator for Zookeeper
Apache License 2.0

Couldn't bind to zookeeper-01-0.zookeeper-01-headless.default.svc.cluster.local:2888 #251

Closed: idealemail closed this issue 4 years ago

idealemail commented 4 years ago

2020-10-05 16:43:47,091 [myid:1] - ERROR [QuorumPeermyid=1(secure=disabled):Leader@318] - Couldn't bind to zookeeper-01-0.zookeeper-01-headless.default.svc.cluster.local:2888
java.net.SocketException: Unresolved address
    at java.base/java.net.ServerSocket.bind(Unknown Source)
    at java.base/java.net.ServerSocket.bind(Unknown Source)
    at org.apache.zookeeper.server.quorum.Leader.createServerSocket(Leader.java:315)
    at org.apache.zookeeper.server.quorum.Leader.lambda$new$0(Leader.java:294)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
    at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:297)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1260)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1467)
2020-10-05 16:43:47,091 [myid:1] - WARN [QuorumPeermyid=1(secure=disabled):QuorumPeer@1471] - Unexpected exception
java.io.IOException: Leader failed to initialize any of the following sockets: [zookeeper-01-0.zookeeper-01-headless.default.svc.cluster.local:2888]
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:300)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1260)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1467)
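For readers unfamiliar with this exception, here is a minimal sketch of how the JVM produces the same "Unresolved address" message. It is not ZooKeeper's code: the class name `UnresolvedBindDemo` and the helper `tryBind` are hypothetical, and `InetSocketAddress.createUnresolved` is used only to imitate a hostname that DNS could not resolve, without performing a real lookup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class UnresolvedBindDemo {

    /** Tries to bind a server socket to the address; returns "bound" or the exception message. */
    public static String tryBind(InetSocketAddress address) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(address);
            return "bound";
        } catch (IOException e) {
            // SocketException is a subclass of IOException.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumption for illustration: the hostname from the log, marked unresolved
        // on purpose so that no DNS query is sent.
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved(
                "zookeeper-01-0.zookeeper-01-headless.default.svc.cluster.local", 2888);

        System.out.println(unresolved.isUnresolved()); // prints "true"
        System.out.println(tryBind(unresolved));       // prints "Unresolved address"
    }
}
```

The bind fails before any port is touched, which suggests the problem in the log is name resolution for the pod's own hostname rather than port 2888 being in use.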

idealemail commented 4 years ago

dig zookeeper-01.default.svc.cluster.local @172.20.0.2

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> zookeeper-01.default.svc.cluster.local @172.20.0.2
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 45543
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;zookeeper-01.default.svc.cluster.local. IN A

;; AUTHORITY SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1601916025 7200 1800 86400 5

;; Query time: 0 msec
;; SERVER: 172.20.0.2#53(172.20.0.2)
;; WHEN: Tue Oct 06 00:49:29 CST 2020
;; MSG SIZE rcvd: 160

anishakj commented 4 years ago

Could you please elaborate on the scenario in which you are getting this error?

idealemail commented 4 years ago

os: Linux XXGL-T-TJSYZ-arch-mid-redis-test-037 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
docker: Docker version 19.03.5, build 633a0ea838
k8s version:
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 ~]# kubectl get deployment
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
zookeeper-operator   1/1     1            1           19h

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 ~]# kubectl get sc
NAME       PROVISIONER         AGE
ceph-rbd   kubernetes.io/rbd   23h

zookeeper-operator: the latest version as of 5 Oct 2020

I installed the operator following the Manual deployment method.

sample.yaml

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: "zookeeper-03"
spec:
  containers:

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 deploy]# kubectl apply -f sample.yaml
zookeepercluster.zookeeper.pravega.io/zookeeper-03 created

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 deploy]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-7bdfb4676b-cl989   1/1     Running   0          23h
zookeeper-03-0                         0/1     Error     0          15s
zookeeper-operator-86b8dbd87c-h8pm8    1/1     Running   0          19h

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 deploy]# kubectl describe pods zookeeper-03-0
Name:           zookeeper-03-0
Namespace:      default
Priority:       0
Node:           172.24.29.212/172.24.29.212
Start Time:     Tue, 06 Oct 2020 19:00:15 +0800
Labels:         app=zookeeper-03
                controller-revision-hash=zookeeper-03-bcc9c887d
                kind=ZookeeperMember
                statefulset.kubernetes.io/pod-name=zookeeper-03-0
Annotations:    <none>
Status:         Running
IP:             172.20.0.70
IPs:
  IP:           172.20.0.70
Controlled By:  StatefulSet/zookeeper-03
Containers:
  zookeeper:
    Container ID:  docker://7afeccdad1f581e4c8d2620d42068c58ea6d9618a7742d565f58d857811caa68
    Image:         pravega/zookeeper:0.2.8
    Image ID:      docker-pullable://pravega/zookeeper@sha256:7c082d18d48b38a20cf4c19e0031d3a2603a2595a9af5a4413aac7af0225b3d5
    Ports:         2181/TCP, 2888/TCP, 3888/TCP, 7000/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /usr/local/bin/zookeeperStart.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    14
      Started:      Tue, 06 Oct 2020 19:00:35 +0800
      Finished:     Tue, 06 Oct 2020 19:00:40 +0800
    Ready:          False
    Restart Count:  1
    Liveness:       exec [zookeeperLive.sh] delay=10s timeout=10s period=10s #success=1 #failure=3
    Readiness:      exec [zookeeperReady.sh] delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:
      ENVOY_SIDECAR_STATUS:   (v1:metadata.annotations['sidecar.istio.io/status'])
    Mounts:
      /conf from conf (rw)
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qbsdd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-zookeeper-03-0
    ReadOnly:   false
  conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zookeeper-03-configmap
    Optional:  false
  default-token-qbsdd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qbsdd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               33s                default-scheduler        Successfully assigned default/zookeeper-03-0 to 172.24.29.212
  Normal   SuccessfulAttachVolume  33s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-0228c251-7482-4180-b087-8ae4fb04af30"
  Normal   Pulling                 21s (x2 over 29s)  kubelet, 172.24.29.212   Pulling image "pravega/zookeeper:0.2.8"
  Normal   Pulled                  13s (x2 over 26s)  kubelet, 172.24.29.212   Successfully pulled image "pravega/zookeeper:0.2.8"
  Normal   Created                 13s (x2 over 26s)  kubelet, 172.24.29.212   Created container zookeeper
  Normal   Started                 13s (x2 over 26s)  kubelet, 172.24.29.212   Started container zookeeper
  Warning  BackOff                 7s (x2 over 8s)    kubelet, 172.24.29.212   Back-off restarting failed container

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 deploy]# kubectl get svc -A
NAMESPACE      NAME                                                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                               AGE
cert-manager   cert-manager                                                      ClusterIP   10.68.21.205    <none>        9402/TCP                              23h
cert-manager   cert-manager-webhook                                              ClusterIP   10.68.242.17    <none>        443/TCP                               23h
default        kubernetes                                                        ClusterIP   10.68.0.1       <none>        443/TCP                               23h
default        prometheus-operator                                               ClusterIP   None            <none>        8080/TCP                              23h
default        zookeeper-03-client                                               ClusterIP   10.68.30.3      <none>        2181/TCP                              102s
default        zookeeper-03-headless                                             ClusterIP   None            <none>        2181/TCP,2888/TCP,3888/TCP,7000/TCP   102s
kafka          kafka-cruisecontrol-svc                                           ClusterIP   10.68.57.0      <none>        8090/TCP,9020/TCP                     21h
kafka          kafka-headless                                                    ClusterIP   None            <none>        29092/TCP,29093/TCP,9020/TCP          21h
kafka          kafka-operator-certman-proxy-alertmanager                         ClusterIP   10.68.239.165   <none>        9001/TCP                              21h
kafka          kafka-operator-certman-proxy-controller-manager-metrics-service   ClusterIP   10.68.60.79     <none>        8443/TCP                              21h
kafka          kafka-operator-certman-proxy-webhook-service                      ClusterIP   10.68.183.3     <none>        443/TCP                               21h
kube-system    dashboard-metrics-scraper                                         ClusterIP   10.68.138.2     <none>        8000/TCP                              23h
kube-system    kube-dns                                                          ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP,9153/TCP                23h
kube-system    kubelet                                                           ClusterIP   None            <none>        10250/TCP,10255/TCP,4194/TCP          23h
kube-system    kubernetes-dashboard                                              NodePort    10.68.149.222   <none>        443:30483/TCP                         23h
kube-system    metrics-server                                                    ClusterIP   10.68.237.32    <none>        443/TCP                               23h
kube-system    tiller-deploy                                                     ClusterIP   10.68.144.151   <none>        44134/TCP                             20h

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 deploy]# dig zookeeper-03-client.default.svc.cluster.local @172.20.0.2

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> zookeeper-03-client.default.svc.cluster.local @172.20.0.2
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35225
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;zookeeper-03-client.default.svc.cluster.local. IN A

;; ANSWER SECTION:
zookeeper-03-client.default.svc.cluster.local. 5 IN A 10.68.30.3

;; Query time: 0 msec
;; SERVER: 172.20.0.2#53(172.20.0.2)
;; WHEN: Tue Oct 06 19:03:09 CST 2020
;; MSG SIZE rcvd: 135

[root@XXGL-T-TJSYZ-arch-mid-redis-test-037 deploy]# dig zookeeper-03-headless.default.svc.cluster.local @172.20.0.2

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> zookeeper-03-headless.default.svc.cluster.local @172.20.0.2
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 890
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;zookeeper-03-headless.default.svc.cluster.local. IN A

;; AUTHORITY SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1601982022 7200 1800 86400 5

;; Query time: 0 msec
;; SERVER: 172.20.0.2#53(172.20.0.2)
;; WHEN: Tue Oct 06 19:03:48 CST 2020
;; MSG SIZE rcvd: 169

pod log

** server can't find zookeeper-03-headless.default.svc.cluster.local: NXDOMAIN

idealemail commented 4 years ago

https://github.com/confluentinc/cp-helm-charts/issues/205

Maybe binding to 0.0.0.0 would resolve the error.

WangZ0635 commented 2 years ago

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: example
spec:
  replicas: 3
  storageType: ephemeral
  config:
    quorumListenOnAllIPs: true

You can configure quorumListenOnAllIPs in the config attribute. When set to true, the ZooKeeper server listens for connection requests from its peers on all available IP addresses, not just the addresses configured in the server list of the configuration file. It affects the connections that handle the ZAB protocol and the Fast Leader Election protocol. The default value is false.
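To illustrate why listening on all IPs sidesteps the unresolved-hostname failure, here is a minimal sketch of a wildcard bind in plain Java. It is an assumption-laden illustration, not ZooKeeper's implementation: the class `WildcardBindDemo` and the helper `bindWildcard` are hypothetical, and port 0 (any free port) stands in for the quorum port 2888 so the example can run anywhere.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class WildcardBindDemo {

    /**
     * Binds a server socket to the wildcard address (0.0.0.0 or ::) on the given port.
     * Returns "wildcard" if the socket ended up on the wildcard address,
     * "specific" otherwise, or the exception message if the bind failed.
     */
    public static String bindWildcard(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            // InetSocketAddress(int) uses the wildcard address, so no hostname lookup is involved.
            socket.bind(new InetSocketAddress(port));
            return socket.getInetAddress().isAnyLocalAddress() ? "wildcard" : "specific";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Port 0 lets the OS pick a free port; the log in this thread used 2888.
        System.out.println(bindWildcard(0)); // prints "wildcard"
    }
}
```

Because the wildcard address needs no DNS lookup, a bind like this does not depend on the pod's own hostname being resolvable at startup. Peers still need to resolve each other's names to connect.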