Closed by aparajita89.
@aparajita89 Could you please let us know the zookeeper-operator and cluster version used here?
pravega zookeeper operator: 0.2.9; zookeeper version: 3.6.2
Please use version 0.2.10 of both the operator and the cluster. We have fixed this in 0.2.10.
OK, will try that. Could you also share what the fix was? Just curious.
PR https://github.com/pravega/zookeeper-operator/pull/135 contains the fix.
@anishakj I tried upgrading to 0.2.10 and creating a cluster of 4 nodes. It is still failing with the same error as mentioned in the issue description.
@aparajita89 #135 fixed an issue in the zk docker image itself, not in the operator. You need to make sure you upgrade to a zk image that includes that fix.
@aparajita89 Please let us know whether the issue is solved for you.
I tried upgrading to pravega/zookeeper-operator: 0.2.10 and pravega/zookeeper: 0.2.10.
I'm still seeing the same error.
I tried to debug this as well. I think this is related to the docker/bin/zookeeperStart.sh script. IMO, REGISTER_NODE and WRITE_CONFIGURATION must always be true (consequently, the true/false checks on them can be removed entirely). Also, node registration should happen before the config file is written, so that the config file contains the latest information about the cluster. That way, a new pod coming up always gets the latest config from the existing ZooKeeper cluster. But perhaps I am missing something: should these checks be retained?
@aparajita89 could you please share the logs from zookeeper-1 from before the restart happened?
These are the last few log lines from the previous "CrashLoopBackOff" error:
2021-05-24 08:23:13,358 [myid:2] - INFO [main:AbstractConnector@380] - Stopped ServerConnector@70e9c95d{HTTP/1.1,[http/1.1]}{0.0.0.0:7000}
2021-05-24 08:23:13,359 [myid:2] - INFO [main:ContextHandler@1016] - Stopped o.e.j.s.ServletContextHandler@4b520ea8{/,null,UNAVAILABLE}
2021-05-24 08:23:13,361 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 2 not in the peer list
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1073)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2021-05-24 08:23:13,362 [myid:2] - INFO [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2021-05-24 08:23:13,364 [myid:2] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
I've recreated the CRD now to recreate the cluster. After that, these are the last few lines of the logs:
2021-05-24 08:49:23,200 [myid:2] - INFO [main:AbstractConnector@380] - Stopped ServerConnector@70e9c95d{HTTP/1.1,[http/1.1]}{0.0.0.0:7000}
2021-05-24 08:49:23,202 [myid:2] - INFO [main:ContextHandler@1016] - Stopped o.e.j.s.ServletContextHandler@4b520ea8{/,null,UNAVAILABLE}
2021-05-24 08:49:23,203 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 2 not in the peer list
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1073)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2021-05-24 08:49:23,205 [myid:2] - INFO [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2021-05-24 08:49:23,207 [myid:2] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
@aparajita89 these are not the complete logs. Also, can you tell us which environment you are using?
This is running on a privately managed k8s cluster.
This is the complete log:
$ kubectl logs zookeeperpoc-1
+ source /conf/env.sh
++ DOMAIN=zookeeperpoc-headless.zk.svc.cluster.local
++ QUORUM_PORT=2888
++ LEADER_PORT=3888
++ CLIENT_HOST=zookeeperpoc-client
++ CLIENT_PORT=2181
++ ADMIN_SERVER_HOST=zookeeperpoc-admin-server
++ ADMIN_SERVER_PORT=8080
++ CLUSTER_NAME=zookeeperpoc
++ CLUSTER_SIZE=4
+ source /usr/local/bin/zookeeperFunctions.sh
++ set -ex
++ hostname -s
+ HOST=zookeeperpoc-1
+ DATA_DIR=/data
+ MYID_FILE=/data/myid
+ LOG4J_CONF=/conf/log4j-quiet.properties
+ DYNCONFIG=/data/zoo.cfg.dynamic
+ STATIC_CONFIG=/data/conf/zoo.cfg
+ [[ zookeeperpoc-1 =~ (.*)-([0-9]+)$ ]]
+ NAME=zookeeperpoc
+ ORD=1
+ MYID=2
+ WRITE_CONFIGURATION=true
+ REGISTER_NODE=true
+ ONDISK_MYID_CONFIG=false
+ ONDISK_DYN_CONFIG=false
+ '[' -f /data/myid ']'
++ cat /data/myid
+ EXISTING_ID=2
+ [[ 2 == \2 ]]
+ [[ -f /data/conf/zoo.cfg ]]
+ ONDISK_MYID_CONFIG=true
+ '[' -f /data/zoo.cfg.dynamic ']'
+ ONDISK_DYN_CONFIG=true
+ set +e
+ [[ -n '' ]]
+ set -e
+ set +e
+ nslookup zookeeperpoc-headless.zk.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find zookeeperpoc-headless.zk.svc.cluster.local: NXDOMAIN
+ [[ 1 -eq 0 ]]
+ grep -q 'server can'\''t find zookeeperpoc-headless.zk.svc.cluster.local'
+ nslookup zookeeperpoc-headless.zk.svc.cluster.local
+ echo 'there is no active ensemble'
+ ACTIVE_ENSEMBLE=false
+ [[ true == true ]]
+ [[ true == true ]]
there is no active ensemble
Copying /conf contents to writable directory, to support Zookeeper dynamic reconfiguration
+ WRITE_CONFIGURATION=false
+ [[ false == false ]]
+ REGISTER_NODE=false
+ [[ false == true ]]
+ [[ false == true ]]
+ ZOOCFGDIR=/data/conf
+ export ZOOCFGDIR
+ echo Copying /conf contents to writable directory, to support Zookeeper dynamic reconfiguration
+ [[ ! -d /data/conf ]]
+ echo Copying the /conf/zoo.cfg contents except the dynamic config file during restart
Copying the /conf/zoo.cfg contents except the dynamic config file during restart
++ head -n -1 /conf/zoo.cfg
++ tail -n 1 /data/conf/zoo.cfg
+ echo -e '4lw.commands.whitelist=cons, envi, conf, crst, srvr, stat, mntr, ruok
dataDir=/data
standaloneEnabled=false
reconfigEnabled=true
skipACL=yes
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpPort=7000
metricsProvider.exportJvmInfo=true
initLimit=15
syncLimit=3
tickTime=1500
globalOutstandingLimit=1000
preAllocSize=65536
snapCount=10000
commitLogCount=500
snapSizeLimitInKb=4194304
maxCnxns=0
maxClientCnxns=60
minSessionTimeout=3000
maxSessionTimeout=30000
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
quorumListenOnAllIPs=false
admin.serverPort=8080\ndynamicConfigFile=/data/zoo.cfg.dynamic'
+ cp -f /conf/log4j.properties /data/conf
+ cp -f /conf/log4j-quiet.properties /data/conf
+ cp -f /conf/env.sh /data/conf
Starting zookeeper service
+ '[' -f /data/zoo.cfg.dynamic ']'
+ echo Starting zookeeper service
+ zkServer.sh --config /data/conf start-foreground
ZooKeeper JMX enabled by default
Using config: /data/conf/zoo.cfg
2021-05-24 08:49:22,690 [myid:] - INFO [main:QuorumPeerConfig@173] - Reading configuration from: /data/conf/zoo.cfg
2021-05-24 08:49:22,698 [myid:] - INFO [main:QuorumPeerConfig@450] - clientPort is not set
2021-05-24 08:49:22,698 [myid:] - INFO [main:QuorumPeerConfig@463] - secureClientPort is not set
2021-05-24 08:49:22,698 [myid:] - INFO [main:QuorumPeerConfig@479] - observerMasterPort is not set
2021-05-24 08:49:22,702 [myid:] - INFO [main:QuorumPeerConfig@496] - metricsProvider.className is org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
2021-05-24 08:49:22,718 [myid:] - WARN [main:QuorumPeerConfig@727] - No server failure will be tolerated. You need at least 3 servers.
2021-05-24 08:49:22,722 [myid:2] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2021-05-24 08:49:22,722 [myid:2] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
2021-05-24 08:49:22,726 [myid:2] - INFO [main:ManagedUtil@44] - Log4j 1.2 jmx support found and enabled.
2021-05-24 08:49:22,731 [myid:2] - INFO [main:QuorumPeerMain@151] - Starting quorum peer
2021-05-24 08:49:22,759 [myid:2] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@139] - Purge task started.
2021-05-24 08:49:22,772 [myid:2] - INFO [PurgeTask:FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2021-05-24 08:49:22,783 [myid:2] - INFO [main:PrometheusMetricsProvider@74] - Initializing metrics, configuration: {exportJvmInfo=true, httpPort=7000}
2021-05-24 08:49:22,783 [myid:2] - INFO [main:PrometheusMetricsProvider@82] - Starting /metrics HTTP endpoint at port 7000 exportJvmInfo: true
2021-05-24 08:49:22,797 [myid:2] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@145] - Purge task completed.
2021-05-24 08:49:22,867 [myid:2] - INFO [main:Log@169] - Logging initialized @889ms to org.eclipse.jetty.util.log.Slf4jLog
2021-05-24 08:49:23,020 [myid:2] - INFO [main:Server@359] - jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 11.0.8+10
2021-05-24 08:49:23,076 [myid:2] - INFO [main:ContextHandler@825] - Started o.e.j.s.ServletContextHandler@4b520ea8{/,null,AVAILABLE}
2021-05-24 08:49:23,104 [myid:2] - INFO [main:AbstractConnector@330] - Started ServerConnector@70e9c95d{HTTP/1.1,[http/1.1]}{0.0.0.0:7000}
2021-05-24 08:49:23,104 [myid:2] - INFO [main:Server@399] - Started @1130ms
2021-05-24 08:49:23,120 [myid:2] - INFO [main:ServerMetrics@62] - ServerMetrics initialized with provider org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider@7ac296f6
2021-05-24 08:49:23,142 [myid:2] - INFO [main:QuorumPeer@752] - zookeeper.quorumCnxnTimeoutMs=-1
2021-05-24 08:49:23,156 [myid:2] - WARN [main:ContextHandler@1520] - o.e.j.s.ServletContextHandler@79c97cb{/,null,UNAVAILABLE} contextPath ends with /*
2021-05-24 08:49:23,156 [myid:2] - WARN [main:ContextHandler@1531] - Empty contextPath
2021-05-24 08:49:23,158 [myid:2] - INFO [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-05-24 08:49:23,159 [myid:2] - INFO [main:FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2021-05-24 08:49:23,159 [myid:2] - INFO [main:QuorumPeer@1680] - Local sessions disabled
2021-05-24 08:49:23,159 [myid:2] - INFO [main:QuorumPeer@1691] - Local session upgrading disabled
2021-05-24 08:49:23,159 [myid:2] - INFO [main:QuorumPeer@1658] - tickTime set to 1500
2021-05-24 08:49:23,159 [myid:2] - INFO [main:QuorumPeer@1702] - minSessionTimeout set to 3000
2021-05-24 08:49:23,160 [myid:2] - INFO [main:QuorumPeer@1713] - maxSessionTimeout set to 30000
2021-05-24 08:49:23,160 [myid:2] - INFO [main:QuorumPeer@1738] - initLimit set to 15
2021-05-24 08:49:23,160 [myid:2] - INFO [main:QuorumPeer@1920] - syncLimit set to 3
2021-05-24 08:49:23,160 [myid:2] - INFO [main:QuorumPeer@1935] - connectToLearnerMasterLimit set to 0
2021-05-24 08:49:23,169 [myid:2] - INFO [main:ZookeeperBanner@42] -
2021-05-24 08:49:23,169 [myid:2] - INFO [main:ZookeeperBanner@42] - ______ _
2021-05-24 08:49:23,169 [myid:2] - INFO [main:ZookeeperBanner@42] - |___ / | |
2021-05-24 08:49:23,170 [myid:2] - INFO [main:ZookeeperBanner@42] - / / ___ ___ | | __ ___ ___ _ __ ___ _ __
2021-05-24 08:49:23,170 [myid:2] - INFO [main:ZookeeperBanner@42] - / / / _ \ / _ \ | |/ / / _ \ / _ \ | '_ \ / _ \ | '__|
2021-05-24 08:49:23,171 [myid:2] - INFO [main:ZookeeperBanner@42] - / /__ | (_) | | (_) | | < | __/ | __/ | |_) | | __/ | |
2021-05-24 08:49:23,171 [myid:2] - INFO [main:ZookeeperBanner@42] - /_____| \___/ \___/ |_|\_\ \___| \___| | .__/ \___| |_|
2021-05-24 08:49:23,171 [myid:2] - INFO [main:ZookeeperBanner@42] - | |
2021-05-24 08:49:23,171 [myid:2] - INFO [main:ZookeeperBanner@42] - |_|
2021-05-24 08:49:23,171 [myid:2] - INFO [main:ZookeeperBanner@42] -
2021-05-24 08:49:23,172 [myid:2] - INFO [main:Environment@98] - Server environment:zookeeper.version=3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on 04/21/2020 15:01 GMT
2021-05-24 08:49:23,172 [myid:2] - INFO [main:Environment@98] - Server environment:host.name=zookeeperpoc-1.zookeeperpoc-headless.zk.svc.cluster.local
2021-05-24 08:49:23,172 [myid:2] - INFO [main:Environment@98] - Server environment:java.version=11.0.8
2021-05-24 08:49:23,172 [myid:2] - INFO [main:Environment@98] - Server environment:java.vendor=N/A
2021-05-24 08:49:23,172 [myid:2] - INFO [main:Environment@98] - Server environment:java.home=/usr/local/openjdk-11
2021-05-24 08:49:23,172 [myid:2] - INFO [main:Environment@98] - Server environment:java.class.path=/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.6.1-bin/bin/../build/classes:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-prometheus-metrics-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-jute-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/snappy-java-1.1.7.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-log4j12-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-api-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_servlet-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_hotspot-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_common-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-unix-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-epoll-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-resolver-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-handler-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-codec-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-buffer-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/metrics-core-3.2.5.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/json-simple-1.1.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jline-2.11.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-util-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-servlet-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-server-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-security-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-io-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-http-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-databind-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-core-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-annotations-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-lang-2.6.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-cli-1.2.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/audience-annotations-0.5.0.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/data/conf:
2021-05-24 08:49:23,173 [myid:2] - INFO [main:Environment@98] - Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-05-24 08:49:23,173 [myid:2] - INFO [main:Environment@98] - Server environment:java.io.tmpdir=/tmp
2021-05-24 08:49:23,173 [myid:2] - INFO [main:Environment@98] - Server environment:java.compiler=<NA>
2021-05-24 08:49:23,173 [myid:2] - INFO [main:Environment@98] - Server environment:os.name=Linux
2021-05-24 08:49:23,173 [myid:2] - INFO [main:Environment@98] - Server environment:os.arch=amd64
2021-05-24 08:49:23,173 [myid:2] - INFO [main:Environment@98] - Server environment:os.version=5.10.0-0.bpo.3-cloud-amd64
2021-05-24 08:49:23,174 [myid:2] - INFO [main:Environment@98] - Server environment:user.name=root
2021-05-24 08:49:23,174 [myid:2] - INFO [main:Environment@98] - Server environment:user.home=/root
2021-05-24 08:49:23,174 [myid:2] - INFO [main:Environment@98] - Server environment:user.dir=/apache-zookeeper-3.6.1-bin
2021-05-24 08:49:23,176 [myid:2] - INFO [main:Environment@98] - Server environment:os.memory.free=881MB
2021-05-24 08:49:23,176 [myid:2] - INFO [main:Environment@98] - Server environment:os.memory.max=966MB
2021-05-24 08:49:23,176 [myid:2] - INFO [main:Environment@98] - Server environment:os.memory.total=966MB
2021-05-24 08:49:23,176 [myid:2] - INFO [main:ZooKeeperServer@128] - zookeeper.enableEagerACLCheck = false
2021-05-24 08:49:23,176 [myid:2] - INFO [main:ZooKeeperServer@132] - zookeeper.skipACL=="yes", ACL checks will be skipped
2021-05-24 08:49:23,177 [myid:2] - INFO [main:ZooKeeperServer@136] - zookeeper.digest.enabled = true
2021-05-24 08:49:23,177 [myid:2] - INFO [main:ZooKeeperServer@140] - zookeeper.closeSessionTxn.enabled = true
2021-05-24 08:49:23,177 [myid:2] - INFO [main:ZooKeeperServer@1434] - zookeeper.flushDelay=0
2021-05-24 08:49:23,177 [myid:2] - INFO [main:ZooKeeperServer@1443] - zookeeper.maxWriteQueuePollTime=0
2021-05-24 08:49:23,177 [myid:2] - INFO [main:ZooKeeperServer@1452] - zookeeper.maxBatchSize=1000
2021-05-24 08:49:23,177 [myid:2] - INFO [main:ZooKeeperServer@241] - zookeeper.intBufferStartingSizeBytes = 1024
2021-05-24 08:49:23,180 [myid:2] - INFO [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2021-05-24 08:49:23,180 [myid:2] - INFO [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2021-05-24 08:49:23,182 [myid:2] - INFO [main:ZKDatabase@132] - zookeeper.snapshotSizeFactor = 0.33
2021-05-24 08:49:23,182 [myid:2] - INFO [main:ZKDatabase@152] - zookeeper.commitLogCount=500
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@2001] - Using insecure (non-TLS) quorum communication
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@2007] - Port unification disabled
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@174] - multiAddress.enabled set to false
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@199] - multiAddress.reachabilityCheckEnabled set to true
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@186] - multiAddress.reachabilityCheckTimeoutMs set to 1000
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@2461] - QuorumPeer communication is not secured! (SASL auth disabled)
2021-05-24 08:49:23,196 [myid:2] - INFO [main:QuorumPeer@2486] - quorum.cnxn.threads.size set to 20
2021-05-24 08:49:23,200 [myid:2] - INFO [main:AbstractConnector@380] - Stopped ServerConnector@70e9c95d{HTTP/1.1,[http/1.1]}{0.0.0.0:7000}
2021-05-24 08:49:23,202 [myid:2] - INFO [main:ContextHandler@1016] - Stopped o.e.j.s.ServletContextHandler@4b520ea8{/,null,UNAVAILABLE}
2021-05-24 08:49:23,203 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 2 not in the peer list
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1073)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2021-05-24 08:49:23,205 [myid:2] - INFO [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2021-05-24 08:49:23,207 [myid:2] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
@aparajita89 it looks like the /data/zoo.cfg.dynamic file is present because you tried an upgrade. Could you please uninstall the zookeeper cluster and do a fresh installation? Also, let me know whether nslookup zookeeperpoc-headless.zk.svc.cluster.local resolves from the first zk pod.
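One way to confirm this diagnosis is to check whether the pod's own id is listed in the dynamic config persisted on its volume. The sketch below is illustrative only: `myid_from_host` and `myid_in_dynconfig` are hypothetical helpers (not part of the operator or the zk image), and they assume the conventions visible in the trace above: a StatefulSet hostname of the form `<name>-<ordinal>`, MYID = ordinal + 1, and dynamic config entries written as `server.<id>=...`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of zookeeperStart.sh or the operator.
# They mirror the trace: hostname zookeeperpoc-1 -> ORD=1 -> MYID=2.

# Derive the ZooKeeper id from a StatefulSet hostname like "zookeeperpoc-1".
myid_from_host() {
  local host="$1"
  if [[ "$host" =~ (.*)-([0-9]+)$ ]]; then
    # 10# forces base-10 so an ordinal is never parsed as octal.
    echo $(( 10#${BASH_REMATCH[2]} + 1 ))
  else
    echo "unexpected hostname format: $host" >&2
    return 1
  fi
}

# Exit status 0 only if the file contains a "server.<id>=" entry,
# i.e. the peer list ZooKeeper will load includes this id.
myid_in_dynconfig() {
  local id="$1" file="$2"
  grep -q "^server\.${id}=" "$file"
}
```

For example, `myid_in_dynconfig "$(myid_from_host zookeeperpoc-1)" /data/zoo.cfg.dynamic` returns non-zero when the file lists only `server.1`, which corresponds to the `My id 2 not in the peer list` failure. Because the file lives on the pod's PVC, it can survive deleting and recreating the cluster unless the PVC is removed as well.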
nslookup zookeeperpoc-headless.zk.svc.cluster.local resolves to zookeeperpoc-0, which is the first pod in the cluster.
I deleted the CRD again. It seems this does not delete the PVC, so I manually deleted the PVC as well and then re-created the CRD. This time the cluster was created after a couple of pod restarts on zookeeperpoc-1.
I tried recreating the cluster again, and this time the PVC was deleted and recreated as expected. We can close this issue now. Thanks for your help, @anishakj.
Closing this issue, as it is resolved
Description
When a CRD is created for the cluster to be brought up with spec.replicas set to n, where n > 1, pods with id > 0 go into "CrashLoopBackOff" state with the below error in zookeeper logs:
Importance
Bringing up a cluster on n nodes seems like a basic feature for an operator. Perhaps I am missing something in the configs?
Location
deploy/cr/pravega/zookeeper_v1beta1_zookeepercluster_cr.yaml
Suggestions for an improvement