uber / uReplicator

Improvement of Apache Kafka Mirrormaker
Apache License 2.0

Invalid cluster setup, missing znode path, Cluster structure is not set up for cluster #293

Open: dungnt081191 opened this issue 4 years ago

dungnt081191 commented 4 years ago

Hi everyone, I'm using code from commit 70ac340 on the master branch.

With my current setup, the Controller and 3 Workers are running fine, but the last Worker throws the exception below. Can anyone explain it clearly? @Technoboy- @yangy0000, do you know what causes this?

[2019-11-05 10:41:40,518] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient:936)
[2019-11-05 10:41:40,518] INFO Handling new session, session id: 1062cda1558000f, instance: HelixMirrorMaker-1572950499885, instanceTye: PARTICIPANT, cluster: uReplicatorTEST, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x1062cda1558000f local:/172.16.129.24:52794 remoteserver:10.100.3.101/10.100.3.101:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0 (org.apache.helix.manager.zk.ZKHelixManager:748)
[2019-11-05 10:41:40,518] WARN ParticipantHealthReportTimerTask already stopped (org.apache.helix.healthcheck.ParticipantHealthReportTask:67)
[2019-11-05 10:41:40,644] INFO Invalid cluster setup, missing znode path: /uReplicatorTEST/CONTROLLER
Invalid cluster setup, missing znode path: /uReplicatorTEST/CONTROLLER/MESSAGES
Invalid cluster setup, missing znode path: /uReplicatorTEST/CONTROLLER/ERRORS
Invalid cluster setup, missing znode path: /uReplicatorTEST/CONTROLLER/STATUSUPDATES
Invalid cluster setup, missing znode path: /uReplicatorTEST/CONTROLLER/HISTORY
 (org.apache.helix.manager.zk.ZKUtil:88)
2019-11-05T10:41:40.653+0000: Total time for which application threads were stopped: 0.0015168 seconds, Stopping threads took: 0.0000388 seconds
[2019-11-05 10:41:40,645] ERROR fail to createClient. (org.apache.helix.manager.zk.ZKHelixManager:496)
org.apache.helix.HelixException: Cluster structure is not set up for cluster: uReplicatorTEST
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:861)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:531)
    at kafka.mirrormaker.WorkerInstance.addToHelixController(WorkerInstance.scala:340)
    at kafka.mirrormaker.WorkerInstance.start(WorkerInstance.scala:250)
    at kafka.mirrormaker.MirrorMakerWorker.main(MirrorMakerWorker.scala:109)
    at com.uber.stream.kafka.mirrormaker.starter.MirrorMakerStarter.main(MirrorMakerStarter.java:44)
[2019-11-05 10:41:40,661] ERROR fail to connect HelixMirrorMaker-1572950499885 (org.apache.helix.manager.zk.ZKHelixManager:534)
org.apache.helix.HelixException: Cluster structure is not set up for cluster: uReplicatorTEST
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:861)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:531)
    at kafka.mirrormaker.WorkerInstance.addToHelixController(WorkerInstance.scala:340)
    at kafka.mirrormaker.WorkerInstance.start(WorkerInstance.scala:250)
    at kafka.mirrormaker.MirrorMakerWorker.main(MirrorMakerWorker.scala:109)
    at com.uber.stream.kafka.mirrormaker.starter.MirrorMakerStarter.main(MirrorMakerStarter.java:44)
[2019-11-05 10:41:40,687] INFO Is not shutting down; call cleanShutdown() (kafka.mirrormaker.WorkerInstance:66)
[2019-11-05 10:41:40,688] INFO Start clean shutdown. (kafka.mirrormaker.WorkerInstance:66)
[2019-11-05 10:41:40,692] INFO Flushing last batches and commit offsets. (kafka.mirrormaker.WorkerInstance:66)
[2019-11-05 10:41:40,692] INFO Flushing producer. (kafka.mirrormaker.WorkerInstance:66)
Exception in thread "main" java.lang.NullPointerException
    at kafka.mirrormaker.WorkerInstance.maybeFlushAndCommitOffsets(WorkerInstance.scala:346)
    at kafka.mirrormaker.WorkerInstance.cleanShutdown(WorkerInstance.scala:385)
    at kafka.mirrormaker.WorkerInstance$WorkerZKHelixManager.disconnect(WorkerInstance.scala:328)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:535)
    at kafka.mirrormaker.WorkerInstance.addToHelixController(WorkerInstance.scala:340)
    at kafka.mirrormaker.WorkerInstance.start(WorkerInstance.scala:250)
    at kafka.mirrormaker.MirrorMakerWorker.main(MirrorMakerWorker.scala:109)
    at com.uber.stream.kafka.mirrormaker.starter.MirrorMakerStarter.main(MirrorMakerStarter.java:44)
2019-11-05T10:41:41.654+0000: Total time for which application threads were stopped: 0.0003464 seconds, Stopping threads took: 0.0000637 seconds
2019-11-05T10:41:42.320+0000: Total time for which application threads were stopped: 0.0006129 seconds, Stopping threads took: 0.0000342 seconds
2019-11-05T10:41:43.325+0000: Total time for which application threads were stopped: 0.0001109 seconds, Stopping threads took: 0.0000327 seconds
Technoboy- commented 4 years ago

Restarting the last worker should resolve the problem. The best order for starting the cluster is: start the controllers first, then the workers. The controllers are the ones that set up the cluster info.
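
As a rough sketch of what "cluster info" means here: the znode paths that Helix's ZKUtil reports as missing can be turned into a pre-flight check that a worker could run before connecting. This assumes the set of existing paths has already been read from ZooKeeper (for example with zkCli's `ls`); `HelixLayoutCheck` and its methods are hypothetical names, not part of Helix or uReplicator, and the path list is copied from the "Invalid cluster setup" log lines in this thread rather than from Helix's source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check, not part of Helix or uReplicator.
// The path list is copied from the "Invalid cluster setup" log lines in this thread.
final class HelixLayoutCheck {
    private HelixLayoutCheck() {}

    // Znode paths the log reports for a cluster, in the order they were printed.
    static List<String> requiredPaths(String cluster) {
        String root = "/" + cluster;
        return List.of(
                root + "/IDEALSTATES",
                root + "/CONFIGS/CLUSTER/" + cluster,
                root + "/CONFIGS/PARTICIPANT",
                root + "/CONFIGS/RESOURCE",
                root + "/PROPERTYSTORE",
                root + "/LIVEINSTANCES",
                root + "/INSTANCES",
                root + "/EXTERNALVIEW",
                root + "/CONTROLLER",
                root + "/STATEMODELDEFS",
                root + "/CONTROLLER/MESSAGES",
                root + "/CONTROLLER/ERRORS",
                root + "/CONTROLLER/STATUSUPDATES",
                root + "/CONTROLLER/HISTORY");
    }

    // Returns the required paths absent from `existing` (assumed to be read from ZooKeeper).
    static List<String> missingPaths(String cluster, Set<String> existing) {
        List<String> missing = new ArrayList<>();
        for (String path : requiredPaths(cluster)) {
            if (!existing.contains(path)) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Simulated snapshot: everything exists except the CONTROLLER subtree,
        // which matches the first worker log in this issue.
        Set<String> existing = new HashSet<>(requiredPaths("uReplicatorTEST"));
        existing.removeIf(p -> p.startsWith("/uReplicatorTEST/CONTROLLER"));
        List<String> missing = missingPaths("uReplicatorTEST", existing);
        if (!missing.isEmpty()) {
            System.out.println("Cluster not set up yet; start the controller first. Missing:");
            missing.forEach(p -> System.out.println("  " + p));
        }
    }
}
```

Keeping the list in log order makes it easy to compare the check's output line by line with what the worker printed.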

Technoboy- commented 4 years ago

Also, using worker-3.0 is recommended.

dungnt081191 commented 4 years ago

Hi @Technoboy-, the Worker now throws another exception:

shed-0) (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:72)
kafka.common.BrokerEndPointNotAvailableException: End point with security protocol PLAINTEXT not found for broker 1
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1$$anonfun$apply$5.apply(ClientUtils.scala:149)
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1$$anonfun$apply$5.apply(ClientUtils.scala:149)
    at scala.Option.getOrElse(Option.scala:121)
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1.apply(ClientUtils.scala:149)
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1.apply(ClientUtils.scala:145)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.client.ClientUtils$.getPlaintextBrokerEndPoints(ClientUtils.scala:145)
    at kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread.doWork(CompactConsumerFetcherManager.scala:345)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-11-06 09:22:09,000] INFO [CompactConsumerFetcherManager-1573031922798] Added fetcher for partitions ArrayBuffer() (kafka.mirrormaker.CompactConsumerFetcherManager:66)

This is with Worker-3.0. One more thing: the Kafka version used here is 1.1. How can we upgrade it to the latest Apache Kafka release?

Technoboy- commented 4 years ago

Did you configure the SSL settings for the consumer/producer? See: http://kafka.apache.org/documentation.html#security_configclients
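
For reference, the client-side SSL settings described on that page look roughly like the sketch below, built as plain `java.util.Properties`. The keys (`security.protocol`, `ssl.truststore.*`, `ssl.keystore.*`, `ssl.key.password`) are standard Kafka client configs; the broker address, group id, file paths and passwords are placeholders, and whether uReplicator's worker passes these through to its consumer unchanged is an assumption to verify against its own config files.

```java
import java.util.Properties;

// Sketch of Kafka client SSL settings; all values passed in below are placeholders.
final class KafkaSslClientConfig {
    private KafkaSslClientConfig() {}

    // Returns a copy of `base` with one-way SSL (broker certificate verification) enabled.
    static Properties withSsl(Properties base, String truststorePath, String truststorePassword) {
        Properties p = new Properties();
        p.putAll(base);
        p.setProperty("security.protocol", "SSL");
        p.setProperty("ssl.truststore.location", truststorePath);
        p.setProperty("ssl.truststore.password", truststorePassword);
        return p;
    }

    // Returns a copy of `base` with keystore settings, needed only if brokers require client auth.
    static Properties withClientAuth(Properties base, String keystorePath,
                                     String keystorePassword, String keyPassword) {
        Properties p = new Properties();
        p.putAll(base);
        p.setProperty("ssl.keystore.location", keystorePath);
        p.setProperty("ssl.keystore.password", keystorePassword);
        p.setProperty("ssl.key.password", keyPassword);
        return p;
    }

    public static void main(String[] args) {
        Properties consumer = new Properties();
        consumer.setProperty("bootstrap.servers", "source-broker:9093"); // placeholder address
        consumer.setProperty("group.id", "ureplicator-test");            // placeholder group id
        Properties secured = withSsl(consumer, "/path/to/client.truststore.jks", "changeit");
        secured.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + "=" + secured.getProperty(k)));
    }
}
```

The keystore settings only matter when the brokers are configured with `ssl.client.auth` set to require client certificates; otherwise the truststore alone is enough.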

dungnt081191 commented 4 years ago

Yes, I did, @Technoboy-.

I also deleted the pod that threw this exception, and once the new pod was up, the exception no longer appeared. It seems to happen when the Worker comes up before the Controller is ready.
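
If the pods cannot be started in a fixed order, one option is to have each worker wait until the controller has set up the cluster before it connects to Helix. Below is a minimal sketch of such a startup gate, assuming the caller supplies the readiness check (for example, one that verifies the cluster's `CONTROLLER` znode exists). `StartupGate` and `awaitReady` are hypothetical names, not uReplicator APIs.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical startup gate, not part of uReplicator: poll a readiness check
// (e.g. "does /<cluster>/CONTROLLER exist?") before the worker joins Helix.
final class StartupGate {
    private StartupGate() {}

    // Polls `ready` every `interval`; returns true once it passes, false if `timeout` elapses.
    static boolean awaitReady(BooleanSupplier ready, Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the controller never became ready
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated check that becomes true after ~300 ms, standing in for a real ZooKeeper lookup.
        long start = System.nanoTime();
        BooleanSupplier controllerReady =
                () -> System.nanoTime() - start > Duration.ofMillis(300).toNanos();
        if (awaitReady(controllerReady, Duration.ofMillis(50), Duration.ofSeconds(5))) {
            System.out.println("Controller ready; starting worker.");
        } else {
            System.out.println("Timed out waiting for controller.");
            System.exit(1);
        }
    }
}
```

Returning false on timeout instead of throwing leaves the caller free to exit with an error and let the pod be restarted, which is the same recovery that worked manually here.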

Technoboy- commented 4 years ago

Yes, maybe.

dungnt081191 commented 4 years ago

@Technoboy-, any idea about this log? On startup it prints "starting helix mirror maker manager", "Starting ZkClient event thread", "Opening socket connection to server" and "Waiting for keeper state SyncConnected", followed by:

[2019-11-08 08:19:47,527] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient:713)
[2019-11-08 08:19:47,533] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient:936)
[2019-11-08 08:19:47,745] INFO Invalid cluster setup, missing znode path: /cluster/IDEALSTATES
Invalid cluster setup, missing znode path: /cluster/CONFIGS/CLUSTER/cluster
Invalid cluster setup, missing znode path: /cluster/CONFIGS/PARTICIPANT
Invalid cluster setup, missing znode path: /cluster/CONFIGS/RESOURCE
Invalid cluster setup, missing znode path: /cluster/PROPERTYSTORE
Invalid cluster setup, missing znode path: /cluster/LIVEINSTANCES
Invalid cluster setup, missing znode path: /cluster/INSTANCES
Invalid cluster setup, missing znode path: /cluster/EXTERNALVIEW
Invalid cluster setup, missing znode path: /cluster/CONTROLLER
Invalid cluster setup, missing znode path: /cluster/STATEMODELDEFS
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/MESSAGES
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/ERRORS
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/STATUSUPDATES
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/HISTORY
 (org.apache.helix.manager.zk.ZKUtil:88)
[2019-11-08 08:19:47,778] INFO Invalid cluster setup, missing znode path: /controller/IDEALSTATES
Invalid cluster setup, missing znode path: /controller/CONFIGS/CLUSTER/controller
Invalid cluster setup, missing znode path: /controller/CONFIGS/PARTICIPANT
Invalid cluster setup, missing znode path: /controller/CONFIGS/RESOURCE
Invalid cluster setup, missing znode path: /controller/PROPERTYSTORE
......
.....
.....

The log suggests the cluster setup is invalid, but where exactly? Can you give me more detail about this?

xhl1988 commented 4 years ago

What zk path do you have?

Technoboy- commented 4 years ago

This looks like a small issue caused by a misconfigured cluster name, such as a mismatch between the manager/controller/worker configs.
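
The worker log above hints at this: Helix checks paths under `/<cluster name>/...` (compare `/uReplicatorTEST/CONTROLLER` with "cluster: uReplicatorTEST" earlier), so paths such as `/cluster/...` and `/controller/...` suggest the configured cluster names were literally `cluster` and `controller`. Below is a small sketch of a consistency check, assuming the cluster name each component was given has already been collected into a map. Reading it from each component's config is left out, the component labels are illustrative, and `ClusterNameCheck` is a hypothetical helper.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of uReplicator or Helix.
final class ClusterNameCheck {
    private ClusterNameCheck() {}

    // "/uReplicatorTEST/CONTROLLER/HISTORY" -> "uReplicatorTEST" (first path segment).
    static String clusterOf(String znodePath) {
        if (!znodePath.startsWith("/")) {
            throw new IllegalArgumentException("not an absolute znode path: " + znodePath);
        }
        int end = znodePath.indexOf('/', 1);
        return end < 0 ? znodePath.substring(1) : znodePath.substring(1, end);
    }

    // Distinct cluster names across components; more than one entry means a mismatch.
    static Set<String> distinctNames(Map<String, String> nameByComponent) {
        return new TreeSet<>(nameByComponent.values());
    }

    public static void main(String[] args) {
        // Example values only: "worker" is deliberately misconfigured.
        Map<String, String> configured = new TreeMap<>();
        configured.put("manager", "uReplicatorDev");
        configured.put("controller", "uReplicatorDev");
        configured.put("worker", "cluster");
        if (distinctNames(configured).size() > 1) {
            System.out.println("Cluster name mismatch: " + configured);
        }
        // Recover the cluster name a logged path refers to.
        System.out.println(clusterOf("/cluster/CONTROLLER/HISTORY"));
    }
}
```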

dungnt081191 commented 4 years ago

@xhl1988 What zk path do you have?

Sorry, which zk path do you mean? Which config is that zk path set in?

dungnt081191 commented 4 years ago

@Technoboy-, I set only one Helix cluster name, for example: uReplicatorDev.

Technoboy- commented 4 years ago

Please paste some controller logs from the same time as the worker log above.

dungnt081191 commented 4 years ago

Hi @Technoboy- @xhl1988 @yangy0000, how can I switch from worker 2.0 (which I'm currently using) to worker 3.0? I'm hitting this exception:

hed-0) (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:72)
kafka.common.BrokerEndPointNotAvailableException: End point with security protocol PLAINTEXT not found for broker 1
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1$$anonfun$apply$5.apply(ClientUtils.scala:149)
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1$$anonfun$apply$5.apply(ClientUtils.scala:149)
    at scala.Option.getOrElse(Option.scala:121)
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1.apply(ClientUtils.scala:149)
    at kafka.client.ClientUtils$$anonfun$getPlaintextBrokerEndPoints$1.apply(ClientUtils.scala:145)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.client.ClientUtils$.getPlaintextBrokerEndPoints(ClientUtils.scala:145)
    at kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread.doWork(CompactConsumerFetcherManager.scala:345)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-11-06 09:22:09,000] INFO [CompactConsumerFetcherManager-1573031922798] Added fetcher for partitions ArrayBuffer() (kafka.mirrormaker.CompactConsumerFetcherManager:66)

In my case, I consume from the source Kafka cluster over SSL and produce to the destination Kafka cluster over PLAINTEXT.
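
The stack trace goes through `getPlaintextBrokerEndPoints`, which suggests that the fetcher in this code path only looks up PLAINTEXT endpoints of the source brokers. A source cluster that advertises only an SSL listener then has nothing to return for broker 1. The sketch below is an illustrative model of that lookup, not Kafka's code: it assumes each listener name equals its security protocol (no `listener.security.protocol.map` remapping) and uses placeholder host names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of looking up a broker endpoint by security protocol (not Kafka's code).
final class EndpointLookup {
    private EndpointLookup() {}

    // Parses "PLAINTEXT://h:9092,SSL://h:9093" into {PLAINTEXT=h:9092, SSL=h:9093}.
    static Map<String, String> parseListeners(String listeners) {
        Map<String, String> byProtocol = new LinkedHashMap<>();
        for (String entry : listeners.split(",")) {
            String trimmed = entry.trim();
            int sep = trimmed.indexOf("://");
            if (sep <= 0) {
                throw new IllegalArgumentException("malformed listener: " + trimmed);
            }
            byProtocol.put(trimmed.substring(0, sep), trimmed.substring(sep + 3));
        }
        return byProtocol;
    }

    // Returns host:port for `protocol`, or throws with a message shaped like the one in the log.
    static String endpointFor(int brokerId, String listeners, String protocol) {
        String endpoint = parseListeners(listeners).get(protocol);
        if (endpoint == null) {
            throw new IllegalStateException("End point with security protocol " + protocol
                    + " not found for broker " + brokerId);
        }
        return endpoint;
    }

    public static void main(String[] args) {
        String sourceBroker = "SSL://source-broker-1:9093"; // placeholder SSL-only source broker
        System.out.println(endpointFor(1, sourceBroker, "SSL"));
        try {
            endpointFor(1, sourceBroker, "PLAINTEXT");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the SSL lookup succeeds and the PLAINTEXT lookup fails, which is the same asymmetry as a setup that reads from an SSL-only source and writes to a PLAINTEXT destination.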

dungnt081191 commented 4 years ago

@Technoboy-, any idea about this, bro?