uber / uReplicator

Improvement of Apache Kafka Mirrormaker
Apache License 2.0

start controller error #325

Closed shaolilanse closed 3 years ago

shaolilanse commented 3 years ago

My command:

java -Dlog4j.configuration=file:config/tools-log4j.properties -Xms3g -Xmx3g -Xmn512m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+AlwaysPreTouch -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:+PrintCommandLineFlags -XX:CMSInitiatingOccupancyFraction=80 -XX:SurvivorRatio=2 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCTimeStamps -Xloggc:/tmp/ureplicator-controller/gc-ureplicator-controller.log -server -cp uReplicator-Controller/target/uReplicator-Controller-2.0.1-SNAPSHOT-jar-with-dependencies.jar com.uber.stream.kafka.mirrormaker.controller.ControllerStarter -enableFederated false -mode auto ackUpToGit false -enableAutoTopicExpansion true -port 9000 -refreshTimeInSeconds 10 -srcKafkaZkPath localhost:2181/1001 -zookeeper localhost:2181 -destKafkaZkPath localhost:2181/1002 -helixClusterName testMirrorMaker

Then this error log appears:

`
[2020-12-29 14:55:27,233] ERROR Error registering metrics! (com.uber.stream.kafka.mirrormaker.controller.core.AutoRebalanceLiveInstanceChangeListener)
java.lang.IllegalStateException: Not initialized yet
    at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
    at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
    at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
    at com.uber.stream.kafka.mirrormaker.controller.core.AutoRebalanceLiveInstanceChangeListener.registerMetrics(AutoRebalanceLiveInstanceChangeListener.java:150)
    at com.uber.stream.kafka.mirrormaker.controller.core.AutoRebalanceLiveInstanceChangeListener.<init>(AutoRebalanceLiveInstanceChangeListener.java:114)
    at com.uber.stream.kafka.mirrormaker.controller.core.HelixMirrorMakerManager.start(HelixMirrorMakerManager.java:132)
    at com.uber.stream.kafka.mirrormaker.controller.ControllerInstance.start(ControllerInstance.java:200)
    at com.uber.stream.kafka.mirrormaker.controller.ControllerStarter.start(ControllerStarter.java:64)
    at com.uber.stream.kafka.mirrormaker.controller.ControllerStarter.main(ControllerStarter.java:163)
[2020-12-29 14:55:27,235] ERROR Either srcCluster: or dstCluster: is empty, return false (com.uber.stream.kafka.mirrormaker.controller.core.HelixMirrorMakerManager)
[2020-12-29 14:55:27,037] INFO Invalid cluster setup, missing znode path: /cluster/IDEALSTATES
Invalid cluster setup, missing znode path: /cluster/CONFIGS/CLUSTER/cluster
Invalid cluster setup, missing znode path: /cluster/CONFIGS/PARTICIPANT
Invalid cluster setup, missing znode path: /cluster/CONFIGS/RESOURCE
Invalid cluster setup, missing znode path: /cluster/PROPERTYSTORE
Invalid cluster setup, missing znode path: /cluster/LIVEINSTANCES
Invalid cluster setup, missing znode path: /cluster/INSTANCES
Invalid cluster setup, missing znode path: /cluster/EXTERNALVIEW
Invalid cluster setup, missing znode path: /cluster/CONTROLLER
Invalid cluster setup, missing znode path: /cluster/STATEMODELDEFS
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/MESSAGES
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/ERRORS
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/STATUSUPDATES
Invalid cluster setup, missing znode path: /cluster/CONTROLLER/HISTORY (org.apache.helix.manager.zk.ZKUtil)
`

My config:

controller---> {
  kafka.source.clusters :
  kafka.destination.clusters :
  federated.enabled : false
  federated.deployment.name :
  controller.helix.cluster.name : testMirrorMaker
  controller.zk.str : localhost:2181
  controller.port : 9000
  controller.mode : auto
  controller.instance.id : localhost
  controller.environment : env
  controller.graphite.port : 0
  controller.metrics.prefix : kafka-mirror-maker-controller
  controller.graphite.report.freq.in.sec : 60
  controller.enable.jmx.report : true
  controller.enable.graphite.report : true
  controller.c3.host : localhost
  controller.c3.port : 0
  controller.enable.auto.topic.expansion : true
  controller.pattern.exclude.topics : __consumer_offsets
  controller.srckafka.zkStr : localhost:2181/1001
  controller.destkafka.zkStr : localhost:2181/1002
  controller.max.working.instances : 0
  controller.auto.rebalance.delay.in.seconds : 120
  controller.refresh.time.in.seconds : 10
  controller.init.wait.time.in.seconds : 120
  controller.auto.rebalance.period.in.seconds : 0
  controller.auto.rebalance.min.interval.in.seconds : 600
  controller.auto.rebalance.min.lag.in.seconds : 900
  controller.auto.rebalance.min.lag.offset : 100000
  controller.auto.rebalance.max.offset.valid.in.seconds : 1800
  controller.workload.refresh.period.in.seconds : 600
  controller.auto.rebalance.workload.ratio.threshold : 1.2
  controller.auto.rebalance.max.dedicated.ratio : 0.5
  controller.auto.rebalance.max.stuck.partition.movements : 3
  controller.auto.rebalance.move.stuck.partition.after.minutes : 20
  controller.num.offset.thread : 10
  controller.blocking.queue.size : 30000
  controller.offset.refresh.interval.in.sec : 300
  controller.backup.to.git : false
  controller.local.backup.file.path : /var/log/kafka-mirror-maker-controller
  config.file :
  controller.max.workload.per.worker.byte.within.region : 8388608.0
  controller.max.workload.per.worker.byte.cross.region : 8388608.0
}

What is causing this?

yangy0000 commented 3 years ago

Did your controller crash, or does it just log some errors?

shaolilanse commented 3 years ago

The controller does not crash, but it cannot replicate data from sourceCluster to desCluster. When I produce a message to the desCluster topic, I only see the following.

Worker log:
[2020-12-29 16:50:28,233] INFO commitOffset finished, number of topics: 1 (com.uber.stream.ureplicator.worker.ZookeeperCheckpointManager)

Controller log:
[2020-12-29 17:17:36,705] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager)
[2020-12-29 17:18:36,706] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager)
[2020-12-29 17:19:36,706] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager)
[2020-12-29 17:20:36,706] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager)
[2020-12-29 17:21:36,706] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager)
[2020-12-29 17:22:05,709] INFO Refreshing workload for source kafka1002 (com.uber.stream.kafka.mirrormaker.common.core.WorkloadInfoRetriever)
[2020-12-29 17:22:05,710] INFO Retrieved workload for ts: 1609233725709 for srcKafkaCluster: kafka1002 and 1 topics (com.uber.stream.kafka.mirrormaker.common.core.WorkloadInfoRetriever)

yangy0000 commented 3 years ago

From the log, it looks like both the worker and the controller are working. Have you produced test messages to srcCluster?

shaolilanse commented 3 years ago

It is OK now.

shaolilanse commented 3 years ago

The previous error was reported because my ZooKeeper (ZK) was in an abnormal state. It is OK now.
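The thread does not say what exactly was abnormal about ZK, so the following is only a minimal sketch of one way to confirm that a ZooKeeper address from the command above answers at all. It sends the standard `ruok` four-letter-word command over a plain socket. The class and method names (`ZkHealthCheck`, `firstServer`, `fourLetterWord`, `isOk`) are hypothetical and not part of uReplicator. It also assumes `ruok` is enabled on the server: newer ZooKeeper releases only answer commands listed in `4lw.commands.whitelist`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of uReplicator.
class ZkHealthCheck {

    /** Returns the first "host:port" of a connect string such as "localhost:2181/1001". */
    static String firstServer(String connectString) {
        int slash = connectString.indexOf('/');
        // Drop the chroot suffix ("/1001"), if any.
        String servers = slash < 0 ? connectString : connectString.substring(0, slash);
        // A connect string may list several servers separated by commas.
        return servers.split(",")[0].trim();
    }

    /** Sends a four-letter-word command and returns the server's reply, trimmed. */
    static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server closes the connection after replying, so read until EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII).trim();
        }
    }

    /** A healthy server answers "ruok" with exactly "imok". */
    static boolean isOk(String reply) {
        return "imok".equals(reply);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the source ZK path used in the command above.
        String connect = args.length > 0 ? args[0] : "localhost:2181/1001";
        String[] hostPort = firstServer(connect).split(":");
        String reply = fourLetterWord(hostPort[0], Integer.parseInt(hostPort[1]), "ruok", 3000);
        System.out.println(connect + " -> " + (isOk(reply) ? "imok" : "unexpected reply: " + reply));
    }
}
```

This only shows that the server process responds. It does not verify that the chroot paths (`/1001`, `/1002`) or the Helix cluster znodes exist, which is what the "missing znode path" lines in the log are about.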

vuxuanlai commented 3 years ago

Hi @shaolilanse, I got the same issue. How did you fix it?