Gk8143197049 opened this issue 5 years ago
Here is the sample starter config. We will compose a README later.
```
-mode customized \                         -- Helix rebalance mode (always use customized)
-enableAutoWhitelist true \                -- enable topic auto whitelist or not
-port 9000 \                               -- controller REST endpoint port
-refreshTimeInSeconds 10 \                 -- auto whitelist background check interval
-srcKafkaZkPath localhost:2181/cluster1 \  -- source broker
-zookeeper localhost:2181 \                -- zookeeper for Helix cluster
-destKafkaZkPath localhost:2181/cluster1 \ -- destination broker
-helixClusterName testMirrorMaker          -- Helix cluster name
```
I am on Kafka version 2.1.1. I am running the scripts below from the Kafka broker kafka01.xxxx (Kafka/ZooKeeper is running there, i.e. my source cluster). The source cluster is kafka01.xxxx and the target cluster is kafka02.xxxxx (Kafka and ZooKeeper are already running there).
Controller script to bring the controller up and running:
```
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller.sh -port 9000 -helixClusterName uReplicatorDev -refreshTimeInSeconds 10 -enableAutoWhitelist true -srcKafkaZkPath kafka01.xxxxxxxxxxxxxxxxx:2181 -destKafkaZkPath kafka02.xxxxxxxxxxxxxxxx:2181
```
1. How do I check the source Kafka path (I just gave the Kafka ZooKeeper connection in the command above)? Is it right?
2. What should be configured in the producer and consumer properties (for the source or target cluster; in my case I am running uReplicator on the source cluster) for my scenario? And what about the Helix configuration details?
Worker script:
```
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker.sh --consumer.config ./config/consumer.properties --producer.config ./config/producer.properties --helix.config ./config/helix.properties
```
For my worker configuration I gave the source cluster details in the consumer and producer configs, but it fails. Why?

Helix configuration:
```
zkServer=kafka01.xxxxxxxxxxxxx:2181
instanceId=testHelixMirrorMaker01
helixClusterName=uReplicatorDev
enableAutowhitelist=true
port=9000
srckafkaZkpath=kafka01.xxxxxxxxxxxxxxxxxxxxxxxxx:2181
destKafkaZkpath=kafka02.xxxxxxxxxxxxxxxxxxxxxxxx:2181
```
Consumer properties:
```
zookeeper.connect=kafka01.xxxxxxxxxxxxxxxx:2181
zookeeper.connection.timeout.ms=30000
zookeeper.session.timeout.ms=30000
group.id=kloak-mirrormaker-test
consumer.id=kloakmms01-sjc1
partition.assignment.strategy=roundrobin
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=8388608
queued.max.message.chunks=5
auto.offset.reset=smallest
```
Producer properties:
```
bootstrap.servers=kafka01.xxxxxxxxxxxxxxxxxxxxxxxxxxx:9092,kafka01.xxxxxxxxxxxx:9093,kafka02.xxxxxxxxx:9092
client.id=kloak-mirrormaker-test
producer.type=async
compression.type=none
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
```
Let me know what mistakes I have made in the worker and controller configurations; with this I will be able to build the document. Where can I look for the controller and worker logs?
-zookeeper xxxx:2181 is missing for the controller, so your controller might not be running. (The -zookeeper xxxx:2181 value needs to be the same as zkServer in helix.properties for the worker.)
Let's say you use kafka01.xxxxxxxxxxxxx:2181 as -zookeeper; you should be able to see the /uReplicatorDev path under kafka01.xxxxxxxxxxxxx:2181 if you set up the controller successfully.
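For example, a quick one-shot check with the ZooKeeper shell that ships with Kafka (host name illustrative, matching the one above):

```
# list the ZK root; a successfully started controller creates a /uReplicatorDev znode
bin/zookeeper-shell kafka01.xxxxxxxxxxxxx:2181 ls /
```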
Right now I have added the zookeeper to the controller startup script. Yes, I have passed the same ZooKeeper details for the controller's -zookeeper and for zkServer in the Helix properties. Do I need to create topics in the destination cluster (with the same names as in the source cluster) for the worker to enable whitelisting?
I will post if I get errors. The configuration should mostly work.
Yes, you need to create the topics in the destination cluster manually.
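For example, a sketch using the stock Kafka CLI from the Confluent distribution (adjust the ZK address, topic name, and counts to match the source topic):

```
# create the matching topic on the destination cluster
bin/kafka-topics --create --zookeeper kafka02.xxxxxxxxxxxxxxxx:2181 \
  --topic oracle-24-xxxxxxxx --partitions 1 --replication-factor 1
```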
Got you! My destination server is a cloud server which is turned off, so the controller is failing. I will get back to you tomorrow morning. Once it is good to go I can create a document in a proper format and drop it as a README or configuration document.
Thanks.
Also, when you set -enableAutoWhitelist true on the controller, uReplicator will automatically replicate messages when a topic exists in both the source and destination clusters (Example 2 in the README). When -enableAutoWhitelist is set to false, you need to add topics to uReplicator manually (Example 1).
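For the manual case, the controller exposes a REST endpoint on the -port you configured; a sketch (the topic name is a placeholder, and this mirrors the curl used later in this thread):

```
# whitelist a topic on a running controller
curl -X POST -d '{"topic":"<your-topic>", "numPartitions":"1"}' http://localhost:9000/topics
```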
CONTROLLER ERRORS: 1. Where do I change the directory path for the controller, since the current user does not have permissions in /var/log? I can only see INFO; where do I put the logs at DEBUG for now to see the errors in depth? How do I correct them?
```
[2019-03-13 08:02:19,686] ERROR Error writing backup to the file idealState-backup-env-uReplicatorDev (com.uber.stream.kafka.mirrormaker.controller.core.FileBackUpHandler:48)
java.io.FileNotFoundException: /var/log/kafka-mirror-maker-controller/idealState-backup-env-uReplicatorDev (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.
```
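(A likely fix for this particular error, on the assumption that the controller only needs the backup directory to exist and be writable by the user running it:)

```
# create the backup directory the controller writes to and hand it to the current user
sudo mkdir -p /var/log/kafka-mirror-maker-controller
sudo chown "$USER" /var/log/kafka-mirror-maker-controller
```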
SIMILAR ERRORS FOR ALL TOPICS ("doesn't match ExternalView"): how do I fix them?
```
[2019-03-13 08:16:58,666] ERROR For topic:topics1, number of partitions for IdealState: topics1, {IDEAL_STATE_MODE=CUSTOMIZED, MAX_PARTITIONS_PER_INSTANCE=1, NUM_PARTITIONS=1, REBALANCE_MODE=CUSTOMIZED, REPLICAS=1, STATE_MODEL_DEF_REF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT}{}{} doesn't match ExternalView: topics1, {BUCKET_SIZE=0, IDEAL_STATE_MODE=CUSTOMIZED, MAX_PARTITIONS_PER_INSTANCE=1, NUM_PARTITIONS=1, REBALANCE_MODE=CUSTOMIZED, REPLICAS=1, STATE_MODEL_DEF_REF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT}{}{} (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:177)
```
The same ERROR repeats verbatim for topics2 through topics5 (at 08:16:58,674 / ,695 / ,725 / ,731).
My worker shuts down saying:
```
[2019-03-13 09:43:24,032] WARN [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-0]: Error in fetch Name: FetchRequest; Version: 3; CorrelationId: 12; ClientId: kloak-mirrormaker-test; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; MaxBytes:2147483647 bytes; RequestInfo: (connect-offsets-17,PartitionFetchInfo(5574,8388608)),(KafkaCruiseControlModelTrainingSamples-2,PartitionFetchInfo(693664,8388608)),(KafkaCruiseControlModelTrainingSamples-28,PartitionFetchInfo(693646,8388608)),(KafkaCruiseControlPartitionMetricSamples-0,PartitionFetchInfo(10484957,8388608)),
... (long list of connect-*, KafkaCruiseControl*, _schemas, _confluent-ksql, and CruiseControlMetrics partitions elided) ...
(__KafkaCruiseControlPartitionMetricSamples-28,PartitionFetchInfo(10113719,8388608)). Possible cause: java.lang.OutOfMemoryError: Java heap space (kafka.mirrormaker.CompactConsumerFetcherThread:70)
```
```
[2019-03-13 09:43:24,033] ERROR [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-0]: In FetcherThread error due to (kafka.mirrormaker.CompactConsumerFetcherThread:76)
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.
```
```
[2019-03-13 09:43:27,289] INFO Shutting down pool: java.util.concurrent.ThreadPoolExecutor@59474f18[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] (org.apache.helix.messaging.handling.HelixTaskExecutor:474)
[2019-03-13 09:43:27,289] INFO Reset exectuor for msgType: STATE_TRANSITION, pool: java.util.concurrent.ThreadPoolExecutor@1649b0e6[Running, pool size = 98, active threads = 0, queued tasks = 0, completed tasks = 98] (org.apache.helix.messaging.handling.HelixTaskExecutor:530)
[2019-03-13 09:43:27,289] INFO Shutting down pool: java.util.concurrent.ThreadPoolExecutor@1649b0e6[Running, pool size = 98, active threads = 0, queued tasks = 0, completed tasks = 98] (org.apache.helix.messaging.handling.HelixTaskExecutor:474)
[2019-03-13 09:43:27,363] WARN Default reset method invoked. Either because the process longer own this resource or session timedout (org.apache.helix.participant.statemachine.StateModel:67)
... (the same WARN repeats roughly 100 times between 09:43:27,363 and 09:43:27,368) ...
[2019-03-13 09:43:27,368] INFO Registering bean: ParticipantName=testHelixMirrorMaker01 (org.apache.helix.monitoring.ParticipantStatusMonitor:109)
```
Looks like your worker is out of memory.
Is it at my source cluster? When I look at my source cluster kafka01, it says:
```
2019-03-13T11:50:01.334-0700: Total time for which application threads were stopped: 0.0070540 seconds, Stopping threads took: 0.0000253 seconds
2019-03-13T11:50:01.773-0700: [GC pause (G1 Evacuation Pause) (young) Desired survivor size 8388608 bytes, new threshold 15 (max 15)
```
Can you go to start-worker.sh, increase the memory size, and try again? Btw, has any message been replicated to the destination yet?
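One hedged way to give the worker a bigger heap, assuming the launcher honors a JAVA_OPTS environment variable and does not overwrite it (sizes are illustrative; tune -Xmx to your host):

```
# bump worker heap before launching
export JAVA_OPTS="-Xms1G -Xmx2G"
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker.sh \
  --consumer.config ./config/consumer.properties \
  --producer.config ./config/producer.properties \
  --helix.config ./config/helix.properties
```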
I have updated the heap region size to 512MB (it was 8M earlier). Below is the updated line:
```
exec "$JAVACMD" $JAVA_OPTS -Dapp_name=uReplicator-Worker -cp uReplicator-Distribution/target/uReplicator-Distribution-0.1-SNAPSHOT-jar-with-dependencies.jar -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintReferenceGC -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=512M -XX:InitiatingHeapOccupancyPercent=85 -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85 -XX:G1HeapWastePercent=5 \
```
No replication to the destination yet.
The controller log says:
```
[2019-03-13 13:10:07,903] ERROR For topic:oracle-20-MERCHANT, number of partitions for IdealState: topic1, {IDEAL_STATE_MODE=CUSTOMIZED, MAX_PARTITIONS_PER_INSTANCE=1, NUM_PARTITIONS=1, REBALANCE_MODE=CUSTOMIZED, REPLICAS=1, STATE_MODEL_DEF_REF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT}{0={testHelixMirrorMaker01=ONLINE}}{} doesn't match ExternalView: oracle-20-MERCHANT, {BUCKET_SIZE=0, IDEAL_STATE_MODE=CUSTOMIZED, MAX_PARTITIONS_PER_INSTANCE=1, NUM_PARTITIONS=1, REBALANCE_MODE=CUSTOMIZED, REPLICAS=1, STATE_MODEL_DEF_REF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT}{}{} (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:177)
[2019-03-13 13:10:07,903] WARN Got exception when removing metrics for externalView.topicPartitions.testHelixMirrorMaker01.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.NullPointerException
    at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
    at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
    at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
```
The controller says:
```
[2019-03-13 13:31:36,154] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:83)
[2019-03-13 13:31:36,165] ERROR Error registering metrics! (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:304)
java.lang.NullPointerException
    at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:301)
    at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
    at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
[2019-03-13 13:33:01,048] INFO AutoRebalanceLiveInstanceChangeListener.rebalanceCurrentCluster() wakes up! (com.uber.stream.kafka.mirrormaker.controller.core.AutoRebalanceLiveInstanceChangeListener:200)
[2019-03-13 13:33:01,051] INFO No assignment got changed, do nothing! (com.uber.stream.kafka.mirrormaker.controller.core.AutoRebalanceLiveInstanceChangeListener:226)
[2019-03-13 13:45:48,095] INFO Scan statusUpdates and errors for cluster: uReplicatorDev, by controller: org.apache.helix.manager.zk.ZKHelixManager@2f09fe81 (org.apache.helix.monitoring.ZKPathDataDumpTask:67)
... ("Trying to run the validation job" INFO lines and GC safepoint timing lines repeat every minute from 13:32:36 through 13:49:36; elided) ...
```
The worker says:
```
1246.340: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 2902, predicted base time: 5.19 ms, remaining time: 194.81 ms, target pause time: 200.00 ms]
1246.340: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 9 regions, survivors: 2 regions, predicted young region time: 16.98 ms]
1246.340: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 9 regions, survivors: 2 regions, old: 0 regions, predicted pause time: 22.18 ms, target pause time: 200.00 ms]
... (detailed G1 reference-processing and GC-worker phase breakdown elided) ...
[Eden: 288.0M(288.0M)->0.0B(256.0M) Survivors: 64.0M->64.0M Heap: 688.0M(960.0M)->448.8M(960.0M)]
[Times: user=0.03 sys=0.00, real=0.01 secs]
2019-03-13T13:51:43.521-0700: Total time for which application threads were stopped: 0.0161888 seconds, Stopping threads took: 0.0000244 seconds
2019-03-13T13:51:44.442-0700: [GC pause (G1 Evacuation Pause) (young) Desired survivor size 33554432 bytes, new threshold 1 (max 15)
```
You can add the following to your controller start command to remove the metrics NullPointerException:
```
-graphiteHost $METRICS_REPORTER_GRAPHITE_HOST   -- your graphite host
-graphitePort $METRICS_REPORTER_GRAPHITE_PORT   -- your graphite port
-metricsPrefix $METRICS_REPORTER_PREFIX         -- metric prefix
```
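Put together, an illustrative controller start command (the graphite host, port, and prefix are placeholders; the other flags are the ones already used in this thread):

```
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller.sh \
  -port 9000 -mode customized -helixClusterName uReplicatorDev \
  -zookeeper kafka01.xxxxxxxxxxxxx:2181 \
  -srcKafkaZkPath kafka01.xxxxxxxxxxxxxxxxx:2181 \
  -destKafkaZkPath kafka02.xxxxxxxxxxxxxxxx:2181 \
  -refreshTimeInSeconds 10 -enableAutoWhitelist true \
  -graphiteHost graphite.example.com -graphitePort 2003 -metricsPrefix ureplicator-dev
```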
Worker log seems all fine.
Even with the NullPointerException, does your uReplicator start to replicate data to the destination?
No, I don't see any data replicating from the source cluster topics to the target cluster. I checked both my destination and source clusters.
How do we deal with the FileNotFoundException?
```
[2019-03-13 14:23:14,189] ERROR Error writing backup to the file idealState-backup-env-uReplicatorDev (com.uber.stream.kafka.mirrormaker.controller.core.FileBackUpHandler:48)
java.io.FileNotFoundException: /var/log/kafka-mirror-maker-controller/idealState-backup-env-uReplicatorDev (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.
```
I think I gave wrong values in the consumer properties. Can you take a look at the chain above? My source cluster (kafka01) is where I have installed the uReplicator controller and worker; what parameters should I pass, precisely?
Can you run
```
curl localhost:9000/topics
```
and make sure there are topics whitelisted? Then
```
curl localhost:9000/topics/<topic> | jq .
```
and see if every partition is assigned a worker and the externalview/idealstate matches. Also make sure you have `auto.offset.reset=smallest` in consumer.properties.

I am following your directions, but can you look at my producer properties to see if they are right?
Thanks
Your producer properties look fine. We might need more information from the log to identify what went wrong.
I think the issue is within the worker. Can you paste more of the worker log, or the full log, after you have done the above steps?
Sure. My destination cluster (a cloud server) is turned off. I will paste the logs tomorrow morning.
I have attached the log; do take a look at it. I do not see any data transferring from the source cluster to the target cluster.
Is that the GC log? Do you have the application log?
Btw, what's the result of the steps above? Can you also paste that here?
I have attached the worker logs. I have data flowing into the topic on the source cluster.
```
[kafka@kafka01 confluent-5.1.0]$ curl -X POST -d '{"topic":"oracle-24-xxxxxxxxxxxx", "numPartitions":"1"}' http://localhost:9000/topics
Successfully add new topic: {topic: oracle-24-xxxxxxxxxx, partition: 1, pipeline: null}
[kafka@kafka01 confluent-5.1.0]$ curl localhost:9000/topics
Current serving topics: [oracle-20-xxxxxxxxxxx, oracle-23-xxxx, oracle-24-xxxxxxxx]
[kafka@kafka01 confluent-5.1.0]$ curl localhost:9000/topics/oracle-24-xxxxxxxx | jq
{
  "externalView": {
    "0": [
      "testHelixMirrorMaker01"
    ]
  },
  "idealState": {
    "0": [
      "testHelixMirrorMaker01"
    ]
  },
  "serverToNumPartitionsMapping": {
    "testHelixMirrorMaker01": 1
  },
  "serverToPartitionMapping": {
    "testHelixMirrorMaker01": [
      "0"
    ]
  },
  "topic": "oracle-24-xxxxxxxx"
}
```
The worker did receive the partition assignment and tried to consume from the source cluster, but failed:
```
[2019-03-15 07:44:11,089] WARN Unexpected error code: 76. (org.apache.kafka.common.protocol.Errors:719)
```
From the Kafka source code, error 76 means:
```
UNSUPPORTED_COMPRESSION_TYPE(76, "The requesting client does not support the compression type of given partition.", UnsupportedCompressionTypeException::new);
```
What compression type do you use in your source cluster?
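Compression can be set at the broker, topic, or producer level; one hedged way to check for a topic-level override with the stock CLI (ZK address and topic name are placeholders from this thread):

```
# show any per-topic config overrides (including compression.type) on the source cluster
bin/kafka-configs --zookeeper kafka01.xxxxxxxxxxxxxxxx:2181 \
  --describe --entity-type topics --entity-name oracle-24-xxxxxxxx
```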
The source cluster has compression.type=none in its producer config. The destination cluster also has compression.type=none in its producer config.
Thanks. Do you want me to change the properties? Do let me know; I can do it. I really appreciate your time!
What do you mean by producer cluster?
There should be a setting in your broker config: compression.type=<>. What's its value on your broker?
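For a quick check on each broker (path is a placeholder; note that if compression.type is absent, the broker default is "producer", i.e. it keeps whatever codec the producer used):

```
grep -H '^compression.type' /path/to/server.properties
```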
@xhl1988 -- We can configure the compression type in either server.properties or producer.properties. I have two different properties files: server.properties and producer.properties.
server.properties:
```
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://your.host.name:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=xx/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=100
log.flush.interval.ms=10
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
confluent.support.metrics.enable=false
metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
group.initial.rebalance.delay.ms=0
```
producer.properties:
```
bootstrap.servers=localhost:9092,localhost:9093
compression.type=none
```
Do let me know which one works with uReplicator; I can change it accordingly.
We are able to reproduce the same error locally if there is any message with the zstd compression type.
I have changed the compression.type on the source brokers and I no longer see any 76 (unsupported compression) errors, but I still do not see any data flowing from the source to the destination cluster.
I egrepped for 76 and could not find the error anywhere. What do you think the problem could be?
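For reference, the grep I would run against the worker output (nohup.out is where the worker logs went later in this thread; the message text is taken from the log above):

```
egrep -n "error code: 76|UNSUPPORTED_COMPRESSION" nohup.out
```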
New logs seem all good. Are you producing to the topic with new messages?
Edit: looks like the worker copied 442848 messages for oracle-23-xxx; can you check that?
You can add -Xloggc:<path to gc log> to remove the GC info from the application log.
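Concretely, a hedged sketch against the exec line pasted earlier (the log path is illustrative; this could go in start-worker.sh just before the exec):

```
# append to the worker JVM flags so GC output goes to its own file
JAVA_OPTS="$JAVA_OPTS -Xloggc:/var/log/ureplicator/worker-gc.log"
```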
Yes, I am producing new messages. I am surprised I don't see any data in the target cluster.
Can I add -Xloggc: to the worker script?
Thanks.
oracle-23-MIF-0 from the source cluster? Your log seems fine:
```
[2019-03-15 11:35:02,467] INFO [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-1]: Topic partitions dump in fetcher thread: ArrayBuffer([oracle-24-MIF_CHANGES-0, Offset 281329869] , [oracle-23-MIF-0, Offset 0] )
[2019-03-15 11:40:02,517] INFO [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-1]: Topic partitions dump in fetcher thread: ArrayBuffer([oracle-24-MIF_CHANGES-0, Offset 281329869] , [oracle-23-MIF-0, Offset 442848] ) (kafka.mirrormaker.CompactConsumerFetcherThread:66)
```
It means you have copied 442848 messages for oracle-23-MIF-0. Can you check the commit offset in ZooKeeper at <srckafkaZkpath>:/consumer/kloak-mirrormaker-test/offsets/oracle-23-MIF-0/0 and see if it's at least 442848?
Also, after you remove the GC log, you can collect the logs for a longer time and see if the offset in the log increases.
You can periodically check the committed offset; if it increases, then the replication should be working.
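A hedged polling sketch (the znode path follows the classic Kafka /consumers layout; your deployment may differ, and the host is a placeholder):

```
# print the committed offset once a minute; watch whether it grows
while true; do
  bin/zookeeper-shell kafka01.xxxxxxxxxxxxxxxx:2181 \
    get /consumers/kloak-mirrormaker-test/offsets/oracle-23-MIF-0/0
  sleep 60
done
```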
I tried removing one of the producer hostnames. What ports are used by the workers, controller, manager, and Helix? Can you let me know? I think this is a network issue.
@xhl1988 Does this mean the topic replicas are being fetched on kafka01 from another broker on port 9093 on the same cluster? If yes, how do I change it to a broker sitting on another cluster? The source broker is kafka01:9092.
Logs:
```
$ cat nohup.out | egrep ArrayBuffer
[2019-03-19 10:17:59,986] INFO [CompactConsumerFetcherManager-1553015862247] Fetcher Thread for topic partitions: ArrayBuffer([oracle-23-MIF-0, InitialOffset -1] ) is CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-1 (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-03-19 10:17:59,987] INFO [CompactConsumerFetcherManager-1553015862247] Added fetcher for partitions ArrayBuffer([oracle-23-xxx-0, initOffset -1 to broker BrokerEndPoint(1,kafka01.xxxxxxx,9093)] ) (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-03-19 10:17:59,988] INFO [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-1]: Topic partitions dump in fetcher thread: ArrayBuffer([oracle-23-xxx-0, Offset 0] ) (kafka.mirrormaker.CompactConsumerFetcherThread:66)
[2019-03-19 10:23:00,108] INFO [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-1]: Topic partitions dump in fetcher thread: ArrayBuffer([oracle-23-xxx-0, Offset 416354] ) (kafka.mirrormaker.CompactConsumerFetcherThread:66)
[2019-03-19 10:28:00,189] INFO [CompactConsumerFetcherThread-kloak-mirrormaker-test_kloakmms01-sjc1-0-1]: Topic partitions dump in fetcher thread: ArrayBuffer([oracle-23-xxx-0, Offset 1057382] ) (kafka.mirrormaker.CompactConsumerFetcherThread:66)
```
For source clusters, uReplicator gets the broker list using zookeeper.connect in consumer.properties. I saw you set zookeeper.connect=kafka01.xxxxxxxxxxxxxxxx:2181. How many Kafka clusters are on this ZooKeeper?
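To see exactly which brokers are registered under that ZooKeeper (Kafka registers each broker under /brokers/ids), you can run:

```
bin/zookeeper-shell kafka01.xxxxxxxxxxxxxxxx:2181 ls /brokers/ids
```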
The kafka01.xxxxxxxx:2181 ZooKeeper has 2 brokers on the same cluster on different ports (9092, 9093), and the target cluster (kafka01.
Can this be a network/security issue contacting the other server? I think we must put the logs in DEBUG mode to see the error more clearly.
Note: the target cluster is an AWS cloud server.
From your log, the worker is definitely working. Have you checked the commit offset in your source ZK?
I tried looking from the ZooKeeper shell and the commit offset 442848 does not show up.
Can you show me the command you ran and the result?
I ran the zookeeper shell command:
```
bin/zookeeper-shell xxxxxxxxxxxxxxxxxxx.com:2181
Connecting to xxxxxxxxxxxxx:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: xxxxxxxxxxxxxxxxxxxxxx:2181(CONNECTING) 0]
WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:2181(CONNECTED) 0] GET consumer/kloak-mirrormaker-test/offsets/oracle-23-MIF-0/0
```
It does not give any value.
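One thing worth double-checking (an assumption on my part, not confirmed in this thread): the command above uses an uppercase GET and a path with no leading slash, while in a stock Kafka layout committed offsets live under the plural /consumers znode and zkCli commands are lowercase. A hedged retry from the same shell:

```
get /consumers/kloak-mirrormaker-test/offsets/oracle-23-MIF-0/0
```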
Can I have example configurations so that we can test the lag? I am trying to evaluate uReplicator. I have cloned the master branch. Where can I configure the controller starter config, and how can I configure the source cluster and target cluster for all components? Once I understand, I can also build documentation for uReplicator. Please let me know.