Open dungnt081191 opened 5 years ago
Could you please paste some more of the worker log?
hi @Technoboy-
Yes, okay, but there are lots of logs during that time. As I said above, the duplicate messages happen while scaling down the number of worker pods (while terminating). Here is the log from scaling down from 2 pods to 1:
Log before scale down:
[2019-10-29 02:35:13,758] INFO msg: 79473479-d226-4447-b8f9-f07a475f0a65 handling task completed, results:true, at: 1572316513758, took:306 (org.apache.helix.messaging.handling.HelixTask:170)
[2019-10-29 02:35:13,760] INFO Disconnecting from 10.111.176.23:9092 (kafka.producer.SyncProducer:66)
[2019-10-29 02:35:13,761] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Try find leader for topic: ###########5, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 02:35:13,761] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Try find leader for topic: ###########4, partition:2 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 02:35:13,761] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Try find leader for topic: ###########3, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 02:35:13,762] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Try find leader for topic: ###########2, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 02:35:13,762] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Try find leader for topic: ###########, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 02:35:13,762] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Find leader for partitions finished, took: 169 ms (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
2019-10-29T02:35:13.763+0000: Total time for which application threads were stopped: 0.0002628 seconds, Stopping threads took: 0.0000841 seconds
2019-10-29T02:35:13.763+0000: Total time for which application threads were stopped: 0.0002773 seconds, Stopping threads took: 0.0000482 seconds
2019-10-29T02:35:13.766+0000: Total time for which application threads were stopped: 0.0001978 seconds, Stopping threads took: 0.0000240 seconds
2019-10-29T02:35:13.767+0000: Total time for which application threads were stopped: 0.0002218 seconds, Stopping threads took: 0.0000408 seconds
2019-10-29T02:35:13.767+0000: Total time for which application threads were stopped: 0.0002151 seconds, Stopping threads took: 0.0000431 seconds
[2019-10-29 02:35:13,787] INFO Subcribing to path:/uReplicatorDev/INSTANCES/HelixMirrorMaker-1572271233140/MESSAGES took:79 (org.apache.helix.manager.zk.CallbackHandler:356)
2019-10-29T02:35:13.809+0000: Total time for which application threads were stopped: 0.0009120 seconds, Stopping threads took: 0.0000556 seconds
[2019-10-29 02:35:13,827] INFO No Messages to process (org.apache.helix.messaging.handling.HelixTaskExecutor:619)
[2019-10-29 02:35:13,827] INFO 20 END:INVOKE /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572271233140/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor Took: 119ms (org.apache.helix.manager.zk.CallbackHandler:266)
[2019-10-29 02:35:13,852] INFO [CompactConsumerFetcherManager-1572271233359] Fetcher Thread for topic partitions: ArrayBuffer([###########4-2, InitialOffset -1] , [###########3-0, InitialOffset 277] , [###########5-0, InitialOffset 91470] , [###########2-0, InitialOffset -1] , [###########-0, InitialOffset 6621] ) is CompactConsumerFetcherThread-kloak-mirrormaker-dc-test2_kloakmms01-dc-0-0 (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-10-29 02:35:13,852] INFO [CompactConsumerFetcherManager-1572271233359] Added fetcher for partitions ArrayBuffer([###########4-2, initOffset -1 to broker BrokerEndPoint(0,10.111.176.23,9092)] , [###########3-0, initOffset 277 to broker BrokerEndPoint(0,10.111.176.23,9092)] , [###########5-0, initOffset 91470 to broker BrokerEndPoint(0,10.111.176.23,9092)] , [###########2-0, initOffset -1 to broker BrokerEndPoint(0,10.111.176.23,9092)] , [###########-0, initOffset 6621 to broker BrokerEndPoint(0,10.111.176.23,9092)] ) (kafka.mirrormaker.CompactConsumerFetcherManager:66)
2019-10-29T02:35:13.894+0000: Total time for which application threads were stopped: 0.0004547 seconds, Stopping threads took: 0.0000734 seconds
[2019-10-29 02:35:13,914] INFO 20 START:INVOKE /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572271233140/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor (org.apache.helix.manager.zk.CallbackHandler:194)
[2019-10-29 02:35:13,914] INFO HelixMirrorMaker-1572271233140 subscribes child-change. path: /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572271233140/MESSAGES, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4738a206 (org.apache.helix.manager.zk.CallbackHandler:275)
[2019-10-29 02:35:13,993] INFO Subcribing to path:/uReplicatorDev/INSTANCES/HelixMirrorMaker-1572271233140/MESSAGES took:79 (org.apache.helix.manager.zk.CallbackHandler:356)
[2019-10-29 02:35:14,034] INFO No Messages to process (org.apache.helix.messaging.handling.HelixTaskExecutor:619)
[2019-10-29 02:35:14,034] INFO 20 END:INVOKE /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572271233140/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor Took: 120ms (org.apache.helix.manager.zk.CallbackHandler:266)
2019-10-29T02:35:14.895+0000: Total time for which application threads were stopped: 0.0004136 seconds, Stopping threads took: 0.0001164 seconds
2019-10-29T02:35:15.895+0000: Total time for which application threads were stopped: 0.0003663 seconds, Stopping threads took: 0.0000618 seconds
2019-10-29T02:35:18.896+0000: Total time for which application threads were stopped: 0.0003273 seconds, Stopping threads took: 0.0000600 seconds
2019-10-29T02:35:19.462+0000: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 4194304 bytes, new threshold 15 (max 15)
- age 1: 53688 bytes, 53688 total
- age 2: 51408 bytes, 105096 total
- age 3: 48800 bytes, 153896 total
- age 4: 46680 bytes, 200576 total
- age 5: 47976 bytes, 248552 total
- age 6: 48464 bytes, 297016 total
- age 7: 48192 bytes, 345208 total
- age 8: 243256 bytes, 588464 total
- age 9: 271504 bytes, 859968 total
- age 10: 17040 bytes, 877008 total
- age 11: 17016 bytes, 894024 total
- age 12: 17688 bytes, 911712 total
- age 13: 22096 bytes, 933808 total
- age 14: 16944 bytes, 950752 total
- age 15: 17064 bytes, 967816 total
45287.533: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 2120, predicted base time: 8.23 ms, remaining time: 191.77 ms, target pause time: 200.00 ms]
45287.533: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: 1 regions, predicted young region time: 0.52 ms]
45287.533: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, survivors: 1 regions, old: 0 regions, predicted pause time: 8.75 ms, target pause time: 200.00 ms]
2019-10-29T02:35:19.468+0000: [SoftReference, 0 refs, 0.0009323 secs]2019-10-29T02:35:19.469+0000: [WeakReference, 0 refs, 0.0002444 secs]2019-10-29T02:35:19.469+0000: [FinalReference, 0 refs, 0.0002698 secs]2019-10-29T02:35:19.469+0000: [PhantomReference, 0 refs, 0 refs, 0.0003146 secs]2019-10-29T02:35:19.469+0000: [JNI Weak Reference, 0.0000225 secs], 0.0075687 secs]
[Parallel Time: 4.9 ms, GC Workers: 2]
Log during pod scale down:
- age 3: 17808 bytes, 83344 total
- age 4: 18048 bytes, 101392 total
- age 5: 18344 bytes, 119736 total
- age 6: 17296 bytes, 137032 total
- age 7: 17864 bytes, 154896 total
- age 8: 17952 bytes, 172848 total
- age 9: 17760 bytes, 190608 total
- age 10: 21280 bytes, 211888 total
- age 11: 17232 bytes, 229120 total
- age 12: 17088 bytes, 246208 total
- age 13: 17472 bytes, 263680 total
- age 14: 17592 bytes, 281272 total
- age 15: 17352 bytes, 298624 total
46039.679: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 1857, predicted base time: 9.85 ms, remaining time: 190.15 ms, target pause time: 200.00 ms]
46039.679: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: 1 regions, predicted young region time: 0.19 ms]
46039.679: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, survivors: 1 regions, old: 0 regions, predicted pause time: 10.04 ms, target pause time: 200.00 ms]
2019-10-29T02:51:41.671+0000: [SoftReference, 0 refs, 0.0001568 secs]2019-10-29T02:51:41.671+0000: [WeakReference, 0 refs, 0.0001058 secs]2019-10-29T02:51:41.671+0000: [FinalReference, 0 refs, 0.0000782 secs]2019-10-29T02:51:41.671+0000: [PhantomReference, 0 refs, 0 refs, 0.0001795 secs]2019-10-29T02:51:41.671+0000: [JNI Weak Reference, 0.0000160 secs], 0.0046632 secs]
[Parallel Time: 3.6 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 46039679.5, Avg: 46039679.6, Max: 46039679.6, Diff: 0.1]
[Ext Root Scanning (ms): Min: 1.0, Avg: 1.1, Max: 1.2, Diff: 0.1, Sum: 2.2]
[Update RS (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.0, Sum: 2.8]
[Processed Buffers: Min: 3, Avg: 5.5, Max: 8, Diff: 5, Sum: 11]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 0.9, Avg: 0.9, Max: 0.9, Diff: 0.0, Sum: 1.9]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 3.4, Avg: 3.4, Max: 3.5, Diff: 0.1, Sum: 6.9]
[GC Worker End (ms): Min: 46039683.0, Avg: 46039683.0, Max: 46039683.0, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 1.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.7 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.0 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 8192.0K(8192.0K)->0.0B(8192.0K) Survivors: 8192.0K->8192.0K Heap: 126.4M(144.0M)->118.6M(144.0M)]
[Times: user=0.01 sys=0.00, real=0.01 secs]
2019-10-29T02:51:41.672+0000: Total time for which application threads were stopped: 0.0052173 seconds, Stopping threads took: 0.0000581 seconds
rpc error: code = Unknown desc = Error: No such container: 0b6056d43018c07310ccc8a62a644b6f2badbf5519c02b09b181f8510b23ee43
Can you provide all the logs (at least from the beginning of shutdown) on the worker that is being terminated? Very likely it's not shut down gracefully and doesn't commit the offsets.
hi @xhl1988
The worker logs above are from before the scale-down and during the scale-down of the worker. Below is the controller log.
How do I check whether it shut down gracefully?
Because I wrote a Kubernetes deployment file and scale up/down by updating the `replicas` field, from 1 -> 4, 4 -> 3, and so on.
Should I send an API shutdown request, or `kill -9` the process in the pod, before scaling down?
My k8s deployment file:
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: dc-nonprod-ureplicator-controller
  namespace: ureplicator
  labels:
    app: dc-nonprod-ureplicator
    component: controller
spec:
  replicas: 1
  progressDeadlineSeconds: 120
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # how many pods we can add at a time
      maxUnavailable: 0 # how many pods can be unavailable during the rolling update
  selector:
    matchLabels:
      app: dc-nonprod-ureplicator
      component: controller
  template:
    metadata:
      labels:
        app: dc-nonprod-ureplicator
        component: controller
    spec:
      containers:
        - name: dc-nonprod-ureplicator-controller
          image: dungnt081191/ureplicator:0.32
          imagePullPolicy: Always
          env:
            - name: SERVICE_TYPE
              value: "controller"
          envFrom:
            - configMapRef:
                name: dc-nonprod-ureplicator-envs
          ports:
            - name: api-port
              containerPort: 9000
          resources:
            requests:
              cpu: 500m
              memory: 1500Mi
            # limits:
            #   cpu: 1000m
            #   memory: 3000Mi
          volumeMounts:
            - name: ureplicator-key
              mountPath: "/uReplicator/bin/kafka-truststore.jks"
              subPath: kafka-truststore.jks
              readOnly: true
            - name: ureplicator-key
              mountPath: "/uReplicator/bin/manager.jks"
              subPath: manager.jks
              readOnly: true
      restartPolicy: Always
      volumes:
        - name: ureplicator-key
          secret:
            secretName: ureplicator-key
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: dc-nonprod-ureplicator-worker
  namespace: ureplicator
  labels:
    app: dc-nonprod-ureplicator
    component: worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dc-nonprod-ureplicator
      component: worker
  template:
    metadata:
      labels:
        app: dc-nonprod-ureplicator
        component: worker
    spec:
      containers:
        - name: dc-nonprod-ureplicator-worker
          image: dungnt081191/ureplicator:0.32
          env:
            - name: SERVICE_TYPE
              value: "worker"
          envFrom:
            - configMapRef:
                name: dc-nonprod-ureplicator-envs
          resources:
            requests:
              cpu: 100m
              memory: 1000Mi
            # limits:
            #   cpu: 1000m
            #   memory: 3000Mi
          volumeMounts:
            - name: ureplicator-key
              mountPath: "/uReplicator/bin/kafka-truststore.jks"
              subPath: kafka-truststore.jks
              readOnly: true
            - name: ureplicator-key
              mountPath: "/uReplicator/bin/manager.jks"
              subPath: manager.jks
              readOnly: true
      volumes:
        - name: ureplicator-key
          secret:
            secretName: ureplicator-key
Log of the controller:
[2019-10-29 03:24:34,700] INFO added 1 new topics to topic list in zk child change (com.uber.stream.kafka.mirrormaker.common.core.KafkaBrokerTopicObserver:141)
2019-10-29T03:24:41.725+0000: Total time for which application threads were stopped: 0.0008572 seconds, Stopping threads took: 0.0001218 seconds
[2019-10-29 03:24:41,724] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:24:43,937] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:24:43,937] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:24:51,674] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:83)
[2019-10-29 03:24:51,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:24:53,894] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:24:53,894] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:24:55,881] WARN Got exception when removing metrics for externalView.topicPartitions.HelixMirrorMaker-1572270513534.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.IllegalStateException: Not initialized yet
at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-10-29T03:24:55.882+0000: Total time for which application threads were stopped: 0.0006868 seconds, Stopping threads took: 0.0000676 seconds
[2019-10-29 03:24:55,881] WARN Got exception when removing metrics for externalView.topicPartitions.HelixMirrorMaker-1572271463233.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.IllegalStateException: Not initialized yet
at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-10-29 03:24:55,882] WARN Got exception when removing metrics for externalView.topicPartitions.HelixMirrorMaker-1572270513653.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.IllegalStateException: Not initialized yet
at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-10-29 03:25:01,714] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:25:03,892] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:25:03,893] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:25:11,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:25:13,901] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:25:13,901] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:25:21,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:25:23,910] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:25:23,910] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:25:31,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:25:33,894] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:25:33,895] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:25:41,714] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:25:43,897] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:25:43,898] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:25:51,673] INFO Trying to run the validation job (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:83)
[2019-10-29 03:25:51,712] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:25:53,888] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:25:53,888] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:25:55,873] WARN Got exception when removing metrics for externalView.topicPartitions.HelixMirrorMaker-1572270513534.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.IllegalStateException: Not initialized yet
at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-10-29 03:25:55,873] WARN Got exception when removing metrics for externalView.topicPartitions.HelixMirrorMaker-1572271463233.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.IllegalStateException: Not initialized yet
at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-10-29 03:25:55,873] WARN Got exception when removing metrics for externalView.topicPartitions.HelixMirrorMaker-1572270513653.totalNumber (com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager:319)
java.lang.IllegalStateException: Not initialized yet
at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.checkState(KafkaUReplicatorMetricsReporter.java:207)
at com.uber.stream.ureplicator.common.KafkaUReplicatorMetricsReporter.get(KafkaUReplicatorMetricsReporter.java:178)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.updatePerWorkerEVMetrics(ValidationManager.java:317)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager.validateExternalView(ValidationManager.java:210)
at com.uber.stream.kafka.mirrormaker.controller.validation.ValidationManager$1.run(ValidationManager.java:84)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-10-29 03:26:01,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:26:03,895] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:26:03,896] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:26:11,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:26:13,899] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:26:13,899] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:26:21,714] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:26:23,894] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:26:23,894] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:26:31,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:26:33,904] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:26:33,904] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
[2019-10-29 03:26:41,713] INFO Trying to run topic whitelisting job (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:124)
[2019-10-29 03:26:43,898] INFO BlacklistedTopics=[] and ExcludingPattern=__consumer_offsets (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:171)
[2019-10-29 03:26:43,899] INFO Trying to whitelist topics: [] (com.uber.stream.kafka.mirrormaker.controller.core.AutoTopicWhitelistingManager:132)
Search the worker log for keywords like "Start clean shutdown" or "Flushing last batches and commit offsets", which indicate a graceful shutdown. Under a graceful shutdown, the message offsets will be committed. Otherwise, you may be killing the pod in an incorrect manner.
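For example, something along these lines (the sample log line and the `worker.log` filename are fabricated here for illustration; in practice you would check the output of `kubectl logs <worker-pod>`):

```shell
# Sketch: check a saved worker log for the messages that indicate a
# graceful shutdown. worker.log is a sample file created for illustration.
cat > worker.log <<'EOF'
[2019-10-29 07:00:00,000] INFO Start clean shutdown.
EOF

if grep -qE 'Start clean shutdown|Flushing last batches and commit offsets' worker.log; then
  echo "graceful shutdown found"
else
  echo "no graceful shutdown found"
fi
```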
@Technoboy- Sorry, how do I stop the pod correctly so that it commits the offsets before shutting down? Should I include another command in the startController/Worker sh command?
Reference: https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html. You'd better kill the worker process with a shell script before the pod itself is killed.
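As a sketch (untested against this setup; the grace period value and the `stop-worker.sh` script are assumptions, not part of uReplicator), the worker container could get a `preStop` hook plus a termination grace period long enough for the worker to flush and commit:

```yaml
spec:
  terminationGracePeriodSeconds: 60   # assumption: enough time to flush and commit
  containers:
    - name: dc-nonprod-ureplicator-worker
      # ... existing container fields ...
      lifecycle:
        preStop:
          exec:
            # stop-worker.sh is a hypothetical script that sends SIGTERM
            # to the worker JVM and waits for it to exit
            command: ["/bin/sh", "-c", "/uReplicator/bin/stop-worker.sh"]
```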
@Technoboy-
It's not working. I tried this one:
#!/bin/sh -e
PID=$(pgrep -f "Dapp_name=uReplicator-Worker" || true)
if [ -n "$PID" ]; then
  kill -9 ${PID}
fi
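(One likely reason this doesn't work: `kill -9` sends SIGKILL, which the JVM cannot catch, so the worker's shutdown hook never gets a chance to flush batches and commit offsets. A sketch of a preStop script that sends SIGTERM instead and waits for the process to exit; the 30-second timeout is an assumption:)

```shell
#!/bin/sh -e
# Send SIGTERM (catchable) instead of SIGKILL (-9, uncatchable) so the
# worker's shutdown hook can flush and commit offsets before exiting.
# The [D] in the pattern keeps pgrep -f from matching this script itself.
PID=$(pgrep -f "[D]app_name=uReplicator-Worker" || true)
if [ -n "$PID" ]; then
  kill -15 "$PID"
  # Wait up to 30 seconds (assumed timeout) for the worker to exit.
  for _ in $(seq 1 30); do
    kill -0 "$PID" 2>/dev/null || exit 0
    sleep 1
  done
fi
```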
And the pod log when the preStop hook runs from Kubernetes:
[2019-10-29 06:59:50,088] INFO Fetching metadata from broker BrokerEndPoint(0,10.111.176.23,9092) with correlation id 5 for 3 topic(s) Set(##########, ###############################, #################) (kafka.client.ClientUtils$:66)
[2019-10-29 06:59:50,117] INFO 20 START:INVOKE /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572332239423/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor (org.apache.helix.manager.zk.CallbackHandler:194)
[2019-10-29 06:59:50,118] INFO HelixMirrorMaker-1572332239423 subscribes child-change. path: /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572332239423/MESSAGES, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4738a206 (org.apache.helix.manager.zk.CallbackHandler:275)
[2019-10-29 06:59:50,132] INFO Connected to 10.111.176.23:9092 for producing (kafka.producer.SyncProducer:66)
[2019-10-29 06:59:50,172] INFO Disconnecting from 10.111.176.23:9092 (kafka.producer.SyncProducer:66)
[2019-10-29 06:59:50,175] INFO [kloak-mirrormaker--dc-test2_kloakmms01--dc-leader-finder-thread]: Try find leader for topic: ##########, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 06:59:50,177] INFO [kloak-mirrormaker--dc-test2_kloakmms01--dc-leader-finder-thread]: Try find leader for topic: ###############################, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 06:59:50,177] INFO [kloak-mirrormaker--dc-test2_kloakmms01--dc-leader-finder-thread]: Try find leader for topic: #################, partition:0 (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 06:59:50,178] INFO [kloak-mirrormaker--dc-test2_kloakmms01--dc-leader-finder-thread]: Find leader for partitions finished, took: 176 ms (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 06:59:50,201] INFO Subcribing to path:/uReplicatorDev/INSTANCES/HelixMirrorMaker-1572332239423/MESSAGES took:83 (org.apache.helix.manager.zk.CallbackHandler:356)
[2019-10-29 06:59:50,241] INFO No Messages to process (org.apache.helix.messaging.handling.HelixTaskExecutor:619)
[2019-10-29 06:59:50,241] INFO 20 END:INVOKE /uReplicatorDev/INSTANCES/HelixMirrorMaker-1572332239423/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor Took: 124ms (org.apache.helix.manager.zk.CallbackHandler:266)
[2019-10-29 06:59:50,303] INFO [CompactConsumerFetcherManager-1572332239637] Fetcher Thread for topic partitions: ArrayBuffer([##########-0, InitialOffset 679] , [#################-0, InitialOffset -1] , [###############################-0, InitialOffset -1] ) is CompactConsumerFetcherThread-kloak-mirrormaker--dc-test2_kloakmms01--dc-0-0 (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-10-29 06:59:50,311] INFO [CompactConsumerFetcherManager-1572332239637] Added fetcher for partitions ArrayBuffer([##########-0, initOffset 679 to broker BrokerEndPoint(0,10.111.176.23,9092)] , [#################-0, initOffset -1 to broker BrokerEndPoint(0,10.111.176.23,9092)] , [###############################-0, initOffset -1 to broker BrokerEndPoint(0,10.111.176.23,9092)] ) (kafka.mirrormaker.CompactConsumerFetcherManager:66)
2019-10-29T06:59:50.849+0000: Total time for which application threads were stopped: 0.0003164 seconds, Stopping threads took: 0.0000538 seconds
2019-10-29T06:59:51.850+0000: Total time for which application threads were stopped: 0.0003843 seconds, Stopping threads took: 0.0000568 seconds
2019-10-29T06:59:52.851+0000: Total time for which application threads were stopped: 0.0007836 seconds, Stopping threads took: 0.0000786 seconds
2019-10-29T06:59:53.851+0000: Total time for which application threads were stopped: 0.0002985 seconds, Stopping threads took: 0.0000498 seconds
2019-10-29T06:59:54.851+0000: Total time for which application threads were stopped: 0.0002572 seconds, Stopping threads took: 0.0000433 seconds
2019-10-29T06:59:55.852+0000: Total time for which application threads were stopped: 0.0003113 seconds, Stopping threads took: 0.0000635 seconds
2019-10-29T06:59:56.852+0000: Total time for which application threads were stopped: 0.0003098 seconds, Stopping threads took: 0.0000564 seconds
2019-10-29T06:59:57.853+0000: Total time for which application threads were stopped: 0.0004281 seconds, Stopping threads took: 0.0000470 seconds
2019-10-29T06:59:58.853+0000: Total time for which application threads were stopped: 0.0003280 seconds, Stopping threads took: 0.0001100 seconds
2019-10-29T06:59:59.854+0000: Total time for which application threads were stopped: 0.0003383 seconds, Stopping threads took: 0.0000426 seconds
2019-10-29T07:00:00.854+0000: Total time for which application threads were stopped: 0.0002815 seconds, Stopping threads took: 0.0000482 seconds
2019-10-29T07:00:01.855+0000: Total time for which application threads were stopped: 0.0002605 seconds, Stopping threads took: 0.0000483 seconds
2019-10-29T07:00:02.855+0000: Total time for which application threads were stopped: 0.0003141 seconds, Stopping threads took: 0.0001119 seconds
2019-10-29T07:00:03.856+0000: Total time for which application threads were stopped: 0.0004248 seconds, Stopping threads took: 0.0001298 seconds
2019-10-29T07:00:05.857+0000: Total time for which application threads were stopped: 0.0003363 seconds, Stopping threads took: 0.0000596 seconds
2019-10-29T07:00:07.857+0000: Total time for which application threads were stopped: 0.0004193 seconds, Stopping threads took: 0.0000611 seconds
2019-10-29T07:00:08.858+0000: Total time for which application threads were stopped: 0.0003288 seconds, Stopping threads took: 0.0000555 seconds
2019-10-29T07:00:09.858+0000: Total time for which application threads were stopped: 0.0003776 seconds, Stopping threads took: 0.0001157 seconds
2019-10-29T07:00:10.859+0000: Total time for which application threads were stopped: 0.0003137 seconds, Stopping threads took: 0.0000682 seconds
2019-10-29T07:00:11.859+0000: Total time for which application threads were stopped: 0.0003409 seconds, Stopping threads took: 0.0001035 seconds
2019-10-29T07:00:12.860+0000: Total time for which application threads were stopped: 0.0003121 seconds, Stopping threads took: 0.0000479 seconds
2019-10-29T07:00:13.860+0000: Total time for which application threads were stopped: 0.0002412 seconds, Stopping threads took: 0.0000454 seconds
2019-10-29T07:00:14.861+0000: Total time for which application threads were stopped: 0.0003741 seconds, Stopping threads took: 0.0000601 seconds
2019-10-29T07:00:17.862+0000: Total time for which application threads were stopped: 0.0003564 seconds, Stopping threads took: 0.0001096 seconds
2019-10-29T07:00:18.862+0000: Total time for which application threads were stopped: 0.0004468 seconds, Stopping threads took: 0.0000610 seconds
2019-10-29T07:00:20.863+0000: Total time for which application threads were stopped: 0.0002459 seconds, Stopping threads took: 0.0000561 seconds
2019-10-29T07:00:21.863+0000: Total time for which application threads were stopped: 0.0003930 seconds, Stopping threads took: 0.0001232 seconds
/entrypoint.sh: line 31: 13 Killed ./start-worker.sh --consumer.config /uReplicator/config/consumer.properties --producer.config /uReplicator/config/producer.properties --helix.config /uReplicator/config/helix.properties --topic.mappings /uReplicator/config/topicmapping.properties
rpc error: code = Unknown desc = Error: No such container: 2a6aa6c3c227b92a8c246e8a319a1a5f4997c5748128307016617b6cca46d905
If the shell had found the worker process and killed it, you would see the keywords I mentioned above. So I think the shell does not find the worker process correctly. Did you specify the startup param app_name?
@Technoboy- no, I did not. My worker start script is:
if [ "${SERVICE_TYPE}" == "controller" ] ; then
./start-controller.sh \
-port 9000 \
-helixClusterName "${HELIX_CLUSTER_NAME}" \
-refreshTimeInSeconds 10 \
-enableAutoWhitelist true \
-srcKafkaZkPath "${SRC_ZK_CONNECT}" \
-destKafkaZkPath "${DST_ZK_CONNECT}" \
-zookeeper "${SRC_ZK_CONNECT}"
elif [ "${SERVICE_TYPE}" == "worker" ] ; then
./start-worker.sh \
--consumer.config /uReplicator/config/consumer.properties \
--producer.config /uReplicator/config/producer.properties \
--helix.config /uReplicator/config/helix.properties \
--topic.mappings /uReplicator/config/topicmapping.properties
fi
with no app_name specified.
My kill-process script is:
#!/bin/sh -e
PID=$(pgrep -f "Dapp_name=uReplicator-Worker" || true)
if [ -n "$PID" ]; then
kill -9 ${PID}
fi
If I manually check the process ID in the pod, here is the result:
root@ureplicator-worker-7fdffbb6f5-nm5wj:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 07:25 ? 00:00:00 /bin/bash -ex /entrypoint.sh
root 15 1 2 07:25 ? 00:00:19 /usr/local/openjdk-8/bin/java -XX:ParallelGCThreads=2 -Dapp_name=uReplicator-Worker -cp uReplicator-Distribution/target/uReplicator-Distribution-0.1-SN
root 117 0 0 07:34 ? 00:00:00 bash
root 150 0 0 07:39 ? 00:00:00 bash
root 158 150 0 07:39 ? 00:00:00 ps -ef
And I kill the Java process that has -Dapp_name=uReplicator-Worker.
@Technoboy-
hey bro, the problem is: kill <pid> vs. kill -9 <pid>.
I had been doing it with kill -9.
I've tried plain 'kill <pid>' and here is the result: it has the commit-offsets log, and it works!!!
2019-10-29T07:45:23.181+0000: Total time for which application threads were stopped: 0.0002563 seconds, Stopping threads took: 0.0000400 seconds
2019-10-29T07:45:40.183+0000: Total time for which application threads were stopped: 0.0001666 seconds, Stopping threads took: 0.0000360 seconds
[2019-10-29 07:45:45,083] INFO Start clean shutdown. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:45,086] INFO Shutting down consumer thread. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:45,089] INFO [mirrormaker-thread-HelixMirrorMaker-1572335099430] mirrormaker-thread-HelixMirrorMaker-1572335099430 shutting down (kafka.mirrormaker.WorkerInstance$MirrorMakerThread:66)
2019-10-29T07:45:45.184+0000: Total time for which application threads were stopped: 0.0001369 seconds, Stopping threads took: 0.0000343 seconds
2019-10-29T07:45:51.185+0000: Total time for which application threads were stopped: 0.0001361 seconds, Stopping threads took: 0.0000360 seconds
[2019-10-29 07:45:51,811] INFO [mirrormaker-thread-HelixMirrorMaker-1572335099430] Mirror maker thread stopped (kafka.mirrormaker.WorkerInstance$MirrorMakerThread:66)
[2019-10-29 07:45:51,811] INFO [mirrormaker-thread-HelixMirrorMaker-1572335099430] Mirror maker thread shutdown complete (kafka.mirrormaker.WorkerInstance$MirrorMakerThread:66)
[2019-10-29 07:45:51,811] INFO Flushing last batches and commit offsets. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,812] INFO Flushing producer. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,816] INFO Committing offsets. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,823] INFO Shutting down consumer connectors. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,823] INFO Connector is now shutting down! (kafka.mirrormaker.KafkaConnector:66)
[2019-10-29 07:45:51,838] INFO [CompactConsumerFetcherManager-1572335099655] Stopping leader finder thread (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-10-29 07:45:51,839] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Shutting down (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 07:45:51,839] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Stopped (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 07:45:51,839] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Shutdown completed (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 07:45:51,840] INFO [CompactConsumerFetcherManager-1572335099655] Stopping all fetchers (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-10-29 07:45:51,844] INFO [CompactConsumerFetcherManager-1572335099655] All connections stopped (kafka.mirrormaker.CompactConsumerFetcherManager:66)
Use plain kill <pid>.
Yes, perfect.
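For anyone hitting the same issue: the difference is the signal. `kill -9` (SIGKILL) terminates the JVM immediately and skips the shutdown hook that flushes the producer and commits offsets, while plain `kill` (SIGTERM) lets the clean shutdown shown in the log above run. A sketch of a corrected preStop script (the `[-]` in the pattern is only a trick so `pgrep -f` cannot match this script's own command line):

```shell
#!/bin/sh -e
# Send SIGTERM (plain kill), not SIGKILL (-9): the worker's shutdown hook
# then flushes the producer and commits offsets before exiting.
# The [-] in the regex still matches "uReplicator-Worker" but keeps
# pgrep -f from matching this script's own command line.
PID=$(pgrep -f "Dapp_name=uReplicator[-]Worker" || true)
if [ -n "$PID" ]; then
  kill "$PID"
fi
```

If this runs as a Kubernetes preStop hook, terminationGracePeriodSeconds must also be long enough for the flush/commit to finish; otherwise the kubelet will send SIGKILL to the container anyway.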
@Technoboy- another question: should I use 2 Controllers, or only 1 Controller?
In non-federated mode, more than 1 controller keeps your system highly available: if one controller goes down, another will take over leadership.
@Technoboy- but I am hitting this trouble again: duplicate messages when I have 2 Controllers.
I already stop the Controller gracefully, like I did with the Worker (plain kill <pid>).
The worker is responsible for syncing messages; the controller is responsible for managing workers. So duplicate messages can only originate in the worker.
Hmm, it happens when the Controller starts replicating with autowhitelist = true. So crazy. It doesn't even require waiting for the Controller to shut down.
@Technoboy-
Do you know what happens when we stop both the Controller and the Worker at the same time?
All pods stop gracefully.
Will the topic be replicated with the correct number of messages or not?
Yes, it will.
@Technoboy- nope; after restarting the Kubernetes node, the Controller does not work until I manually delete the Controller pod.
Hi everyone, I'm running uReplicator from the latest master branch (commit 70ac3404bb4517756c6813d3962b77edc7700019). As with the other commits on master, after a Worker restart there are always lots of duplicate messages in the destination topic.
Here is an overview of my steps:
Step 1: build the latest code
Step 2: build a Docker image with this Dockerfile
Step 3: prepare 2 topics:
Step 4: apply the deployment on Kubernetes with these configs:
clusters.properties
consumer.properties
dstzk.properties
helix.properties: instanceId disabled because I'm using more than 1 Worker
producer.properties
Step 5: check the number of messages on both clusters
Src & Dst topics: 600 messages each
Step 6: delete 1 worker pod via kubectl delete pod .... (no --force flag)
Result:
1. The old pod is Terminating. Src & Dst stay in sync: 750 - 750 on both topics.
2. A few seconds later the new pod is up. The Src topic keeps rising to 900 -> 950 -> 1000; the Dst topic stays at 750 messages.
3. A minute later: Src topic at 1000 messages; Dst topic at more than 1000 messages, sometimes 1200, sometimes 1100-1300, etc. (the result is different in every test).
Note: if no Worker goes down, the replication result is perfect. But in this case, every time I test, the result is different and uReplicator is not stable. @yangy0000 @Pawka @DAddYE @hparra @chirayuk @dentafrice CC to anyone who contributed to this application. Please let me know / correct me if something went wrong.
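To make the count comparison in Step 5 above less manual, the per-partition end offsets of a topic can be summed. This is only a sketch: `sum_offsets` is a hypothetical helper, and the broker address and topic name in the commented usage are placeholders; the live command assumes Kafka's stock GetOffsetShell tool is available.

```shell
#!/bin/sh
# Sum the third colon-separated field of lines shaped like
# "topic:partition:offset", i.e. the per-partition end offsets
# printed by Kafka's GetOffsetShell.
sum_offsets() {
  awk -F: '{ sum += $3 } END { print sum + 0 }'
}

# Usage against a live cluster (host and topic are placeholders):
#   kafka-run-class.sh kafka.tools.GetOffsetShell \
#     --broker-list localhost:9092 --topic mytopic --time -1 | sum_offsets
# Run once against the source and once against the destination cluster,
# then compare the two totals.

# Offline demonstration with fabricated sample output:
printf 'mytopic:0:600\nmytopic:1:400\n' | sum_offsets   # prints 1000
```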