Open rahu7624 opened 3 years ago
COMPLETENESS_NOT_READY means that Cruise Control (CC) was unable to collect enough samples from Kafka to generate the cluster model on which it operates to perform goal-based cluster maintenance operations. This could be due to one of two reasons: (1) you have just started CC, so it has not had time to collect samples yet (give it some time and see whether the CC logs show that it was able to collect samples and that a new window has been rolled out), or (2) there is a problem collecting samples from Kafka. Can you verify that you configured the metrics reporter correctly on the Kafka side? Did you follow the quick-start tutorial on the CC GitHub page to set up the metrics reporter? Does your metrics reporter topic get any data from Kafka?
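One way to watch for progress is to pull the window and partition counts out of the /state response. The following is a minimal sketch, assuming the plain-text MonitorState format seen in the outputs posted in this thread; the function name `monitor_summary` and the `has_valid_window` flag are illustrative and not part of Cruise Control.

```python
import re

# Pattern for the MonitorState fields, based on the /state output format
# seen in this thread (an assumption; other versions may print it differently).
MONITOR_RE = re.compile(
    r"NumValidWindows: \((\d+)/(\d+)\).*?NumValidPartitions: (\d+)/(\d+)"
)


def monitor_summary(state_text):
    """Return window and partition counts from a /state response, or None."""
    match = MONITOR_RE.search(state_text)
    if match is None:
        return None
    valid_w, total_w, valid_p, total_p = map(int, match.groups())
    return {
        "valid_windows": valid_w,
        "total_windows": total_w,
        "valid_partitions": valid_p,
        "total_partitions": total_p,
        # Heuristic only: no valid window yet means samples are still missing.
        "has_valid_window": valid_w > 0,
    }


if __name__ == "__main__":
    sample = (
        "MonitorState: {state: RUNNING(0.000% trained), "
        "NumValidWindows: (0/0) (NaN%), "
        "NumValidPartitions: 0/0 (0.000%), flawedPartitions: 0}"
    )
    print(monitor_summary(sample))
```

Feeding it the text returned by curl on /kafkacruisecontrol/state at intervals should show whether the counts ever move away from 0/0.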
Hi Adem,
Thanks for looking into it. This is a test setup with 3 nodes, just one test topic, and currently no data flowing in or out. I followed the quick-start tutorial and configured all 3 nodes the same way. Please see the Kafka-side configs below and let us know if any changes are required.
[rahul@kafka-0 ~]$ cat /usr/local/share/kafka/config/server.properties | grep -i cruise
metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
cruise.control.metrics.topic.auto.create=true
cruise.control.metrics.topic.num.partitions=1
cruise.control.metrics.topic.replication.factor=1
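The same check can be scripted so it is easy to repeat on every broker. This is a rough sketch that only handles simple key=value lines (Java properties files also allow other separators and comment styles); the helper names `parse_properties` and `reporter_configured` are made up for illustration. Note that a passing check only confirms the setting is present, not that the reporter jar is actually on the broker's classpath.

```python
REPORTER_CLASS = (
    "com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter"
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def reporter_configured(props):
    """True if metric.reporters lists the Cruise Control reporter class."""
    reporters = [r.strip() for r in props.get("metric.reporters", "").split(",")]
    return REPORTER_CLASS in reporters


if __name__ == "__main__":
    # Sample content copied from the grep output above.
    sample = """
metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
cruise.control.metrics.topic.auto.create=true
cruise.control.metrics.topic.num.partitions=1
cruise.control.metrics.topic.replication.factor=1
"""
    print(reporter_configured(parse_properties(sample)))
```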
Thanks in advance.
However, the situation is still the same even after 18 hours.
[rahul@kafka-0 ~]$ curl -X GET "http://localhost:9090/kafkacruisecontrol/state" MonitorState: {state: RUNNING(0.000% trained), NumValidWindows: (0/0) (NaN%), NumValidPartitions: 0/0 (0.000%), flawedPartitions: 0} ExecutorState: {state: NO_TASK_IN_PROGRESS} AnalyzerState: {isProposalReady: false, readyGoals: []} AnomalyDetectorState: {selfHealingEnabled:[BROKER_FAILURE, DISK_FAILURE, METRIC_ANOMALY, GOAL_VIOLATION, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingDisabled:[], selfHealingEnabledRatio:{BROKER_FAILURE=1.0, DISK_FAILURE=1.0, METRIC_ANOMALY=1.0, GOAL_VIOLATION=1.0, TOPIC_ANOMALY=1.0, MAINTENANCE_EVENT=1.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[{description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=b5852ac0-9ce9-4721-81bd-a6d89df6e7f5, detectionDate=2021-07-02T08:19:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:19:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=297d6fdd-ee77-4375-b787-f3e8fa39996b, detectionDate=2021-07-02T08:23:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:23:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=b6d70146-0c31-45a5-9d7e-f8f4f9c1c4a1, detectionDate=2021-07-02T08:27:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:27:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=f7344f75-1ccb-4215-8ae7-e0ca9347f2da, detectionDate=2021-07-02T08:35:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:35:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=7d61a3b9-e056-47bc-89a5-7a69fb4e414a, detectionDate=2021-07-02T08:21:45Z, status=COMPLETENESS_NOT_READY, 
statusUpdateDate=2021-07-02T08:21:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=d53e99a2-102c-452d-b3fd-c13741a4241c, detectionDate=2021-07-02T08:31:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:31:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=517c91d0-9c10-4c8b-9666-ea95c2ebb490, detectionDate=2021-07-02T08:25:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:25:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=33e18a5e-d05f-46ee-b63e-51dfa3ba44e3, detectionDate=2021-07-02T08:29:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:29:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=0fb9a33e-95c1-442f-8aa3-b8eff0814315, detectionDate=2021-07-02T08:37:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:37:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=980cd90d-0f36-4463-988a-f6da8c68df09, detectionDate=2021-07-02T08:33:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:33:45Z}], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:8.33 milliseconds}, meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=18.49 hours}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}
[rahul@kafka-0 ~]$
@rahu7624 Do you see any data going into the __CruiseControlMetrics topic -- i.e. does it grow in size? If not, this is an issue with the Kafka-side configs. Here is a checklist that might help:
1. Did you run ./gradlew jar to generate ./cruise-control-metrics-reporter/build/libs/cruise-control-metrics-reporter-A.B.C.jar (where A.B.C is the version of Cruise Control)? (Note: This project requires Java 11.)
2. Did you copy ./cruise-control-metrics-reporter/build/libs/cruise-control-metrics-reporter-A.B.C.jar (where A.B.C is the version of Cruise Control) to the correct Kafka server dependency jar folder for each Kafka broker you are running? For Apache Kafka, the folder would be core/build/dependant-libs-SCALA_VERSION/ (for a Kafka source checkout) or libs/ (for a Kafka release download). When you start Kafka, do you see logs generated by Cruise Control Metrics Reporter (you should)?
3. Did you set metric.reporters to com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter for each Kafka broker you are running? For Apache Kafka, server properties are located at ./config/server.properties.
4. If SSL is enabled, did you ensure that the relevant client configurations are properly set for all brokers in ./config/server.properties? Note that CruiseControlMetricsReporter takes all configurations for vanilla KafkaProducer with a prefix of cruise.control.metrics.reporter. -- e.g. cruise.control.metrics.reporter.ssl.truststore.password.
Tried reconfiguring it the way you advised. It seems to have started collecting some metrics, but it is still giving some errors.
[root@kafka-2 kafka]# systemctl status cruisecontrol -l
● cruisecontrol.service - Zookeeper
   Loaded: loaded (/etc/systemd/system/cruisecontrol.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-07-05 10:46:46 UTC; 44min ago
 Main PID: 13241 (cc.sh)
   CGroup: /system.slice/cruisecontrol.service
           ├─13241 /bin/bash /usr/local/bin/cc.sh
           └─13243 java -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=./logs -Dlog4j.configurationFile=file:./config/log4j.properties -cp ./cruise-control/build/dependant-libs/:./cruise-control/build/libs/:./cruise-control-metrics-reporter/build/libs/* com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain config/cruisecontrol.properties
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,248] INFO Finished sampling from topic CruiseControlMetrics for partitions [0] in time range [1625484537241,1625484657241]. Collected 526 metrics. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,248] WARN Broker 2 is missing 4/4 topics metrics and 39/39 leader partition metrics. Missing leader topics: [KafkaCruiseControlPartitionMetricSamples, test, KafkaCruiseControlModelTrainingSamples, consumer_offsets]. (com.linkedin.kafka.cruisecontrol.monitor.sampling.holder.BrokerLoad)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,248] WARN Skip generating metric sample for broker 2 because the following required metrics are missing [BROKER_PRODUCE_LOCAL_TIME_MS_MAX, BROKER_PRODUCE_REQUEST_QUEUE_TIME_MS_MEAN, BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MEAN, ALL_TOPIC_PRODUCE_REQUEST_RATE, ALL_TOPIC_MESSAGES_IN_PER_SEC, BROKER_PRODUCE_TOTAL_TIME_MS_MEAN, ALL_TOPIC_FETCH_REQUEST_RATE, BROKER_FOLLOWER_FETCH_REQUEST_RATE, ALL_TOPIC_REPLICATION_BYTES_OUT, BROKER_PRODUCE_TOTAL_TIME_MS_MAX, ALL_TOPIC_REPLICATION_BYTES_IN, BROKER_CONSUMER_FETCH_REQUEST_QUEUE_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_REQUEST_QUEUE_TIME_MS_MAX, BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MAX, ALL_TOPIC_BYTES_IN, BROKER_FOLLOWER_FETCH_TOTAL_TIME_MS_MAX, BROKER_CONSUMER_FETCH_REQUEST_QUEUE_TIME_MS_MEAN, BROKER_PRODUCE_REQUEST_QUEUE_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_TOTAL_TIME_MS_MEAN, BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MEAN, ALL_TOPIC_BYTES_OUT, BROKER_CONSUMER_FETCH_TOTAL_TIME_MS_MEAN, BROKER_REQUEST_QUEUE_SIZE, BROKER_CONSUMER_FETCH_TOTAL_TIME_MS_MAX, BROKER_RESPONSE_QUEUE_SIZE, BROKER_PRODUCE_LOCAL_TIME_MS_MEAN, BROKER_REQUEST_HANDLER_AVG_IDLE_PERCENT, BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_REQUEST_QUEUE_TIME_MS_MEAN]. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingUtils)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO Generated 79(39 skipped by broker {2=39}) partition metric samples and 2(1 skipped) broker metric samples for timestamp 1625484656792. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsProcessor)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO PARTITION Aggregator rolled out 1 new windows, reset 1 windows, current window range [1625484600000, 1625484900000], abandon 237 samples. (com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO Collected 79 partition metric samples for 79 partitions. Total partition assigned: 118. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO BROKER Aggregator rolled out 1 new windows, reset 1 windows, current window range [1625478900000, 1625484900000], abandon 0 samples. (com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,255] INFO Collected 2 broker metric samples for 2 brokers. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,267] INFO Finished sampling in 26 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
Jul 05 11:30:58 kafka-2 cc.sh[13241]: [2021-07-05 11:30:58,408] INFO Skipping proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[root@kafka-2 kafka]#
Also, it shows an RF anomaly for the Cruise Control topics.
[root@kafka-2 kafka]# curl 'http://localhost:9090/kafkacruisecontrol/state' MonitorState: {state: RUNNING(11.600% trained), NumValidWindows: (0/1) (0.000%), NumValidPartitions: 79/118 (66.949%), flawedPartitions: 0} ExecutorState: {state: NO_TASK_IN_PROGRESS} AnalyzerState: {isProposalReady: false, readyGoals: [ReplicaDistributionGoal, RackAwareGoal, TopicReplicaDistributionGoal, LeaderReplicaDistributionGoal, ReplicaCapacityGoal]} AnomalyDetectorState: {selfHealingEnabled:[BROKER_FAILURE, DISK_FAILURE, GOAL_VIOLATION, METRIC_ANOMALY, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingDisabled:[], selfHealingEnabledRatio:{BROKER_FAILURE=1.0, DISK_FAILURE=1.0, GOAL_VIOLATION=1.0, METRIC_ANOMALY=1.0, TOPIC_ANOMALY=1.0, MAINTENANCE_EVENT=1.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[{description={Topics with replication factor violations: [{With desired RF 3: [{KafkaCruiseControlModelTrainingSamples(100.00)}, {CruiseControlMetrics(100.00)}, {consumer_offsets(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=e8ee6abe-cfb5-42c7-9daa-1e9293e49692, detectionDate=2021-07-05T11:20:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:20:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}, {CruiseControlMetrics(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=cb615220-3a97-4670-a320-5a7e66612879, detectionDate=2021-07-05T11:16:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:16:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{CruiseControlMetrics(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}, {consumer_offsets(100.00)}]}]}, anomalyId=35dc9570-8750-4baa-a2f3-4c2c641b51e0, detectionDate=2021-07-05T11:32:04Z, 
status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:32:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{CruiseControlMetrics(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}, {consumer_offsets(100.00)}]}]}, anomalyId=3320092c-2c2b-471e-949f-f7137b580de4, detectionDate=2021-07-05T11:28:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:28:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{CruiseControlMetrics(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}, {consumer_offsets(100.00)}]}]}, anomalyId=07ac8022-e5cf-4d1c-9a99-2f47cfa8b476, detectionDate=2021-07-05T11:30:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:30:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__consumer_offsets(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}, {CruiseControlMetrics(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=28339b49-77db-4ed3-9ba2-31920954b398, detectionDate=2021-07-05T11:34:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:34:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{CruiseControlMetrics(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}, {consumer_offsets(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=05bda5cc-693e-4072-a91b-05294bbb5e58, detectionDate=2021-07-05T11:22:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:22:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{KafkaCruiseControlModelTrainingSamples(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}, {CruiseControlMetrics(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=334304d1-1181-4702-827e-ff37a91cd436, 
detectionDate=2021-07-05T11:18:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:18:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{consumer_offsets(100.00)}, {CruiseControlMetrics(100.00)}, {KafkaCruiseControlPartitionMetricSamples(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}]}]}, anomalyId=1be80a1d-5124-494c-81e3-ed4c038991aa, detectionDate=2021-07-05T11:24:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:24:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{KafkaCruiseControlPartitionMetricSamples(100.00)}, {CruiseControlMetrics(100.00)}, {KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=b82cd436-d121-4bb1-ac12-1d907939c92a, detectionDate=2021-07-05T11:26:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:26:04Z}], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:8.29 milliseconds}, meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=47.01 minutes}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}
[root@kafka-2 kafka]#
WARN Broker 2 is missing 4/4 topics metrics and 39/39 leader partition metrics. Missing leader topics: [KafkaCruiseControlPartitionMetricSamples, test, KafkaCruiseControlModelTrainingSamples, __consumer_offsets].
and then
INFO Generated 79(39 skipped by broker {2=39}) partition metric samples and 2(1 skipped) broker metric samples for timestamp 1625484656792.
implies that broker 2 was not configured properly. If broker 2 is configured later, then eventually CC will be able to collect samples from all brokers and will roll out a window -- i.e. MonitorState will show NumValidWindows: (1/1).
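To spot brokers in this situation without reading the log by eye, the WARN lines can be scanned for brokers whose topic and leader partition metrics are all missing. A small sketch, assuming the log wording matches the BrokerLoad message quoted above; `brokers_missing_all_metrics` is a hypothetical helper, not something Cruise Control provides.

```python
import re

# Matches the BrokerLoad warning format quoted in this thread.
MISSING_RE = re.compile(
    r"Broker (\d+) is missing (\d+)/(\d+) topics metrics "
    r"and (\d+)/(\d+) leader partition metrics"
)


def brokers_missing_all_metrics(log_text):
    """Return sorted broker ids for which every topic and partition metric is missing."""
    ids = set()
    for match in MISSING_RE.finditer(log_text):
        broker, t_miss, t_total, p_miss, p_total = map(int, match.groups())
        if t_miss == t_total and p_miss == p_total:
            ids.add(broker)
    return sorted(ids)


if __name__ == "__main__":
    line = (
        "WARN Broker 2 is missing 4/4 topics metrics and 39/39 leader "
        "partition metrics. Missing leader topics: [test]."
    )
    print(brokers_missing_all_metrics(line))
```

A broker that shows up here on every sampling round is a candidate for re-checking the metrics reporter jar and the metric.reporters setting.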
Also, it shows an RF anomaly for the Cruise Control topics.
This is independent of the issue we discussed above. It says that the "desired replication factor" config is set to 3, but the listed topics have an RF different from the desired RF. You can set the desired replication factor in a cluster using the self.healing.target.topic.replication.factor config.
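To see at a glance which topics are flagged and against which desired RF, the anomaly descriptions in the /state output can be summarised. This is a sketch under the assumption that descriptions follow the "With desired RF N: [{topic(...)}]" shape shown in this thread; the helper `rf_violations` is illustrative only, and the number in parentheses after each topic is ignored because its meaning is not stated here.

```python
import re

# One group per "With desired RF N: [ ... ]" block in an anomaly description.
RF_BLOCK_RE = re.compile(r"With desired RF (\d+): \[([^\]]*)\]")
# Topic names appear as "{name(" inside the block.
TOPIC_RE = re.compile(r"\{([^(){}]+)\(")


def rf_violations(state_text):
    """Map each desired RF to the sorted set of topics reported against it."""
    result = {}
    for rf, block in RF_BLOCK_RE.findall(state_text):
        topics = result.setdefault(int(rf), set())
        topics.update(TOPIC_RE.findall(block))
    return {rf: sorted(topics) for rf, topics in result.items()}


if __name__ == "__main__":
    sample = (
        "{description={Topics with replication factor violations: "
        "[{With desired RF 2: [{test(100.00)}]}]}, status=COMPLETENESS_NOT_READY}"
    )
    print(rf_violations(sample))
```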
Hi Team,
We are getting the errors below when checking the Cruise Control status. Can you please check and suggest a fix?
[root@kafka-0 ~]# systemctl status cruisecontrol -l
● cruisecontrol.service - Zookeeper
   Loaded: loaded (/etc/systemd/system/cruisecontrol.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-07-01 14:08:30 UTC; 3min 59s ago
 Main PID: 29352 (cc.sh)
   CGroup: /system.slice/cruisecontrol.service
           ├─29352 /bin/bash /usr/local/bin/cc.sh
           └─29354 java -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=./logs -Dlog4j.configurationFile=file:./config/log4j.properties -cp ./cruise-control/build/dependant-libs/:./cruise-control/build/libs/:./cruise-control-metrics-reporter/build/libs/* com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain config/cruisecontrol.properties
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,861] WARN Skipping goal violation detection for ReplicaCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,861] WARN Skipping goal violation detection for DiskCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,861] WARN Skipping goal violation detection for NetworkInboundCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,862] WARN Skipping goal violation detection for NetworkOutboundCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,862] WARN Skipping goal violation detection for CpuCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,398] INFO Start to detect topic replication factor anomaly. (com.linkedin.kafka.cruisecontrol.detector.TopicAnomalyFinder)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,399] WARN TOPIC_ANOMALY detected {Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}. Self healing start time 2021-07-01T14:11:45Z. (com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,400] WARN Self-healing has been triggered. (com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,472] WARN Skipping TOPIC_ANOMALY fix because load completeness requirement is not met for goals. (com.linkedin.kafka.cruisecontrol.detector.AnomalyDetectorManager)
Jul 01 14:12:11 kafka-0 cc.sh[29352]: [2021-07-01 14:12:11,598] INFO Skipping proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[root@kafka-0 ~]#
[root@kafka-0 kafka]# curl 'http://localhost:9090/kafkacruisecontrol/state' MonitorState: {state: RUNNING(0.000% trained), NumValidWindows: (0/0) (NaN%), NumValidPartitions: 0/0 (0.000%), flawedPartitions: 0} ExecutorState: {state: NO_TASK_IN_PROGRESS} AnalyzerState: {isProposalReady: false, readyGoals: []} AnomalyDetectorState: {selfHealingEnabled:[BROKER_FAILURE, DISK_FAILURE, METRIC_ANOMALY, GOAL_VIOLATION, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingDisabled:[], selfHealingEnabledRatio:{BROKER_FAILURE=1.0, DISK_FAILURE=1.0, METRIC_ANOMALY=1.0, GOAL_VIOLATION=1.0, TOPIC_ANOMALY=1.0, MAINTENANCE_EVENT=1.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[{description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=c3044efe-1176-461e-bd21-9b16418bc815, detectionDate=2021-07-01T14:11:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:11:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=20958eec-b7fa-4fc4-8c6a-38f000a20b09, detectionDate=2021-07-01T14:09:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:09:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=a61584a9-0d44-472c-b2b1-b8740a3c6ced, detectionDate=2021-07-01T14:13:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:13:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=2e5612d6-0c3c-4e38-a478-ca06b7eeb265, detectionDate=2021-07-01T14:15:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:15:45Z}], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:8.88 milliseconds}, 
meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=6.31 minutes}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}
[root@kafka-0 kafka]#