linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

Getting error while changing replication factor #1346

Closed rahu7624 closed 3 years ago

rahu7624 commented 3 years ago

Hi Team,

I am getting the error below while changing the replication factor for a Kafka topic.

kafka]# curl -X POST -c cookie "localhost:9090/kafkacruisecontrol/topic_configuration?topic=test14&replication_factor=3"
Error processing POST request '/topic_configuration' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1602682552960] (index [1, -1]). Window index (current: 0, oldest: 0).'.

rahu7624 commented 3 years ago

This is how the state looks for now

MonitorState: {state: TRAINING(0.000% trained), NumValidWindows: (0/0) (NaN%) , NumValidPartitions: 0/0 (0.000%), FlawedPartitions: 0}
ExecutorState: {state: NO_TASK_IN_PROGRESS}
AnalyzerState: {isProposalReady: false, readyGoals: []}
AnomalyDetectorState: {selfHealingEnabled:[], selfHealingDisabled:[BROKER_FAILURE, DISK_FAILURE, GOAL_VIOLATION, METRIC_ANOMALY, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingEnabledRatio:{BROKER_FAILURE=0.0, DISK_FAILURE=0.0, GOAL_VIOLATION=0.0, METRIC_ANOMALY=0.0, TOPIC_ANOMALY=0.0, MAINTENANCE_EVENT=0.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:0.00 milliseconds}, meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=0.00 milliseconds}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}

efeg commented 3 years ago

Hi @rahu7624, what version of Cruise Control are you running? NotEnoughValidWindowsException can have one of three causes:

  1. The Cruise Control instance has just been started for the first time (i.e. a cold start) and has not had enough time to collect samples to generate a cluster model. If this is the case, give your CC instance some time (e.g. 5 - 10 minutes) and try again.
  2. A broker-side issue with the Cruise Control metrics reporter producing metrics to the relevant internal topic (check the broker logs for exceptions and see whether the __CruiseControlMetrics topic is growing in size).
  3. A Cruise Control-side issue with consuming metrics from the relevant internal topic (e.g. __CruiseControlMetrics).

If the issue is ongoing, could you please see https://github.com/linkedin/cruise-control/issues/310#issuecomment-466252788 for more context, and share any relevant failure logs on either (1) broker- or (2) Cruise Control-side?
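To tell a cold start apart from a metrics problem, it can help to watch whether the NumValidWindows counter in the monitor state ever moves above zero. Below is a minimal Python sketch that extracts that counter from state text in the format posted earlier in this thread. The helper names (`valid_windows`, `monitor_ready`) are hypothetical, and the assumption that the text comes from the `/kafkacruisecontrol/state` endpoint is mine, not something confirmed in this thread.

```python
import re

# Matches the "NumValidWindows: (valid/total)" fragment of the MonitorState line.
_WINDOWS_RE = re.compile(r"NumValidWindows:\s*\((\d+)/(\d+)\)")


def valid_windows(state_text):
    """Return (valid, total) window counts, or None if the fragment is absent."""
    match = _WINDOWS_RE.search(state_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def monitor_ready(state_text):
    """Return True once at least one valid window has been reported."""
    counts = valid_windows(state_text)
    return counts is not None and counts[0] > 0


# MonitorState line copied from the state output earlier in this thread.
sample = (
    "MonitorState: {state: TRAINING(0.000% trained), "
    "NumValidWindows: (0/0) (NaN%) , NumValidPartitions: 0/0 (0.000%), "
    "FlawedPartitions: 0}"
)

print(valid_windows(sample))
print(monitor_ready(sample))
```

If the counter stays at 0 well beyond the suggested waiting time, causes 2 and 3 above become the more likely candidates.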

rahu7624 commented 3 years ago

Hi Efeg ,

Could you please share the command syntax for Cruise Control, if you have it handy? That would be helpful for me.

[root@fedora33-zk01 ~]# curl -X GET "http://localhost:9090/kafkacruisecontrol/load"
Error processing GET request '/load' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1603801314539] (index [1, -1]). Window index (current: 0, oldest: 0).'.
[root@fedora33-zk01 ~]# cat /usr/local/share/cruise-control/config/cruisecontrol.properties | grep -i allow_capacity_estimation
[root@fedora33-zk01 ~]# curl -X GET "http://localhost:9090/kafkacruisecontrol/partition_load"
Error processing GET request '/partition_load' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1603801438613] (index [1, -1]). Window index (current: 0, oldest: 0).'.

Thanks in advance :)

efeg commented 3 years ago

Hi @rahu7624, did you check the wiki page on changing topic replication factor through Cruise Control for the command syntax?

Do you still get NotEnoughValidWindowsException? If the cluster has negligible traffic, that could be the reason for this exception (see https://github.com/linkedin/cruise-control/pull/1369). To resolve this issue, please either set the cruise.control.metrics.reporter.linger.ms config to 5000 on the metrics reporter (i.e. a broker-side config change that should be applied on each broker), or use a version of the metrics reporter built after https://github.com/linkedin/cruise-control/pull/1369 is merged.
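For the request itself, here is a minimal Python sketch that builds the same topic_configuration URL used in the original curl command and checks a response body for the exception discussed here. The function names are hypothetical. The `dryrun` query parameter is an assumption based on my reading of the Cruise Control REST API, so please verify it against the wiki for your version before relying on it.

```python
from urllib.parse import urlencode


def topic_configuration_url(base_url, topic, replication_factor, dryrun=True):
    """Build the POST URL for changing a topic's replication factor.

    `dryrun` is an assumed parameter; check the REST API wiki for your version.
    """
    query = urlencode(
        {
            "topic": topic,
            "replication_factor": replication_factor,
            "dryrun": str(dryrun).lower(),
        }
    )
    return f"{base_url}/kafkacruisecontrol/topic_configuration?{query}"


def is_missing_windows_error(body):
    """Return True if the response body reports NotEnoughValidWindowsException."""
    return "NotEnoughValidWindowsException" in body


url = topic_configuration_url("http://localhost:9090", "test14", 3)
print(url)
```

A caller could send a POST to this URL and, when `is_missing_windows_error` returns True for the response text, wait and retry rather than treating the failure as permanent.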

efeg commented 3 years ago

@rahu7624 Closing the issue -- please feel free to reopen if you have further questions.