Closed bpux closed 1 year ago
We also hit this issue with CC version 2.5.86 and Kafka 2.7.1.
In our case it was not self-healing but a regular 'add_broker' operation, run while broker_id 41 was dead.
CC was crashing with the timeout exception above.
I initially tried to patch the code by increasing the timeout from 30 sec to 3 min. No success.
Then I applied this hack:
if (broker != 41) { // 41 is the stopped broker
    setThrottledRateIfUnset(broker);
}
After that, the rebalance ran smoothly.
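The hack hard-codes broker 41. A more general form of the same idea might filter the brokers that take part in the execution against the set of brokers that are currently alive, and only set the throttle on those. The following is a minimal sketch under that assumption, not the actual Cruise Control code: the class `ThrottleTargets` and the method `aliveOnly` are hypothetical names, and the alive set is passed in as plain integers.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Cruise Control.
class ThrottleTargets {
    // Returns the participating broker ids that are also alive, in ascending order.
    static SortedSet<Integer> aliveOnly(Collection<Integer> participating, Set<Integer> alive) {
        SortedSet<Integer> targets = new TreeSet<>(participating);
        targets.retainAll(alive); // drop dead brokers such as 41
        return targets;
    }

    public static void main(String[] args) {
        Set<Integer> alive = Set.of(40, 42); // assumed: broker 41 is stopped
        for (int broker : aliveOnly(List.of(40, 41, 42), alive)) {
            // In the patched executor, this is where the throttle would be set.
            System.out.println("would set throttle on broker " + broker);
        }
    }
}
```

In a real patch the alive set could presumably come from the cluster metadata, for example the node ids returned by the Kafka admin client's `describeCluster()` call, rather than from a hard-coded value.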
@bpux Is your patch available somewhere on GitHub, or can you submit a PR? It would save us some time. Thanks!
Hi, we run Cruise Control (built from the '[migrate_to_kafka_2_5]' branch, updated to [Update README regarding Kafka 3.0 and 3.1 support]), with 'auto-heal' enabled for BROKER_FAILURE.
After upgrading CC, we found that when a broker died, CC detected the failure but the executor errored out. From the code and the log, the error seems to come from trying to set the 'throttle' on the dead broker.
I enabled debug logging and found these errors in the log...
Our Kafka version is 2.7.1. I also reproduced it with Kafka 2.6.2 by
Execution finished.
with no partitions moved.
This auto-heal was running fine in the previous version, before we updated CC. I'm not sure which change caused it (CC or the kafka-client API), or whether something is wrong in our configuration, which we have not changed. For now I've patched our version (I can submit a PR), but I would like to know more about the issue. Thanks!