Closed jiao-zhangS closed 3 years ago
sorry, I figured out this. It was due to our cluster has some one-replica topics, so after first time's self healing, failed broker is still detected(because it's out of clusters and has assigned replicas). Then following up self-healing was triggered but did nothing. Sorry for bothering, okay to close this.
Hi, we are using 2.4.36 and enabled 'self.healing.broker.failure.enabled'. We observed when one broker failed, self healing was executed after 'broker.failure.self.healing.threshold.ms'(we set it as 5 mins). And self healing was triggered again(out of expectation) after first time execution is finished. We are suspecting the same anomaly was put back to queue again as here's code shows because 'skipReportingIfNotUpdated' set as 'false' here. Is my suspect correct? any clues why two times execution occurred for one broker's failure. Another suspicious point is why the same log are logged twice almost at the same timing..
Following are some logs which can help describe what happened.
After the execution was finished, we saw a lot of following 'Generating a fix for the anomaly' log.
At '2021-04-15 20:00:36,773' the second timed execution for self healing is started.