Closed: Patanouk closed this issue 1 year ago
Having the same issue. This is pretty critical for us, since the health endpoint is used by K8s to check whether pods are healthy. All of our pods went unhealthy because one Kafka node (out of 3) was restarting.
If that helps, we ended up writing our own health indicator by creating a class with a `@Replaces(KafkaHealthIndicator.class)` annotation.
The health check class tries to write a message to the Kafka cluster, and returns unhealthy if the write fails a couple of times in a row.
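The core of that approach is just consecutive-failure counting around a probe write. A minimal sketch in plain Java, independent of Micronaut (the real indicator would wrap something like this in a bean annotated with `@Replaces(KafkaHealthIndicator.class)` and perform the probe write with an actual Kafka producer; all names here are hypothetical):

```java
// Sketch of the replacement health check's core logic: mark the service
// unhealthy only after the probe write fails several times in a row.
// FailureTracker and its method names are illustrative, not from the
// Micronaut Kafka codebase.
public class FailureTracker {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    public FailureTracker(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Call after each probe write: true on success, false on failure.
    // A single success resets the counter, so transient broker blips
    // during a rolling restart do not flip the service to unhealthy.
    public synchronized void record(boolean writeSucceeded) {
        consecutiveFailures = writeSucceeded ? 0 : consecutiveFailures + 1;
    }

    public synchronized boolean isHealthy() {
        return consecutiveFailures < maxConsecutiveFailures;
    }
}
```

With a threshold of 2, one failed write keeps the pod healthy; only two failures in a row (i.e. the cluster is genuinely unreachable, not mid-restart) mark it unhealthy.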
I would be happy to open a PR if the maintainers of the project agree with the approach proposed above.
@Patanouk PRs welcome
Issue description
Hello,
We are using the Micronaut Kafka library and have enabled the Kafka health check, to mark our apps as unhealthy if our Kafka cluster goes down.
We have a Kafka cluster with 3 brokers. Our Kafka producers are configured with `Acknowledge.ALL`.
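For context, Micronaut's `Acknowledge.ALL` corresponds to the standard Kafka producer setting `acks=all`, which is exactly what makes `min.insync.replicas` the relevant durability bound. A hedged sketch of the equivalent raw producer configuration (the broker address is a placeholder; the property names are standard Kafka producer settings):

```java
import java.util.Properties;

public class ProducerAcksConfig {
    // Builds producer properties equivalent to Micronaut's Acknowledge.ALL:
    // with acks=all, the leader acknowledges a write only once all in-sync
    // replicas have it, so writes keep succeeding as long as at least
    // min.insync.replicas brokers remain in sync.
    public static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // placeholder address
        props.put("acks", "all"); // same semantics as Acknowledge.ALL
        return props;
    }
}
```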
We have the following configuration for our Kafka brokers:

According to the Kafka documentation for `min.insync.replicas`, a producer configured with `Acknowledge.ALL` should be able to write to the cluster as long as 2 in-sync replicas acknowledge the write.

However, the health check in `KafkaHealthIndicator` compares `offsets.topic.replication.factor` to the number of available nodes. So in our case, with 3 brokers, a rolling restart of the Kafka brokers makes one broker unavailable at a time. As a result, the Kafka health check in the Micronaut application fails whenever we do a rolling restart of our Kafka cluster (since only 2 nodes are healthy), even though our producers should still be able to write to Kafka.
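To make the mismatch concrete, here is a simplified model of the two checks as described above (an illustration, not the actual `KafkaHealthIndicator` source): with 3 brokers, `offsets.topic.replication.factor=3`, and `min.insync.replicas=2`, a rolling restart that takes one broker down fails the replication-factor comparison but would pass a `min.insync.replicas`-based one.

```java
public class KafkaHealthModel {
    // Behaviour described above: healthy only if there are at least as many
    // live brokers as replicas configured for the offsets topic.
    public static boolean healthyByReplicationFactor(int liveBrokers,
                                                     int offsetsTopicReplicationFactor) {
        return liveBrokers >= offsetsTopicReplicationFactor;
    }

    // Proposed alternative: healthy as long as an acks=all producer can still
    // get its writes acknowledged by min.insync.replicas in-sync brokers.
    public static boolean healthyByMinInsyncReplicas(int liveBrokers,
                                                     int minInsyncReplicas) {
        return liveBrokers >= minInsyncReplicas;
    }
}
```

During the rolling restart (2 of 3 brokers live), the first check reports unhealthy while the second correctly reports healthy; with all 3 brokers up, both agree.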
Is there a reason to prefer `offsets.topic.replication.factor` over `min.insync.replicas` in the health check? I'm relatively new to Kafka, so there might totally be something I'm missing here.