Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads through end-to-end pipelines and deriving vital statistics from them: E2E latency, service produce/consume availability, offset commit availability and latency, message loss rate, and more.
We observed the following in our production Kafka cluster.
One broker in the cluster was slow, and KMF was struggling to produce data to the partition hosted on that broker.
Because we use exactly one producer to produce to all KMF partitions, the partitions hosted on healthy brokers appeared just as slow, since they all share that producer. This makes it hard to tell which broker is actually the culprit.
In short, we need a way to ensure that one slow partition (slow because it is hosted on a slow broker) does not affect the other partitions, so that we can clearly see which partition is slow and then which broker hosts it.
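
To make the effect concrete, here is a minimal toy model. It is not KMF code and not the Kafka client. It assumes a producer can be approximated as a single FIFO queue whose per-record cost is the time the hosting broker takes to respond. The names `simulate`, `brokers_over_threshold`, the broker names, and all timings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def simulate(service_ms, rounds, isolated, interval_ms=100):
    """Return the average observed produce latency (ms) per partition.

    service_ms: dict of partition -> time its broker needs per record (assumed).
    isolated:   True  -> one producer (its own queue) per partition,
                False -> a single producer shared by all partitions.
    """
    free_at = defaultdict(int)  # time at which each producer becomes idle
    total = defaultdict(int)    # summed latency per partition
    for r in range(rounds):
        enqueue_time = r * interval_ms  # one record per partition per round
        for partition, cost in service_ms.items():
            producer = partition if isolated else "shared"
            start = max(enqueue_time, free_at[producer])
            free_at[producer] = start + cost
            total[partition] += free_at[producer] - enqueue_time
    return {p: total[p] / rounds for p in service_ms}


def brokers_over_threshold(latency_ms, leader_of, threshold_ms):
    """Map partitions whose average latency exceeds the threshold to brokers."""
    return {leader_of[p] for p, ms in latency_ms.items() if ms > threshold_ms}


# Hypothetical layout: partition 2 is hosted on the slow broker.
service_ms = {0: 5, 1: 5, 2: 400}
leader_of = {0: "broker-1", 1: "broker-2", 2: "broker-3"}

shared = simulate(service_ms, rounds=10, isolated=False)
isolated = simulate(service_ms, rounds=10, isolated=True)

print("shared producer:   ", shared)
print("isolated producers:", isolated)
print("suspects (shared):  ", brokers_over_threshold(shared, leader_of, 100))
print("suspects (isolated):", brokers_over_threshold(isolated, leader_of, 100))
```

In this model the shared producer makes every partition look slow, so every broker becomes a suspect. With one producer per partition, only the partition on the slow broker exceeds the threshold. A real producer batches and pipelines requests, so the actual numbers would differ. The sketch only illustrates why sharing one producer hides the culprit.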