Hard to detect one Bad Broker.

We saw this in our production Kafka cluster.

One broker in the Kafka cluster was slow, and KMF was struggling to produce data to the partition hosted on that broker. Because we use exactly one producer to produce data to all KMF partitions, the partitions hosted on healthy brokers appeared just as slow, since they all share the same producer. That makes it hard to tell which broker is actually the culprit.
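To illustrate the effect, here is a toy model and not KMF or Kafka client code. It assumes a single sender that handles one record at a time in partition order, which is a deliberate simplification of the real, asynchronous producer. The function name, the service times, and the send interval are all hypothetical.

```python
def mean_latency(service_ms, interval_ms, rounds):
    """Mean latency per partition when ONE sender serves all given partitions.

    service_ms maps partition -> time the leader broker takes per record.
    Each round, one record per partition is enqueued at r * interval_ms.
    """
    clock = 0.0
    totals = {p: 0.0 for p in service_ms}
    for r in range(rounds):
        enqueued_at = r * interval_ms
        # The sender is idle until the records of this round exist.
        clock = max(clock, enqueued_at)
        for p, cost in service_ms.items():
            clock += cost  # records are sent strictly one after another
            totals[p] += clock - enqueued_at
    return {p: totals[p] / rounds for p in service_ms}


# Hypothetical numbers: partition 2 lives on the slow broker.
service_ms = {0: 5, 1: 5, 2: 500}

# One producer shared by every partition.
shared = mean_latency(service_ms, interval_ms=100, rounds=3)

# One sender per partition, so a backlog cannot spread.
isolated = {}
for p, cost in service_ms.items():
    isolated.update(mean_latency({p: cost}, interval_ms=100, rounds=3))

print("shared:  ", shared)
print("isolated:", isolated)
```

In the shared case the backlog caused by partition 2 inflates the latency of partitions 0 and 1 as well, so all three look slow. In the isolated case only partition 2 stands out.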

In short, we need a way to make sure that one slow partition (slow because it is hosted on the slow broker) does not impact the other partitions. Then we can clearly see which partition is slow and where exactly it is hosted, and so detect the slow broker.
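Once per-partition latency is isolated, mapping slow partitions back to a broker could look roughly like the sketch below. This is not existing KMF functionality. The function name, the `factor` threshold, and the sample latencies and leader assignments are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import median


def suspect_brokers(latency_ms, leader_of, factor=3.0):
    """Return brokers whose median partition latency is far above the rest.

    latency_ms: partition -> observed produce latency in milliseconds
    leader_of:  partition -> id of the broker leading that partition
    factor:     how many times the cluster baseline counts as "slow"
    """
    by_broker = defaultdict(list)
    for partition, latency in latency_ms.items():
        by_broker[leader_of[partition]].append(latency)

    broker_median = {b: median(values) for b, values in by_broker.items()}
    # Baseline is the median over brokers, so one outlier does not skew it.
    baseline = median(broker_median.values())
    return sorted(b for b, m in broker_median.items() if m > factor * baseline)


# Hypothetical measurements: partitions 2 and 4 are led by broker 3.
latency_ms = {0: 5, 1: 6, 2: 480, 3: 5, 4: 510, 5: 7}
leader_of = {0: 1, 1: 2, 2: 3, 3: 1, 4: 3, 5: 2}

print(suspect_brokers(latency_ms, leader_of))
```

The median baseline only works when most brokers are healthy. With just two brokers, a fixed latency threshold would be a better fit.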

linkedin / kafka-monitor

Hard to detect one Bad Broker. #395