yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters
Apache License 2.0
11.81k stars 2.5k forks

Update CMAK to use Kafka 2.8.0+ libs due to critical bug discovered in Kafka #900

Open atul008 opened 1 year ago

atul008 commented 1 year ago

A critical bug (https://issues.apache.org/jira/browse/KAFKA-14190) has been discovered: using pre-2.8.0 ZK admin clients corrupts topic IDs in the Kafka cluster. CMAK is currently built with Kafka 2.4 libs, so using it against a Kafka cluster on version 2.8.0+ will trigger this issue.

We use kafka-manager to manage our production Kafka clusters and this issue has caused some outages. Opening this issue to address the same.

Update: Updating to the latest Kafka libs won't help, because CMAK uses the Curator framework to update ZK instead of AdminZkClient. So we need to wait for KAFKA-14190 to be fixed.

atul008 commented 1 year ago

Steps to reproduce:

  1. Using kafka-manager (which is built with pre-2.8.0 Kafka client libs), add partitions to a topic on a Kafka 2.8.1 cluster (we tested with 2.8.1; it can happen with any 2.8.0+ version)
  2. Restart the controller broker

You should see logs similar to this in the broker:

[2022-08-25 17:44:05,308] ERROR [Broker id=0] Topic Id in memory: jKTRaM_TSNqocJeQI2aYOQ does not match the topic Id for partition myTopic-0 provided in the request: nI-JQtPwQwGiylMfm8k13w. (state.change.logger)
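
To check whether a broker is affected, one option is to scan its logs for this error. The following is a minimal sketch, not something from CMAK or Kafka: the regular expression is modeled only on the single log line quoted above (other Kafka versions may word the message differently), and the names `MISMATCH_RE`, `TopicIdMismatch`, and `parse_mismatch` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the pattern mirrors the one log line quoted in this issue.
MISMATCH_RE = re.compile(
    r"Topic Id in memory: (?P<in_memory>\S+) does not match the topic Id "
    r"for partition (?P<partition>\S+) provided in the request: "
    r"(?P<in_request>[\w-]+)\."
)


class TopicIdMismatch(NamedTuple):
    partition: str   # e.g. "myTopic-0"
    in_memory: str   # topic ID the broker holds in memory
    in_request: str  # topic ID carried by the controller request


def parse_mismatch(line: str) -> Optional[TopicIdMismatch]:
    """Return the mismatch details if the line is a topic ID mismatch error."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return TopicIdMismatch(m["partition"], m["in_memory"], m["in_request"])
```

Applying `parse_mismatch` to each line of the broker log and collecting the non-`None` results gives the list of partitions that report a mismatch.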

d-mankowski-synerise commented 5 months ago

This issue messed up thousands of partitions in our production cluster - really, stop using CMAK if you do not want to have serious issues, there are much better tools (Redpanda Console, Conduktor, etc.).

The fix was to stop Kafka, delete all partition.metadata files, and start Kafka again; on startup it fetches the metadata from ZooKeeper (this procedure can be done one node at a time).
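
The deletion step of that procedure could be scripted roughly as follows. This is a hedged sketch rather than a tested recovery tool: it assumes each partition directory sits directly under the broker's log directory and contains one partition.metadata file, that the broker is already stopped, and that the path passed in matches the broker's `log.dirs` setting. The function names are hypothetical, and `dry_run` defaults to `True` so that nothing is deleted by accident.

```python
from pathlib import Path
from typing import List


def find_partition_metadata(log_dir: Path) -> List[Path]:
    """List partition.metadata files one level below log_dir.

    Assumption: layout is <log_dir>/<topic>-<partition>/partition.metadata.
    """
    return sorted(log_dir.glob("*/partition.metadata"))


def remove_partition_metadata(log_dir: Path, dry_run: bool = True) -> List[Path]:
    """Delete partition.metadata files; run only while the broker is stopped.

    Returns the files found. With dry_run=True nothing is removed.
    """
    files = find_partition_metadata(log_dir)
    if not dry_run:
        for f in files:
            f.unlink()  # segment and index files in the same directory are untouched
    return files
```

A cautious way to use it is to call `remove_partition_metadata(Path(...))` first with the default dry run, review the returned list, and only then repeat the call with `dry_run=False` before starting the broker again.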