linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License
2.68k stars 574 forks

Does CC support MSK/Kafka 2.8.1? - NoSuchMethodError: 'void kafka.zk.AdminZkClient.<init> (CC 2.5.137) #2149

Closed marcelloromani closed 1 month ago

marcelloromani commented 2 months ago

I am running Cruise Control on EKS, talking to an MSK cluster.

Full log line:

16:24:10.019 [qtp1754547054-67] WARN org.eclipse.jetty.server.HttpChannel -- /kafkacruisecontrol/kafka_cluster_state java.lang.NoSuchMethodError: 'void kafka.zk.AdminZkClient.<init>(kafka.zk.KafkaZkClient)'

After this happens, the cluster state endpoint returns this exception:

16:43:56.091 [qtp1754547054-70] ERROR com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler -- Error processing GET request '/kafka_cluster_state' due to: 'Unable to find viable apply function version for the KafkaZkClient class '. java.util.NoSuchElementException: Unable to find viable apply function version for the KafkaZkClient class

Context:

Cruise Control version: 2.5.137
MSK Kafka version: 2.8.1

Relevant entries from cruisecontrol.properties:

jaas.conf

KafkaClient {
    software.amazon.msk.auth.iam.IAMLoginModule required
    serviceName="kafka";
};

Additional log lines that might be relevant:

16:24:09.992 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] INFO org.apache.zookeeper.Login -- Client successfully logged in.

16:24:09.993 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] INFO org.apache.zookeeper.client.ZooKeeperSaslClient -- Client will use DIGEST-MD5 as SASL mechanism.

16:24:09.995 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] ERROR org.apache.zookeeper.client.ZooKeeperSaslClient -- Exception while trying to create SASL client. java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0

16:24:09.996 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] INFO org.apache.zookeeper.ClientCnxn -- Opening socket connection to server z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com/100.78.38.40:2181.

16:24:09.997 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] INFO org.apache.zookeeper.ClientCnxn -- SASL config status: Will attempt to SASL-authenticate using Login Context section 'Client'

16:24:09.999 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] INFO org.apache.zookeeper.ClientCnxn -- Socket connection established, initiating session, client: /100.72.27.76:43790, server: z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com/100.78.38.40:2181

16:24:10.007 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] INFO org.apache.zookeeper.ClientCnxn -- Session establishment complete on server z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com/100.78.38.40:2181, session id = 0x300000300b2001a, negotiated timeout = 40000

16:24:10.008 [qtp1754547054-67] INFO kafka.zookeeper.ZooKeeperClient -- [ZooKeeperClient KafkaTopicConfigProvider-GetAllActiveTopicConfigs] Connected.

16:24:10.008 [qtp1754547054-67-SendThread(z-3.109927mskrefappuse12.f9b696.c17.kafka.us-east-1.amazonaws.com:2181)] ERROR org.apache.zookeeper.ClientCnxn -- SASL authentication with Zookeeper Quorum member failed.
javax.security.sasl.SaslException: saslClient failed to initialize properly: it's null.
    at org.apache.zookeeper.client.ZooKeeperSaslClient.initialize(ZooKeeperSaslClient.java:399)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1219)

marcelloromani commented 1 month ago

@CCisGG I think we chose a version of CC that is too new for our MSK cluster (which is running Kafka 2.8.1). I remember that the first time we played with Cruise Control we used the migrate_to_kafka_2_5 branch, but the only one I can find now is migrate_to_kafka_2_4.

Would you be able to point me to the CC version that supports Kafka 2.8?

Thanks!

marcelloromani commented 1 month ago

Sorry, rubberduck debugging here:

The main (previously migrate_to_kafka_2_5) branch of Cruise Control is compatible with Apache Kafka 2.5+ (i.e. Releases with 2.5.*), 2.6 (i.e. Releases with 2.5.11+), 2.7 (i.e. Releases with 2.5.36+), 2.8 (i.e. Releases with 2.5.66+), 3.0 (i.e. Releases with 2.5.85+), and 3.1 (i.e. Releases with 2.5.85+).

To answer one of my own questions:

I guess I can refine the question about Kafka 2.8:

2.8 (i.e. with 2.5.66+)

The way I read this is that Cruise Control releases 2.5.66 onwards support Kafka 2.8, with subsequent Kafka versions added to CC. Since we're using 2.5.137, we should be good, correct?

I will try to downgrade CC to 2.5.66 in the meantime: perhaps Kafka 2.8 support has been accidentally dropped in subsequent versions?
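The reading of the compatibility note above can be checked with a small sketch. The table below is transcribed from the quoted README passage; the function names are hypothetical, and "2.5.*" for Kafka 2.5 is assumed here to mean "2.5.0 or later". It only reflects what the README lists, not whether a particular build actually works.

```python
# Minimum Cruise Control release per Kafka minor version, transcribed from the
# README passage quoted above. Assumption: "2.5.*" is treated as "2.5.0+".
MIN_CC_RELEASE = {
    "2.5": "2.5.0",
    "2.6": "2.5.11",
    "2.7": "2.5.36",
    "2.8": "2.5.66",
    "3.0": "2.5.85",
    "3.1": "2.5.85",
}


def parse_version(version):
    """Turn '2.5.137' into (2, 5, 137) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def kafka_minor(kafka_version):
    """Reduce a full Kafka version such as '2.8.1' to its minor line, '2.8'."""
    major, minor = kafka_version.split(".")[:2]
    return f"{major}.{minor}"


def is_listed_as_compatible(cc_version, kafka_version):
    """Return True/False per the table, or None if the Kafka line is not listed."""
    minimum = MIN_CC_RELEASE.get(kafka_minor(kafka_version))
    if minimum is None:
        return None
    return parse_version(cc_version) >= parse_version(minimum)


print(is_listed_as_compatible("2.5.137", "2.8.1"))
```

Comparing tuples of integers matters here: as plain strings, "2.5.137" would sort before "2.5.66", which would wrongly suggest that 2.5.137 is older than the 2.5.66 minimum.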

marcelloromani commented 1 month ago

It turns out that an error in our build, for some obscure reason, pulled in two different Kafka library versions: 3.5.1 and 3.6.1. The mix of jars from these two versions caused the "method not found" issue.

We fixed the dependency mismatch by ensuring that only Kafka 3.5.1 libraries were included, and the problem went away.

Kafka 3.5.1 is mentioned in CC's gradle.properties: https://github.com/linkedin/cruise-control/blob/2.5.137/gradle.properties#L5
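A mismatch like the one described above can be spotted by grouping the Kafka jars in a deployment by version. The following is a minimal sketch, not part of Cruise Control: the function name, the jar file names, and the "artifact name starts with kafka" heuristic are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches names such as "kafka-clients-3.5.1.jar" or "kafka_2.13-3.6.1.jar".
# The artifact part is matched lazily so that the Scala suffix ("_2.13")
# stays with the artifact name and is not mistaken for the version.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)+)\.jar$")


def kafka_versions_on_classpath(jar_names, prefix="kafka"):
    """Group jars whose artifact name starts with `prefix` by their version."""
    by_version = defaultdict(list)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match and match.group("artifact").startswith(prefix):
            by_version[match.group("version")].append(name)
    return {version: sorted(names) for version, names in by_version.items()}


# Hypothetical jar listing that mixes two Kafka versions.
jars = [
    "kafka-clients-3.6.1.jar",
    "kafka_2.13-3.5.1.jar",
    "kafka-metadata-3.5.1.jar",
    "slf4j-api-1.7.36.jar",
]

found = kafka_versions_on_classpath(jars)
if len(found) > 1:
    print("Mixed Kafka versions found:")
    for version, names in sorted(found.items()):
        print(f"  {version}: {', '.join(names)}")
```

In practice the list would come from the directory that holds the deployed jars, for example via `os.listdir` on that directory. More than one version in the result is a hint that the build resolved Kafka dependencies inconsistently.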

marcelloromani commented 1 month ago

Lessons learned: