strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Bug]: KRaft controller cannot start up in Kafka 3.9 or trunk branch #10458

Closed showuon closed 2 months ago

showuon commented 2 months ago

Bug Description

Due to the implementation of KIP-853, the KRaft controller can no longer accept 0.0.0.0 as an advertised listener address. On startup, it fails with the following error:

2024-08-16 09:27:44,108 INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util) [main]
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: advertised.listeners cannot use the nonroutable meta-address 0.0.0.0. Use a routable IP address.
    at scala.Predef$.require(Predef.scala:337)
    at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:1008)
    at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:842)
    at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:184)
    at kafka.tools.StorageTool$.$anonfun$execute$1(StorageTool.scala:80)
    at scala.Option.flatMap(Option.scala:283)
    at kafka.tools.StorageTool$.execute(StorageTool.scala:80)
    at kafka.tools.StorageTool$.main(StorageTool.scala:47)
    at kafka.tools.StorageTool.main(StorageTool.scala)

Steps to reproduce

No response

Expected behavior

No response

Strimzi version

0.42.0

Kubernetes version

minikube v1.32.0

Installation method

No response

Infrastructure

No response

Configuration files and logs

No response

Additional context

No response

scholzj commented 2 months ago

This cannot be a bug because we do not support Kafka 3.9.0 as it is not released. Also, as I recently checked, Kafka does not provide any snapshot or nightly builds, so we do not support any trunk Kafka version either.

Also, I think that using 0.0.0.0 in listeners is key to making Kafka work in various environments, including Kubernetes. If it were no longer possible to use it, Strimzi and other users might not be able to adopt that Kafka version, so I think that would be a bug in Kafka. That said, the error message you shared is confusing: it talks about advertised.listeners (where disallowing 0.0.0.0 makes sense), but we never use 0.0.0.0 in advertised listeners, only in listeners. So it does not seem like something that could be produced by Strimzi.

ppatierno commented 2 months ago

Yeah I agree with Jakub. I was confused when this issue was opened as a bug when we don't support Kafka 3.9.0 because it's not out yet.

> Also, I think that using 0.0.0.0 in listeners is key to making Kafka work in various environments, including Kubernetes. If it were no longer possible to use it, Strimzi and other users might not be able to adopt that Kafka version.

+100

Strimzi relies on 0.0.0.0 for the listeners we have (replication, plain, tls, or whichever ones you define in your Kafka CR), but not for the advertised listeners, where we use the full service name (for each broker/controller).
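The split described here, binding on 0.0.0.0 while advertising a per-node service name, can be sketched roughly as follows. This is only an illustration and not Strimzi's actual generated configuration: the function `render_listener_config`, the listener names, the ports, and the DNS name pattern are all assumptions made up for the example.

```python
def render_listener_config(pod: str, service: str, namespace: str,
                           ports: dict[str, int]) -> dict[str, str]:
    """Build illustrative `listeners` / `advertised.listeners` values.

    Every name here (listener names, ports, DNS pattern) is hypothetical.
    """
    # Bind on all interfaces: 0.0.0.0 is fine as a *bind* address.
    listeners = [f"{name}://0.0.0.0:{port}" for name, port in ports.items()]
    # Advertise a routable per-pod DNS name that clients can connect to.
    host = f"{pod}.{service}.{namespace}.svc"
    advertised = [f"{name}://{host}:{port}" for name, port in ports.items()]
    return {
        "listeners": ",".join(listeners),
        "advertised.listeners": ",".join(advertised),
    }


config = render_listener_config(
    pod="my-cluster-kafka-0",
    service="my-cluster-kafka-brokers",
    namespace="myproject",
    ports={"REPLICATION": 9091, "PLAIN": 9092},
)
for key, value in config.items():
    print(f"{key}={value}")
```

The point of the sketch is that 0.0.0.0 only ever appears in the `listeners` value, never in `advertised.listeners`.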

vitaliyf commented 2 months ago

I got curious - the error message reported in this issue was added in https://github.com/apache/kafka/pull/16464 - and it does seem to say it applies just to "advertised" listeners, where it makes sense to disallow 0.0.0.0

scholzj commented 2 months ago

> I got curious - the error message reported in this issue was added in apache/kafka#16464 - and it does seem to say it applies just to "advertised" listeners, where it makes sense to disallow 0.0.0.0

Right, that would make sense. You cannot connect to 0.0.0.0, so it certainly should not be advertised, and Strimzi does not use it as an advertised address.

showuon commented 2 months ago

> This cannot be a bug because we do not support Kafka 3.9.0 as it is not released. Also, as I recently checked, Kafka does not provide any snapshot or nightly builds, so we do not support any trunk Kafka version either.

My bad! I didn't make it clear. I was testing the Kafka 3.9 branch (not yet released) and building my own 3.9 image to run with the current cluster operator. That said, once a 3.9 RC is out, this issue should still exist.

> That said, the error message you shared is confusing: it talks about advertised.listeners (where disallowing 0.0.0.0 makes sense), but we never use 0.0.0.0 in advertised listeners, only in listeners. So it does not seem like something that could be produced by Strimzi.

You're right! We only use 0.0.0.0 in listeners. The problem is that before v3.9.0, the KRaft controller didn't use advertised.listeners. From v3.9.0 on, the controller allows advertised.listeners to be set and uses it. If advertised.listeners is not set, the controller falls back to the listeners config (i.e. 0.0.0.0) and fails the validation.
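To make the fallback concrete, here is a simplified model of the behaviour described above. It is not Kafka's actual code (the real check lives in the Scala `KafkaConfig` class shown in the stack trace); the function names `parse_host`, `effective_advertised_listeners`, and `validate_controller_config` are hypothetical, and only the error text is taken from the log in this issue.

```python
from typing import Optional


def parse_host(listener: str) -> str:
    """Return the host part of a 'NAME://host:port' listener string."""
    _, _, address = listener.partition("://")
    host, _, _port = address.rpartition(":")
    return host


def effective_advertised_listeners(listeners: list[str],
                                   advertised: Optional[list[str]]) -> list[str]:
    # Assumed fallback: with no advertised.listeners set,
    # the listeners value is used in its place.
    return advertised if advertised else listeners


def validate_controller_config(listeners: list[str],
                               advertised: Optional[list[str]] = None) -> None:
    """Reject 0.0.0.0 in the effective advertised listeners."""
    for listener in effective_advertised_listeners(listeners, advertised):
        if parse_host(listener) == "0.0.0.0":
            raise ValueError(
                "advertised.listeners cannot use the nonroutable "
                "meta-address 0.0.0.0. Use a routable IP address."
            )


# Only `listeners` is set: the fallback trips the validation.
try:
    validate_controller_config(["CONTROLLER://0.0.0.0:9090"])
except ValueError as err:
    print(f"rejected: {err}")

# An explicit routable advertised listener passes, even though the
# bind address is still 0.0.0.0. The host name below is made up.
validate_controller_config(
    ["CONTROLLER://0.0.0.0:9090"],
    ["CONTROLLER://controller-0.example.svc:9090"],
)
```

Under this model, setting an explicit advertised listener for the controller would avoid the failure without changing the bind address.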

cc @tinaselenge

scholzj commented 2 months ago

That is still not a bug. We will deal with it when the time comes. There is nothing we can do now anyway.

showuon commented 2 months ago

> That is still not a bug. We will deal with it when the time comes. There is nothing we can do now anyway.

Agreed! Should I close it?

scholzj commented 2 months ago

Yeah, I think we can close it. We will run into it anyway when 3.9 has its first RCs.