strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[KRaft]: Allow KafkaRoller to connect directly to controllers #9692

Open tinaselenge opened 6 months ago

tinaselenge commented 6 months ago

Related problem

KafkaRoller currently connects to the controller via brokers to get the quorum health information.
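For context, here is a minimal sketch of the kind of decision the quorum health information can feed into. It assumes the roller has already obtained a last-caught-up timestamp for each voter (Kafka's metadata quorum description exposes such a value). The class name, method name, and the timeout parameter are hypothetical, and this is not Strimzi's actual implementation.

```java
import java.util.Map;

/** Hypothetical helper: decides whether a controller can be restarted without losing the quorum majority. */
final class QuorumCheckSketch {

    /**
     * @param lastCaughtUpMs voter id -> last caught-up timestamp in milliseconds
     * @param leaderId       id of the current quorum leader (always treated as caught up)
     * @param nodeToRestart  id of the controller we would like to roll
     * @param fetchTimeoutMs assumed maximum lag behind the leader for a follower to count as caught up
     */
    static boolean canRoll(Map<Integer, Long> lastCaughtUpMs, int leaderId, int nodeToRestart, long fetchTimeoutMs) {
        Long leaderTs = lastCaughtUpMs.get(leaderId);
        if (leaderTs == null) {
            return false; // no known leader: do not risk a restart
        }
        int voters = lastCaughtUpMs.size();
        long caughtUpExcludingTarget = lastCaughtUpMs.entrySet().stream()
            .filter(e -> e.getKey() != nodeToRestart)
            .filter(e -> e.getKey() == leaderId || leaderTs - e.getValue() < fetchTimeoutMs)
            .count();
        // A majority of all voters must remain caught up while the target is down.
        return caughtUpExcludingTarget >= (voters / 2) + 1;
    }
}
```

With a single voter this sketch always returns false, so a real roller would need to treat that case separately.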

Suggested solution

With Strimzi supporting Kafka 3.7, which includes KIP-919, we should create a service for the controller pods and make KafkaRoller connect to it for quorum information.
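As a rough illustration of what connecting to such a service could look like on the client side, the sketch below builds Admin client properties that use the `bootstrap.controllers` setting introduced by KIP-919 instead of `bootstrap.servers`. The helper class, the service name, the namespace, and the port passed in are all hypothetical; security settings such as TLS are left out.

```java
import java.util.Properties;

/** Hypothetical helper building Admin client properties that target the controller quorum. */
final class ControllerAdminConfigSketch {

    // KIP-919 Admin client setting (Kafka 3.7+); used instead of bootstrap.servers.
    static final String BOOTSTRAP_CONTROLLERS = "bootstrap.controllers";

    /** Builds "<service>.<namespace>.svc:<port>" as the controller bootstrap address. */
    static Properties forControllers(String serviceName, String namespace, int port) {
        Properties props = new Properties();
        props.setProperty(BOOTSTRAP_CONTROLLERS, serviceName + "." + namespace + ".svc:" + port);
        return props;
    }
}
```

Properties like these could then be passed to `Admin.create(...)` from the Kafka clients library, and `describeMetadataQuorum()` called on the resulting client to read the quorum state. Only one of `bootstrap.servers` and `bootstrap.controllers` should be set on a given client.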

Alternatives

No response

Additional context

No response

scholzj commented 6 months ago

Triaged on the community call on 22.2.2024: This should be done once Kafka 3.7 is released. Unless the implementation is trivial, it should have a proposal.

tinaselenge commented 3 months ago

When working on this, I discovered that we need to configure the controller listeners with hostnames for the Admin client to be able to bootstrap. This was not clearly stated in the KIP, so I raised an upstream issue.

Reconfiguring the controller listeners does work and allows the operator to talk to the controllers directly. I have raised a draft PR, #10016. However, this causes controller pods to crash once or twice until the DNS is updated. When a pod gets terminated, its IP address changes and the DNS needs to be updated so that the hostnames resolve to the new IP address.
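To make the DNS lag concrete, here is a small sketch of polling until a hostname resolves to an expected IP address. It is purely illustrative and is not what the draft PR does; the class, the `Resolver` interface, and all parameters are hypothetical. In production code the resolver could be `InetAddress::getAllByName`, keeping in mind that the JVM caches DNS lookups.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

/** Hypothetical helper: waits until a hostname resolves to the expected IP address. */
final class DnsWaitSketch {

    /** Lookup abstraction so the resolver can be swapped out, for example in tests. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    static boolean awaitResolvesTo(String host, String expectedIp, Resolver resolver, int attempts, long delayMs) {
        for (int i = 0; i < attempts; i++) {
            try {
                boolean match = Arrays.stream(resolver.resolve(host))
                    .anyMatch(a -> a.getHostAddress().equals(expectedIp));
                if (match) {
                    return true;
                }
            } catch (UnknownHostException e) {
                // Not resolvable yet: fall through and retry.
            }
            if (i < attempts - 1) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }
}
```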

tinaselenge commented 3 months ago

This issue is blocked until https://issues.apache.org/jira/browse/KAFKA-16781 is completed.

ppatierno commented 3 months ago

@tinaselenge take into account that implementing KAFKA-16781 could affect migration as well, because right now setting advertised.listeners on a controller node is not allowed in Kafka, so the cluster operator does not apply it. After KAFKA-16781, we should configure advertised.listeners during the migration as well, I guess.

tinaselenge commented 2 months ago

We currently cannot update the logging configuration dynamically for controllers because we cannot talk to the controllers directly. If the logging change cannot be applied dynamically, the controllers are not rolled either. It looks like the user needs to trigger a manual rolling update of the controllers to change the log level. This can be solved once we can talk to the controllers directly. In the meantime, however, should this be opened as a separate issue to track it, and potentially add a workaround similar to how we handle controller config changes (we extract the controller-relevant configuration and use it in the configuration annotations)? (cc @scholzj)
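The annotation-based idea can be sketched roughly as follows: keep only the entries that matter for controllers, hash them in a stable order, and store the hash in a pod annotation so that a changed hash triggers a rolling update. This is a simplified illustration under assumptions, not Strimzi's actual code; the class name, the choice of SHA-256, the 8-character stub length, and the set of relevant keys are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: hashes only the controller-relevant part of a configuration. */
final class ControllerConfigHashSketch {

    static String hash(Map<String, String> fullConfig, Set<String> controllerRelevantKeys) {
        // Sort the relevant entries so the hash does not depend on map iteration order.
        TreeMap<String, String> relevant = new TreeMap<>();
        fullConfig.forEach((k, v) -> {
            if (controllerRelevantKeys.contains(k)) {
                relevant.put(k, v);
            }
        });

        StringBuilder sb = new StringBuilder();
        relevant.forEach((k, v) -> sb.append(k).append('=').append(v).append('\n'));
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 8); // short stub, assumed long enough for an annotation value
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

Applied to logging, the relevant keys would be the logger settings, so a log level change would alter the annotation value while unrelated broker-only changes would not.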

scholzj commented 2 months ago

@tinaselenge Yeah, if you could open a separate issue it would be great. Looks like the controller access won't be fixed anytime soon. So I will try to look at it and put together some workaround until we can talk with the controllers. Thanks for noticing and raising this!