strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Unregister Kafka nodes when scaling down #10296

Open scholzj opened 3 weeks ago

scholzj commented 3 weeks ago

Apparently, the Kafka nodes should be unregistered using the Kafka Admin API when scaling down. Without that, the cluster will still expect them to be present and will, for example, be unable to handle a metadata version upgrade:

2024-07-03 12:50:18 INFO  KRaftMetadataManager:165 - Reconciliation #663(timer) Kafka(myproject/my-cluster): Updating metadata version from 3.6-IV2 to 3.7-IV4
2024-07-03 12:50:18 WARN  KRaftMetadataManager:124 - Reconciliation #663(timer) Kafka(myproject/my-cluster): Failed to update metadata version to 3.7 (the current version is 3.6-IV2)
org.apache.kafka.common.errors.InvalidUpdateVersionException: Invalid update version 19 for feature metadata.version. Broker 1002 only supports versions 1-14
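
For illustration, unregistering a removed node comes down to a single Admin API call. This is a minimal sketch, not how Strimzi would wire it into the reconciliation; the bootstrap address is an assumption and node ID 1002 is just the broker from the log above.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class UnregisterScaledDownNode {
    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        // Assumed bootstrap address; a Strimzi-managed cluster would use its internal bootstrap service
        config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");

        try (Admin admin = Admin.create(config)) {
            // Remove the broker registration for node 1002 (the node from the log above).
            // In practice this would be done for every node ID removed by the scale-down.
            admin.unregisterBroker(1002).all().get();
        }
    }
}
```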

This would be simple to implement for regular scale-downs, but it will be non-trivial for node pool deletions.

scholzj commented 3 weeks ago

It looks like there is no way to get the list of registered nodes. The Admin API describeMetadataQuorum method seems to list them as observers until the remaining nodes are rolled, but not afterwards. So, for example, in the scenario where you find out about the issue only after a Kafka upgrade, when trying to update the metadata, you have no way to find out the list of nodes. That also means it might be hard for Strimzi to track and unregister the nodes without keeping the list of used node IDs somewhere in the Kafka CR status.
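
For reference, this is roughly what querying the quorum description looks like; a minimal sketch with an assumed bootstrap address. Controllers show up as voters and brokers as observers, and a removed-but-still-registered broker only appears here until the remaining nodes are restarted, which is the limitation described above.

```java
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class ListQuorumNodes {
    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092"); // assumed address

        try (Admin admin = Admin.create(config)) {
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();

            // Voters are the KRaft controllers, observers are the brokers.
            System.out.println("Voters:    " + quorum.voters().stream()
                    .map(r -> r.replicaId()).collect(Collectors.toList()));
            System.out.println("Observers: " + quorum.observers().stream()
                    .map(r -> r.replicaId()).collect(Collectors.toList()));
        }
    }
}
```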

Update: I opened https://issues.apache.org/jira/browse/KAFKA-17094 to track the Kafka limitations related to this.

scholzj commented 3 weeks ago

The most obvious solution for this would be to query the registered nodes from Kafka, compare them with the list of current nodes, and unregister those that were removed. However, Kafka cannot provide this information reliably today because of the issue linked above, and assuming it is confirmed, it seems unlikely to be fixed in 3.8, which should be shortly before its first release candidate.
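
For illustration, a sketch of that diff-and-unregister approach, assuming the observer listing were reliable (which, per the issue above, it currently is not). The bootstrap address, helper name, and desired node ID set are made up for the example; it only considers observers (brokers), since broker registrations are what get unregistered.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class UnregisterRemovedNodes {
    /** Unregisters every broker Kafka still reports that is no longer part of the desired cluster. */
    static void unregisterRemovedNodes(Admin admin, Set<Integer> desiredBrokerIds) throws Exception {
        QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();

        // Collect the broker IDs Kafka currently reports as observers. As noted above,
        // this listing is not reliable once the remaining nodes have been rolled (KAFKA-17094).
        Set<Integer> reportedBrokerIds = new HashSet<>();
        quorum.observers().forEach(r -> reportedBrokerIds.add(r.replicaId()));

        for (Integer nodeId : reportedBrokerIds) {
            if (!desiredBrokerIds.contains(nodeId)) {
                admin.unregisterBroker(nodeId).all().get();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092"); // assumed address

        try (Admin admin = Admin.create(config)) {
            unregisterRemovedNodes(admin, Set.of(0, 1, 2)); // example set of desired broker IDs
        }
    }
}
```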

We can work around this Kafka issue by storing a full list of used node IDs in the Kafka CR status. That way, we would have our own reliable tracking of the nodes that existed and we could unregister them. However, if we do this, we will change the API and it will be hard to revert. So even if Kafka fixes this later, we would be stuck with the node IDs field in the Kafka CR status.

We should decide which of these approaches to take.

scholzj commented 2 weeks ago

Discussed on the community call on 10.7.2024: KAFKA-17094 is currently under discussion in the Kafka project. We should wait for that discussion to be finished. That should give us a better idea of when and how it might be addressed in Kafka, and then we can decide how to deal with it in Strimzi.