fvaleri opened this issue 15 hours ago
The expectation is that users set the cluster ID in the Kafka CR status as part of the recovery. There is no expectation to recover it from the disks. This might need to be documented, but it should not be fixed by reading any cluster IDs from the volumes.
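The recovery expectation described above can be sketched as a two-step procedure: read the cluster ID back from an existing KRaft volume's `meta.properties`, then set it on the Kafka CR status before the operator reconciles. The paths, sample cluster ID, and resource name below are assumptions for illustration only, not the documented procedure:

```shell
# Simulate a KRaft log dir with a meta.properties file, as left behind
# by a previous Kafka cluster instance (path and ID are made up).
mkdir -p /tmp/kafka-log-dir
cat > /tmp/kafka-log-dir/meta.properties <<'EOF'
version=1
cluster.id=MkU3OEVBNTcwNTJENDM2Qg
node.id=0
EOF

# Recover the cluster ID from the volume.
CLUSTER_ID=$(grep '^cluster.id=' /tmp/kafka-log-dir/meta.properties | cut -d= -f2)
echo "cluster.id: ${CLUSTER_ID}"

# Then (illustrative only, requires a live cluster, so commented out)
# set it on the Kafka CR status so the operator skips formatting:
# kubectl patch kafka my-cluster --subresource=status --type=merge \
#   -p "{\"status\":{\"clusterId\":\"${CLUSTER_ID}\"}}"
```

Note that patching the status subresource directly requires a recent kubectl (`--subresource` support); whether this is the sanctioned recovery flow is exactly what the docs improvement tracked here should clarify.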
Definitely, because the current documented procedure is broken: https://strimzi.io/docs/operators/latest/deploying#proc-cluster-recovery-volume-str.
It is not broken, just old. Let's keep the issue to track the docs improvement.
Bug Description
When KRaft disks are already formatted, the previous cluster.id has to be reused in addition to passing the --ignore-formatted flag. The Kafka storage tool requires this so it can verify that the cluster.id is the same across all broker disks.
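The consistency check described above can be sketched as follows: each formatted disk carries a cluster.id in its `meta.properties`, and the IDs must agree. The directory layout and IDs below are made up for illustration:

```shell
# Two simulated disks with deliberately different cluster IDs,
# mimicking volumes formatted by different cluster instances.
mkdir -p /tmp/disk0 /tmp/disk1
printf 'version=1\ncluster.id=AAAAAAAAAAAAAAAAAAAAAA\n' > /tmp/disk0/meta.properties
printf 'version=1\ncluster.id=BBBBBBBBBBBBBBBBBBBBBB\n' > /tmp/disk1/meta.properties

# Compare cluster.id across all disks, as the storage tool does.
MISMATCH=0
FIRST=""
for f in /tmp/disk0/meta.properties /tmp/disk1/meta.properties; do
  id=$(grep '^cluster.id=' "$f" | cut -d= -f2)
  if [ -z "$FIRST" ]; then
    FIRST="$id"
  elif [ "$id" != "$FIRST" ]; then
    MISMATCH=1
  fi
done
echo "mismatch=${MISMATCH}"
```

With matching IDs the loop leaves `MISMATCH=0`; here the deliberate disagreement sets it to 1, which corresponds to the failure mode the bug report describes.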
When Kafka pods are restarted, the operator reuses the existing cluster.id from Kafka CR status and skips disk formatting. Instead, when initializing a new Kafka cluster, there is no status, so it always generates a new cluster.id and runs disk formatting.
This is a problem when rebinding existing volumes from a previous Kafka cluster instance, because they already have a cluster.id stored in meta.properties. In this case, the Kafka pods start crash looping with a cluster.id mismatch error.
Steps to reproduce
See https://github.com/strimzi/strimzi-kafka-operator/pull/10637#discussion_r1780956446.
Expected behavior
Assuming the binding is correct, the Kafka pods should be able to work with volumes from a previous Kafka cluster instance.
Strimzi version
main
Kubernetes version
1.30
Installation method
No response
Infrastructure
No response
Configuration files and logs
No response
Additional context
No response