Closed: rabii17 closed this issue 2 months ago
Please format the log to make it readable. You should probably also share your custom resources.
@ppatierno: I guess this is for you.
Thank you for your reply, I just added the custom resources and modified the logs format
Paolo is the expert on migration. But I wonder if ephemeral storage is the problem here. It means you start with an empty disk every time you roll the brokers, and that can cause all kinds of issues.
I'm not sure if this is an expected error (and if it is, it should be properly documented). But in general:
@scholzj It is a dev environment on which we want to start testing the migration and that's why we are using ephemeral storage.
The controller pod started as expected without any issues when we switched the annotation to migration. The provided logs are from the first rebooted broker.
Well, yeah -> the controller will work on the first restart as it is expected to have a brand new volume there. But it will likely have a problem on the next restart anyway, and it will lose all the data that is supposed to be migrated from the ZooKeeper cluster. The broker is not expected to be empty at this point, so that is likely why you have this issue. As I said, there might be things to improve, but migrating a cluster like this will probably never work.
I can confirm what Jakub already said. The migration isn't meant to be used for clusters based on ephemeral storage. It cannot work because of the need for a cluster ID and the related node formatting, which can't work when the storage is empty on each restart. This is true for both brokers and controllers. As Jakub said, controllers will start the first time, but they will then hit the same problem on the next restart, which happens later in the migration process. I guess it's not documented while it should be. I will find a good place to highlight it in the documentation.
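For reference, persistent storage in a `KafkaNodePool` avoids the empty-disk-on-restart problem described above. A minimal sketch (the pool name `broker`, cluster name `my-cluster`, and sizes are assumptions, not taken from this issue; this is not a drop-in conversion of an existing ephemeral pool):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker                      # assumed pool name
  labels:
    strimzi.io/cluster: my-cluster  # assumed cluster name
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim      # data survives pod restarts, unlike type: ephemeral
        size: 100Gi
        deleteClaim: false
```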
@rabii17 it seems this problem was fixed in 0.41.0 (while you are using 0.40.0). I tried with 1 controller using ephemeral (or persistent) storage and 3 brokers using ephemeral storage as well, and everything seems to work fine. Could you give it a try using the 0.41.0 release, please?
@rabii17 just to be precise: when using 1 controller with ephemeral storage, the rolling works without errors, but you are going to lose the metadata synced from ZooKeeper during migration whenever a controller rolling happens. So one controller with ephemeral storage can't work. The linked PR adds a note to the documentation making this clearer.
Triaged on 13/6/2024: discussed whether it would be useful to have at least a warning, or maybe to block a user who tries to migrate with just 1 controller using ephemeral storage. Keeping this open for the next community call.
Discussed on the community call on 10.7.2024: We document in multiple places that ephemeral storage is supposed to be used only for development and short-lived clusters in CIs etc. The migration actually works with multiple nodes using ephemeral storage; only the single ephemeral node is an issue. We should not increase the complexity for this.
Bug Description
During the migration, the KRaft controller is created; then, when the cluster starts to roll out, an error message appears:
```
The kafka configuration file appears to be for a legacy cluster. Formatting is only supported for clusters in KRaft mode.
```
Steps to reproduce
Change the `strimzi.io/kraft` annotation on the Kafka cluster from `disabled` to `migration`.
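The annotation switch that triggers the migration can be sketched as a single `kubectl` command (the cluster name `my-cluster` and namespace are assumptions; adjust to your environment). This is a config/ops fragment that requires a live cluster:

```shell
# Start the ZooKeeper-to-KRaft migration by moving the annotation
# from "disabled" to "migration" on the Kafka custom resource.
kubectl annotate kafka my-cluster -n kafka \
  strimzi.io/kraft="migration" --overwrite
```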
Expected behavior
Migration succeeded
Strimzi version
0.40.0
Kubernetes version
Kubernetes 1.27
Installation method
Helm Chart
Infrastructure
AKS
Kafka CR
Brokers KafkaNodePool CR
Controllers KafkaNodePool CR
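The original custom resources are not reproduced in this extract. A minimal controller `KafkaNodePool` using ephemeral storage, matching the single-controller setup discussed in this thread (names are assumptions), might look like:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller                  # assumed pool name
  labels:
    strimzi.io/cluster: my-cluster  # assumed cluster name
spec:
  replicas: 1
  roles:
    - controller
  storage:
    type: ephemeral                 # wiped on every restart, which breaks migration
```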
Configuration files and logs