danielr1996 opened this issue 5 months ago
Hi @danielr1996, a demo repository would be great! In our Kubernetes offering we heavily use a GitOps flow (ArgoCD), and since it uses CAPH under the hood, this should work for you as well. So it would be helpful to be able to reproduce it.
Thanks for the fast reply, and great to hear that this should theoretically work. I'll prepare a demo repo.
I've created a demo repo: https://github.com/kubecraft-k8s/cluster-api-provider-hetzner-1136. Just fill in the credentials and run the script; after a short time you will see two clusters instead of one.
@danielr1996, I just had a look at the script to understand its functionality. I haven't executed it yet. From what I can tell, you perform the following steps:
However, I'm uncertain about the expected outcome due to the following reasons:
If there's something I've misunderstood or if you have any questions, please feel free to ask or correct me.
Hi @batistein,
I didn't install the CCM and CNI because it doesn't really matter for the example.
Points 2 and 3 are exactly my point. I have two use cases where I would need that:
1) restore a permanent management cluster from a backup
2) spin up an ephemeral management cluster on kind to update the workload cluster
I got the backup part working with velero, but it feels a bit awkward, because to update the workload cluster I need the following steps (sketched as a script below):
0) update the yaml for the workload cluster
1) spin up the kind cluster
2) restore from backup
3) apply the changes to the workload cluster
4) backup the changes
5) commit the changes
6) apply manifests for flux to sync a git repo
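For illustration, here is that flow as a minimal shell sketch. The backup name `mgmt-backup` and the manifest path `./clusters/` are placeholders, and it assumes velero is already pointed at the backup storage:

```sh
# 1) spin up the ephemeral management cluster
kind create cluster --name mgmt

# (velero must be installed in the fresh cluster first, e.g. via
# `velero install --provider <provider> --bucket <bucket> ...`)

# 2) restore the Cluster API state from the last backup
velero restore create --from-backup mgmt-backup --wait

# 3) apply the updated workload cluster manifests
kubectl apply -f ./clusters/

# 4) back up the changed state again
velero backup create mgmt-backup-updated --wait
```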
While it could theoretically be just (keeping the step numbers from above):
0) update the yaml for the workload cluster
5) commit the changes
1) spin up the kind cluster
6) apply manifests for flux to sync a git repo
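In script form (repo URL and path are placeholders), the ideal flow would shrink to:

```sh
# 0) edit the workload cluster yaml, 5) commit and push it, then:

# 1) spin up the kind cluster
kind create cluster --name mgmt

# (Cluster API and CAPH would still need to be installed, e.g. via
# `clusterctl init --infrastructure hetzner`)

# 6) apply manifests for flux to sync the git repo
flux bootstrap git \
  --url=ssh://git@example.com/org/cluster-repo \
  --branch=main \
  --path=clusters
```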
It may not be a bug because it works as intended, but I still think this feature would be very useful: it would heavily simplify operating a cluster and eliminate the need for a separate backup solution.
You said that you also use a GitOps workflow, so do you really have the git repo as the single source of truth, or do you also have a separate backup solution like velero?
@danielr1996 sorry for not responding here! Just saw this old issue. I didn't 100% understand your use case. I believe that you want to backup and restore a management cluster, and somehow use flux for this.
In the CAPI community (e.g. on Slack), you'll find quite a few people who use velero for this, both backup and restore. Can you confirm that this is a use case you have, and if so, why you don't want to use velero for both operations?
Considering GitOps: you mainly need to watch out for the fact that CAPI creates a lot of resources on its own, which will not be part of your manifests in the Git repo. If you account for that, GitOps is no problem at all in general. I just don't fully understand how you want to combine GitOps and backups.
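To make that concrete, here is a sketch of the split between the two kinds of objects, using the CAPI/CAPH resource kinds mentioned in this issue:

```sh
# Objects you author yourself and commit to git:
kubectl get clusters,hetznerclusters,kubeadmcontrolplanes,machinedeployments

# Objects the controllers derive from them; these live only in the
# management cluster and are not part of the git repo:
kubectl get machinesets,machines,hcloudmachines
```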
/kind bug
What steps did you take and what happened: I tried managing my clusters with flux to fully embrace GitOps. This would have the advantage of being able to completely wipe the management cluster, without any backup, and restore it from git. However, when performing the following steps:
1) define the ClusterAPI manifests (HetznerCluster, KubeadmControlPlane, ...) and push them to a git repository
2a) create a bootstrap cluster on kind
2b) install flux and ClusterAPI on the bootstrap cluster
2c) connect the git repository to the bootstrap cluster
--> ClusterAPI definitions get correctly applied and the cluster starts up
3) delete the bootstrap cluster
4) repeat 2a), 2b), 2c)
--> ClusterAPI definitions get correctly applied and a second cluster starts up (see the sketch below)
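A condensed shell sketch of the reproduction (repo URL is a placeholder):

```sh
# Running this loop twice provisions a second Hetzner cluster instead
# of adopting the one that already exists.
for run in first second; do
  # 2a) create a bootstrap cluster on kind
  kind create cluster --name bootstrap
  # 2b) install Cluster API with the Hetzner provider, plus flux
  clusterctl init --infrastructure hetzner
  flux install
  # 2c) connect the git repository holding the ClusterAPI manifests
  flux create source git cluster-repo \
    --url=https://example.com/org/cluster-repo --branch=main
  flux create kustomization cluster-repo \
    --source=GitRepository/cluster-repo --path=./clusters
  # ... wait until the workload cluster is up ...
  # 3) delete the bootstrap cluster
  kind delete cluster --name bootstrap
done
```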
What did you expect to happen: Instead of starting a second cluster, I would expect the provider to recognize that the desired cluster already exists and do nothing in that case.
Anything else you would like to add: I can understand where this issue comes from: I only define the MachineDeployment, but the hetzner-controller then provisions new HCloudMachines that are not stored in git. Therefore, when I completely wipe the management cluster, that reference is lost and the hetzner-controller can't know that my desired cluster still exists.
However, I still think this use case is essential, because after all ClusterAPI was designed to allow declarative cluster management.
A solution could be to use the labels set on the servers to check which nodes belong to the desired cluster and, with that information, restore the HCloudMachines that were wiped (see the sketch below).
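To illustrate the idea (the exact label key is an assumption on my part; check what CAPH actually sets with `hcloud server describe`), the lookup could start from the hcloud label selector:

```sh
# List the servers that were provisioned for a given cluster, based on
# the labels CAPH attaches to them.
# ASSUMPTION: "caph-cluster-my-cluster=owned" is illustrative; verify
# the real key/value on your servers.
hcloud server list --selector "caph-cluster-my-cluster=owned"
```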
If needed I can provide a demo repository to recreate the problem.
Environment:
Kubernetes version (kubectl version): 1.29.0