kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.52k stars 1.3k forks

Cluster-api takeover of existing Kubeadm clusters #10820

Open AmitSahastra opened 3 months ago

AmitSahastra commented 3 months ago

What would you like to be added (User Story)?

Today, if we want to manage clusters via cluster-api, the only way is to create a new cluster with clusterctl init. But if I have an existing cluster, there is no way to manage that cluster via cluster-api.

Detailed Description

With the cluster takeover option, we aim to address this use case by introducing a new cluster type or annotation that distinguishes a new cluster launch from a takeover of an existing cluster. Instead of performing a cluster init operation, the takeover path would perform a cluster join operation to add new nodes, then gradually drain the old nodes and migrate all applications and cluster components to the new cluster.
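As a rough illustration of the proposed branching, a controller could switch between the init and join paths based on an annotation. The annotation name, the `Cluster` shape, and the function below are all invented for this sketch; the real mechanism would be defined in a proposal.

```go
package main

import "fmt"

// Cluster is a minimal stand-in for the CAPI Cluster object; only
// the annotations matter for this sketch.
type Cluster struct {
	Name        string
	Annotations map[string]string
}

// takeoverAnnotation is a hypothetical annotation key; the real key
// would be decided in a CAEP.
const takeoverAnnotation = "cluster.x-k8s.io/takeover"

// bootstrapAction decides whether the controller should run the usual
// "init" path (fresh cluster) or the "join" path (take over an
// existing cluster by joining new nodes, then draining the old ones).
func bootstrapAction(c Cluster) string {
	if c.Annotations[takeoverAnnotation] == "true" {
		return "join"
	}
	return "init"
}

func main() {
	fresh := Cluster{Name: "new-cluster"}
	adopted := Cluster{
		Name:        "legacy-kubeadm",
		Annotations: map[string]string{takeoverAnnotation: "true"},
	}
	fmt.Println(bootstrapAction(fresh))   // init
	fmt.Println(bootstrapAction(adopted)) // join
}
```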

Original discussion for reference: here

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature

One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

k8s-ci-robot commented 3 months ago

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
fabriziopandini commented 3 months ago

The best we have is this demo, which is basically the solution you are proposing; https://www.youtube.com/watch?v=KzYV-fJ_wH0 might also be interesting.

I cannot guarantee that this time the discussion will make more progress, because it is "relatively" easy to do for a specific provider/use case, but hard to do in a generic way, and no one so far has shown interest in doing so.

To make progress, it is necessary to engage the community across different providers and to write a proposal. (Note: I don't want to push back, but the topic is complex and we have history.)

/priority backlog
/kind proposal

AmitSahastra commented 3 months ago

@fabriziopandini Thanks for the inputs. As you pointed out, it is a simple approach to a complex problem. I was able to test an AWS IaaS cluster and vSphere so far with only CAPI controller changes and without any cloud-provider-side changes.

My testing does involve a discovery phase for the underlying resources, such as the compute, network, and security layers, depending on the cloud and the old cluster's configuration.

To make progress, it is necessary to engage the community across different providers and to write a proposal.

If I understood correctly, did you have to make cloud-provider changes to make it work? If so, can you share a few details on what type of cloud/cluster setup it was, what kind of changes were required on the provider side, and why?

neolit123 commented 3 months ago

Q: why does this have to be a core CAPI feature and not a feature of the CAPI operator project? https://github.com/kubernetes-sigs/cluster-api-operator In a well-architected manner, infra providers could contribute to a common "migrator" interface. Also, on top of that there should be an abstraction around the control plane / bootstrap for kubeadm and non-kubeadm.

AmitSahastra commented 2 months ago

Q: why does this have to be a core CAPI feature and not a feature of the CAPI operator project? https://github.com/kubernetes-sigs/cluster-api-operator In a well-architected manner, infra providers could contribute to a common "migrator" interface. Also, on top of that there should be an abstraction around the control plane / bootstrap for kubeadm and non-kubeadm.

@neolit123 It will be interesting to see whether the CAPI operator can be utilised. I am new to the CAPI operator project; please help me understand how it works and how it would help solve the problems related to this proposal.

Q1. Can the CAPI operator control Conditions/status in CAPI (KCP in this case, or in a non-KCP-based cluster) to make it believe it is already initialised or not?
Q2. If not, do we still have to make changes in the CAPI code as suggested in this issue?
Q3. Can the CAPI operator help in the discovery phase of the existing cluster infrastructure?

neolit123 commented 2 months ago

Q1, Q2: if users with a privileged kubeconfig can do it with kubectl on the command line, then an external operator controller can do it as well. Surely CAPI controllers could enter some sort of "migration mode" if that helps, but IMO any related change in core CAPI would need a CAEP: https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/proposals
Q3: I think so.

You should get in contact with the CAPI operator project to get their feedback; check #cluster-api-operator on Kubernetes Slack.

tobiasgiese commented 2 months ago

also https://www.youtube.com/watch?v=KzYV-fJ_wH0 might be interesting

I created a PoC based on this talk/demo in https://github.com/tobiasgiese/cluster-api-migration last year. Maybe this helps. You should definitely try a cluster migration with a legacy test kubeadm installation first.