kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Define clusterctl move process #1525

Closed fabriziopandini closed 4 years ago

fabriziopandini commented 5 years ago

[2019-11-26 Updated] according to https://github.com/kubernetes-sigs/cluster-api/pull/1730#discussion_r346935816 Pivot is now rebranded into Move; issue updated accordingly

User Story

As an operator, I would like to move Cluster objects and all the associated resources (Machines, MachineDeployments, etc.) from the current management cluster to another management cluster, for whatever reason.

Detailed Description

The clusterctl CAEP currently in flight assumes the user brings their own management cluster, so technically the sequence bootstrap cluster -> pivot -> management cluster is not necessary anymore.

However, the same CAEP considers pivoting a possible answer to different operational needs, e.g. maintenance or replacement of the management cluster, so pivoting is still in scope.

With v1alpha3 in flight and the new assumptions around clusterctl - one binary to rule all the providers - the implementation details should be re-validated, also keeping in mind the https://github.com/kubernetes-sigs/cluster-api/pull/1730#discussion_r346935816 discussion that led to transforming pivot into move.

Goals

Non-Goals

Anything else you would like to add:

There is a lot of learning from past experience with pivoting, so I'm pasting below some comments from different threads. Feel free to add more.

/kind feature

fabriziopandini commented 5 years ago

from https://github.com/kubernetes-sigs/cluster-api/issues/1065#issuecomment-505475948

Pivot is complicated for a few reasons; I'm not sure it could be simplified outside of an external tool.

  • cluster-api components need to be moved to a new cluster
  • the cluster-api controllers in the source cluster need to be scaled down before the target cluster's cluster-api controllers start running, to avoid multiple controllers running at the same time
  • cluster and machine* objects need to be deleted from the source cluster without removing the underlying resources (currently called "force delete" and done by removing the finalizers before deleting; see the sketch after this list)
  • cluster and machine* objects need to be created in the right order on the target cluster
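For illustration, here is a minimal sketch of the "force delete" step from the list above: strip the finalizers from a Cluster in the source cluster so that deleting the object does not tear down the underlying infrastructure. This is not the clusterctl implementation; the helper name and the v1alpha3 import are just assumptions for the example.

```go
import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// forceDeleteCluster (hypothetical helper) removes the finalizers from a Cluster
// and then deletes it, so the delete does not cascade to the real infrastructure.
func forceDeleteCluster(ctx context.Context, c client.Client, namespace, name string) error {
	cluster := &clusterv1.Cluster{}
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, cluster); err != nil {
		return err
	}
	// With no finalizers left, the API server can remove the object immediately
	// and the controllers never get a chance to delete the underlying resources.
	cluster.Finalizers = nil
	if err := c.Update(ctx, cluster); err != nil {
		return err
	}
	return c.Delete(ctx, cluster)
}
```

The same approach would have to be repeated for Machines, MachineSets, MachineDeployments and the provider-specific objects.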

from comments in the GDoc for the clusterctl redesign proposal

We either need to: 1) ensure controllers are not running in both source/target management clusters, or 2) ensure that all individual Machines are moved prior to all MachineSets, which are moved prior to all MachineDeployments. All pre-requisite resources for those will need to be moved prior as well (Cluster, cluster infra, machine infra templates, machine bootstrap templates, secrets, etc.).
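As a rough illustration of option 2, a hard-coded move order might look like the sketch below; the list is an assumption for the example, not the order clusterctl actually uses.

```go
import "k8s.io/apimachinery/pkg/runtime/schema"

// moveOrder is an illustrative, hand-written ordering: prerequisites first,
// then Machines before MachineSets before MachineDeployments.
var moveOrder = []schema.GroupVersionKind{
	{Group: "", Version: "v1", Kind: "Secret"},
	{Group: "cluster.x-k8s.io", Version: "v1alpha3", Kind: "Cluster"},
	// ... infrastructure cluster, bootstrap and infrastructure templates ...
	{Group: "cluster.x-k8s.io", Version: "v1alpha3", Kind: "Machine"},
	{Group: "cluster.x-k8s.io", Version: "v1alpha3", Kind: "MachineSet"},
	{Group: "cluster.x-k8s.io", Version: "v1alpha3", Kind: "MachineDeployment"},
}
```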

detiber commented 5 years ago

I would also add a potential option: 3) add an annotation to inform cluster-api controllers not to reconcile resources; that way the annotation could be applied to all resources prior to pivoting, and removed after pivoting all resources.
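A minimal sketch of option 3, assuming the annotation key Cluster API eventually adopted (cluster.x-k8s.io/paused) and a hypothetical helper name; at the time of this discussion the exact key had not been decided.

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// pauseAnnotation: treated here as an assumption; it matches the key that was
// eventually adopted, but any agreed-upon key would work the same way.
const pauseAnnotation = "cluster.x-k8s.io/paused"

// setPaused (hypothetical helper) adds or removes the pause annotation so the
// controllers skip the object while it is being moved.
func setPaused(ctx context.Context, c client.Client, obj *unstructured.Unstructured, paused bool) error {
	annotations := obj.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	if paused {
		annotations[pauseAnnotation] = "true"
	} else {
		delete(annotations, pauseAnnotation)
	}
	obj.SetAnnotations(annotations)
	return c.Update(ctx, obj)
}
```

clusterctl would call something like this with paused=true on every object before the move, and with paused=false on the target cluster afterwards.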

fabriziopandini commented 5 years ago

@detiber ACK, updated. WRT the implementation, we can also consider the option of scaling down the controller deployments.

detiber commented 5 years ago

we can consider also the option to scale down controller deployments

100%, I'm just thinking of ways that are potentially less error prone and generally more forgiving to changes such as the one from v1alpha1 to v1alpha2 where we switched from StatefulSets to Deployments.
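For comparison, the scale-down alternative discussed above could look roughly like this; the deployment name and namespace are just examples, since they vary by provider and release.

```go
import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// scaleDownControllers (hypothetical helper) scales a provider's controller
// Deployment to zero replicas so it stops reconciling before the move starts.
func scaleDownControllers(ctx context.Context, c client.Client, namespace, name string) error {
	deployment := &appsv1.Deployment{}
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, deployment); err != nil {
		return err
	}
	replicas := int32(0)
	deployment.Spec.Replicas = &replicas
	return c.Update(ctx, deployment)
}
```

As noted above, this ties the tooling to how the controllers happen to be deployed (e.g. StatefulSets vs Deployments), which is exactly the fragility the annotation approach avoids.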

fabriziopandini commented 4 years ago

@ncdc @vincepri @detiber, considering that pivot changed into move, and that we are going to support partial moves (e.g. moving only the cluster objects that exist in a given namespace), IMO the best option for forcing cluster-api controllers not to reconcile resources is to add an annotation, as suggested by @detiber.

WDYT?

ncdc commented 4 years ago

+1 to annotation

vincepri commented 4 years ago

Annotation sounds good, there was someone else asking for something similar to pause reconciliation on certain objects, which might be helpful.
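On the controller side, honoring such a pause annotation could look roughly like the sketch below; the reconciler type and the annotation key are assumptions carried over from the earlier sketch, not the actual Cluster API controller code.

```go
import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const pauseAnnotation = "cluster.x-k8s.io/paused" // same assumption as above

// ClusterReconciler is a stand-in for a Cluster API controller.
type ClusterReconciler struct {
	client.Client
}

func (r *ClusterReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	ctx := context.Background()

	cluster := &clusterv1.Cluster{}
	if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// If the object is annotated as paused, skip reconciliation entirely;
	// this is what makes a per-object (and therefore per-namespace) move safe.
	if _, paused := cluster.Annotations[pauseAnnotation]; paused {
		return ctrl.Result{}, nil
	}

	// ... normal reconciliation ...
	return ctrl.Result{}, nil
}
```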

joonas commented 4 years ago

/assign @fabriziopandini

fabriziopandini commented 4 years ago

Ok, the annotation addresses the problem of stopping controllers from reconciling objects before the move.

However, there are still two problems to be addressed:

  1. How to move "generic" hierarchies of objects for swappable control-plane/infrastructure/bootstrap providers (in v1alpha2 everything was a single-level unstructured object; now there are cases with nested hierarchies of objects, e.g. CAPV, CACPK).
  2. If/How to support objects being shared across clusters

@akutz @ncdc

akutz commented 4 years ago

Thank you @fabriziopandini,

Number one is super important to CAPV. We have numerous resources unknown to CAPI's core CRDs, but our entire graph is still reachable via owner refs. We need the move operation to support descendant discovery via owner refs, or the move will not work for CAPV.
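A rough sketch of what descendant discovery via owner references might look like; the helper name, the candidate GVK list, and the overall shape are assumptions for illustration, not the eventual clusterctl implementation.

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// discoverDescendants (hypothetical helper) starts from a set of already-known
// object UIDs (e.g. the Clusters being moved) and repeatedly scans the candidate
// resource types, keeping every object whose ownerReferences point back into the
// discovered set. Provider-specific resources unknown to the core CRDs are picked
// up as long as they are reachable via owner refs.
func discoverDescendants(ctx context.Context, c client.Client, namespace string, discovered map[types.UID]bool, candidateGVKs []schema.GroupVersionKind) ([]unstructured.Unstructured, error) {
	var descendants []unstructured.Unstructured
	for changed := true; changed; {
		changed = false
		for _, gvk := range candidateGVKs {
			list := &unstructured.UnstructuredList{}
			list.SetGroupVersionKind(gvk.GroupVersion().WithKind(gvk.Kind + "List"))
			if err := c.List(ctx, list, client.InNamespace(namespace)); err != nil {
				return nil, err
			}
			for i := range list.Items {
				obj := list.Items[i]
				if discovered[obj.GetUID()] {
					continue // already part of the graph
				}
				for _, ref := range obj.GetOwnerReferences() {
					if discovered[ref.UID] {
						discovered[obj.GetUID()] = true
						descendants = append(descendants, obj)
						changed = true
						break
					}
				}
			}
		}
	}
	return descendants, nil
}
```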

fabriziopandini commented 4 years ago

/lifecycle active

For the clusterctl part

fabriziopandini commented 4 years ago

@akutz descendant discovery via owner refs is in flight; I will ping you as soon as ready

fabriziopandini commented 4 years ago

this was implemented by

- https://github.com/kubernetes-sigs/cluster-api/pull/2130
- https://github.com/kubernetes-sigs/cluster-api/pull/2161

/close

k8s-ci-robot commented 4 years ago

@fabriziopandini: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1525#issuecomment-583278766):

> this was implemented by
> - https://github.com/kubernetes-sigs/cluster-api/pull/2130
> - https://github.com/kubernetes-sigs/cluster-api/pull/2161
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.