clusterctl rollout

Arvinderpal commented 4 years ago

As an operator, I would like a convenient and consistent mechanism through which I can roll out updates to my control plane and worker nodes.

As an operator, I would like to inspect a rollout as it occurs, roll back changes if needed, and view the rollout history.

Detailed Description

Motivated by kubectl rollout.

The idea is to create a new clusterctl sub-command: clusterctl rollout.

Issue/PR Tracker:

[x] Proposal doc: https://docs.google.com/document/d/1fUThmYoAyvpyVMy1Jb6WwhoVfr20CRdJWdcyNXb4rt4/edit?usp=sharing
[ ] Add Conditions to MachineDeployment https://github.com/kubernetes-sigs/cluster-api/issues/3486. Required for status.
[x] Implement restart for MachineDeployments: https://github.com/kubernetes-sigs/cluster-api/pull/3838
[x] Implement pause/resume for MachineDeployments https://github.com/kubernetes-sigs/cluster-api/pull/4054
[ ] Implement status for MachineDeployments
[x] Implement undo for MachineDeployments https://github.com/kubernetes-sigs/cluster-api/pull/4098
[ ] Implement history for MachineDeployments
[x] Update clusterctl docs in CAPI book https://github.com/kubernetes-sigs/cluster-api/pull/4328
[x] Cleanup: https://github.com/kubernetes-sigs/cluster-api/issues/4266

Related: #3401, #3203

/kind feature

vincepri commented 4 years ago

+1 this feature makes sense, we might need a small RFE/proposal

Arvinderpal commented 4 years ago

Common usage patterns may include:

Immediate rollouts:

```
clusterctl rollout machinedeployment/my-cluster-md-0
clusterctl rollout kubeadmcontrolplane/my-cluster-control-plane
```
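Both forms take a TYPE/NAME argument in the style of kubectl. As a minimal sketch (in Go, since clusterctl is written in Go), a hypothetical parseResourceArg helper could split and validate that argument. The lowercase type spellings and the set of supported kinds are assumptions drawn from the examples above, not clusterctl's actual parsing code.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedKinds maps the lowercase resource types used in the examples
// above to their Kind. The set and spellings are assumptions for illustration.
var supportedKinds = map[string]string{
	"machinedeployment":   "MachineDeployment",
	"kubeadmcontrolplane": "KubeadmControlPlane",
}

// parseResourceArg splits a TYPE/NAME argument such as
// "machinedeployment/my-cluster-md-0" into a Kind and an object name.
// Hypothetical helper, not clusterctl's implementation.
func parseResourceArg(arg string) (string, string, error) {
	typ, name, found := strings.Cut(arg, "/")
	if !found || typ == "" || name == "" {
		return "", "", fmt.Errorf("expected TYPE/NAME, got %q", arg)
	}
	kind, ok := supportedKinds[strings.ToLower(typ)]
	if !ok {
		return "", "", fmt.Errorf("unsupported resource type %q", typ)
	}
	return kind, name, nil
}

func main() {
	for _, arg := range []string{
		"machinedeployment/my-cluster-md-0",
		"kubeadmcontrolplane/my-cluster-control-plane",
		"machineset/my-cluster-md-0-abc12",
	} {
		kind, name, err := parseResourceArg(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("kind=%s name=%s\n", kind, name)
	}
}
```

Unknown types are rejected up front, so a typo fails before any call to the management cluster is made.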

Roll out based on a specific infra machine template, i.e. modify the existing MachineDeployment to reference the new infra (e.g. Docker) machine template resource. This assumes the user has already created my-cluster-md-0-rev-1:
```
clusterctl rollout machinedeployment/my-cluster-md-0 --template dockermachinetemplate/my-cluster-md-0-rev-1
```
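This pattern amounts to changing the template reference on the MachineDeployment: because the machine template changes, the MachineDeployment controller creates a new MachineSet and moves machines over to it. A minimal sketch of the corresponding JSON merge patch, assuming the spec.template.spec.infrastructureRef field path of the Cluster API MachineDeployment; the infraRefPatch helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infraRefPatch builds a JSON merge patch that points a MachineDeployment's
// spec.template.spec.infrastructureRef at a different infrastructure
// machine template. Hypothetical helper for illustration only.
func infraRefPatch(kind, name string) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"template": map[string]any{
				"spec": map[string]any{
					"infrastructureRef": map[string]any{
						"kind": kind,
						"name": name,
					},
				},
			},
		},
	}
	// encoding/json writes map keys in sorted order, so the output is stable.
	return json.Marshal(patch)
}

func main() {
	p, err := infraRefPatch("DockerMachineTemplate", "my-cluster-md-0-rev-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p))
}
```

The printed patch could, for instance, be applied with kubectl patch machinedeployment my-cluster-md-0 --type merge -p '<patch>'. A merge patch only overwrites the keys it names, so the existing apiVersion on the reference is left untouched.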

Monitor status:

```
clusterctl rollout status machinedeployment/my-cluster-md-0
clusterctl rollout status kubeadmcontrolplane/my-cluster-control-plane
```
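Status was meant to build on MachineDeployment conditions. As a simpler illustration, here is a sketch that derives a rollout status from replica counts, following the same sequence of checks kubectl rollout status applies to Deployments. The mdStatus type is a hypothetical flattening of spec.replicas and the status fields, not the real Cluster API type.

```go
package main

import "fmt"

// mdStatus is a hypothetical, simplified view of the MachineDeployment
// fields a status check needs.
type mdStatus struct {
	Generation         int64
	ObservedGeneration int64
	SpecReplicas       int32 // desired machines
	Replicas           int32 // all machines, old and new
	UpdatedReplicas    int32 // machines matching the current template
	AvailableReplicas  int32
}

// rolloutStatus returns a message and whether the rollout is complete,
// checking in the same order kubectl uses for Deployments.
func rolloutStatus(s mdStatus) (string, bool) {
	if s.ObservedGeneration < s.Generation {
		return "waiting for the controller to observe the latest spec", false
	}
	if s.UpdatedReplicas < s.SpecReplicas {
		return fmt.Sprintf("%d out of %d new machines have been updated",
			s.UpdatedReplicas, s.SpecReplicas), false
	}
	if s.Replicas > s.UpdatedReplicas {
		return fmt.Sprintf("%d old machine(s) pending deletion",
			s.Replicas-s.UpdatedReplicas), false
	}
	if s.AvailableReplicas < s.UpdatedReplicas {
		return fmt.Sprintf("%d of %d updated machines are available",
			s.AvailableReplicas, s.UpdatedReplicas), false
	}
	return "successfully rolled out", true
}

func main() {
	s := mdStatus{Generation: 2, ObservedGeneration: 2, SpecReplicas: 3,
		Replicas: 4, UpdatedReplicas: 3, AvailableReplicas: 2}
	msg, done := rolloutStatus(s)
	fmt.Println(msg, done)
}
```

A status command would poll this until done is true, which is how kubectl rollout status behaves for Deployments.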

Roll back to the previous revision or to a specific revision:

```
clusterctl rollout undo machinedeployment/my-cluster-md-0
clusterctl rollout undo machinedeployment/my-cluster-md-0 --to-revision=2
```
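Undo presupposes that each MachineSet owned by the MachineDeployment carries a revision number. A sketch of how the rollback target could be chosen, mirroring kubectl rollout undo, where --to-revision=0 (the default) means the previous revision. The machineSet type and findUndoTarget are hypothetical; a real undo would then copy the chosen MachineSet's template back into the MachineDeployment spec.

```go
package main

import (
	"fmt"
	"sort"
)

// machineSet is a hypothetical, simplified MachineSet with its revision.
type machineSet struct {
	Name     string
	Revision int64
}

// findUndoTarget returns the MachineSet to roll back to. toRevision == 0
// selects the previous (second-highest) revision; otherwise the exact
// revision must exist.
func findUndoTarget(sets []machineSet, toRevision int64) (machineSet, error) {
	if toRevision < 0 {
		return machineSet{}, fmt.Errorf("revision must be non-negative, got %d", toRevision)
	}
	// Sort a copy, newest revision first, without mutating the caller's slice.
	sorted := append([]machineSet(nil), sets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Revision > sorted[j].Revision })
	if toRevision == 0 {
		if len(sorted) < 2 {
			return machineSet{}, fmt.Errorf("no previous revision to roll back to")
		}
		return sorted[1], nil
	}
	for _, ms := range sorted {
		if ms.Revision == toRevision {
			return ms, nil
		}
	}
	return machineSet{}, fmt.Errorf("revision %d not found", toRevision)
}

func main() {
	sets := []machineSet{
		{Name: "my-cluster-md-0-5c4b", Revision: 1},
		{Name: "my-cluster-md-0-7d9f", Revision: 2},
		{Name: "my-cluster-md-0-9a1e", Revision: 3},
	}
	for _, rev := range []int64{0, 1, 7} {
		ms, err := findUndoTarget(sets, rev)
		fmt.Println(rev, ms.Name, err)
	}
}
```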

History:

```
clusterctl rollout history machinedeployment/my-cluster-md-0
```

Arvinderpal commented 4 years ago

> +1 this feature makes sense, we might need a small RFE/proposal

More than happy to put together a proposal and a POC if we agree that this is the right way to go about this.

vincepri commented 4 years ago

cc @wfernandes @fabriziopandini

/milestone v0.4.0

detiber commented 4 years ago

+1 from me to the high-level approach as a near-term solution to the problem. It might also make sense to propose support in upstream Kubernetes/kubectl/kubebuilder for a subresource-style interface, so that we could eventually have direct support in kubectl, similar to what we have today with the scale subresource.

fabriziopandini commented 4 years ago

I'm OK with the proposal, but I agree with @detiber that the long-term solution is to make this work in kubectl.

Arvinderpal commented 4 years ago

I added a link to the proposal. PTAL

Arvinderpal commented 4 years ago

I'm going to start implementing a PoC, focusing just on MachineDeployments for now. I wanted to ask whether people are okay with a top-level command like clusterctl rollout, or would prefer something else, such as (i) clusterctl experimental rollout, (ii) clusterctl workload-cluster rollout, (iii) ...?

@fabriziopandini @wfernandes,

vincepri commented 4 years ago

clusterctl alpha <>? That way we can follow the alpha phases we have in other tools.

fabriziopandini commented 4 years ago

/area clusterctl

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fabriziopandini commented 3 years ago

/lifecycle frozen

Arvinderpal commented 3 years ago

The remaining MD commands (status and history) depend on conditions in MD. Here is the tracker for that: https://github.com/kubernetes-sigs/cluster-api/issues/3486

vincepri commented 3 years ago

/milestone v1.0

chrischdi commented 2 years ago

Since I did some research, here is some context for anyone considering an implementation of clusterctl alpha rollout undo for KCP: if it gets implemented, it should take care not to allow downgrades of control plane nodes that could break a cluster.

We should always take https://kubernetes.io/releases/version-skew-policy/ into account. From a control plane perspective, this more or less means that no MachineDeployment or MachinePool of the cluster may run a kubelet whose minor version is newer than the Kubernetes version the control plane is being downgraded to.
Also, from an etcd perspective: according to the 3.3 -> 3.4 and 3.4 -> 3.5 upgrade docs, minor-version downgrades of etcd are not allowed once a cluster has been fully upgraded to a given etcd minor version.

Some more context from upstream discussions about downgrades are available at:

https://github.com/kubernetes/website/issues/12327

(which was closed as rotten, not resolved).

fabriziopandini commented 2 years ago

/triage accepted

hiromu-a5a commented 1 year ago

/assign

killianmuldoon commented 1 year ago

@hiromu-a5a Good to see somebody picking up this work! I just wanted to mention that some parts of this, if they involve changes to the MachineDeployment controller, might overlap with ongoing work in https://github.com/kubernetes-sigs/cluster-api/issues/7730

I think it might be a good idea to sync on those parts of the work to ensure stability on main (and have fewer rebases :smile: ).

Thanks again for picking this up though! I think the pieces that impact clusterctl (like #7988) should have no / few clashes with the MD work.

hiromu-a5a commented 1 year ago

While trying the existing rollout undo command, I felt that a rollout might easily and accidentally violate the version skew policy. I'd like to suggest emitting a warning if a user's operation breaks the version skew policy. What do you think? If you agree, I'll open another issue.

fabriziopandini commented 1 year ago

I'm +1 on opening a discussion about how to prevent undo operations that can lead to issues.

hiromu-a5a commented 1 year ago

Posted a discussion.

hiromu-a5a commented 1 year ago

I couldn't find any responses to https://github.com/kubernetes-sigs/cluster-api/discussions/8170. Please let me know if there is a more appropriate forum for this discussion. (If you meant something different, such as raising it at office hours, I apologize.)

fabriziopandini commented 1 year ago

@hiromu-a5a I'm not sure I understand why this topic is not gaining traction after callouts at office hours. My only assumption is that very few users rely on this feature, which somehow matches the fact that no one reported other existing issues we found while working on label propagation (off the top of my head: history was not tracking in-place changes, clusterctl rollout was not considering all the versions an MS might have, probably more).

What I can suggest at this stage is to continue to collect ideas on this issue or to take the initiative in defining what should be improved in this feature and how.

hiromu-a5a commented 1 year ago

Thank you for your feedback.

To take the initiative, I've opened an issue for now. I think this should be discussed in a separate issue rather than as a subtopic of this one: https://github.com/kubernetes-sigs/cluster-api/issues/8408

k8s-triage-robot commented 7 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 7 months ago

/priority backlog

fabriziopandini commented 6 months ago

/unassign @hiromu-a5a

My personal understanding of this feature is that it is becoming less and less relevant, considering GitOps, ClusterClass, the lack of requests/queries/feedback from the community, etc.

Considering that, plus the fact that we never completed this feature, that we have pending issues, and that we have maintenance costs related to it, I think that as a project we should ask ourselves whether to deprecate and remove it.

/remove-lifecycle frozen

fabriziopandini commented 5 months ago

Let's keep this on hold until the discussion on https://github.com/kubernetes-sigs/cluster-api/issues/10479 is sorted out

fabriziopandini commented 4 months ago

The community agreed on deprecating revision management, so clusterctl alpha rollout undo is going away as well.

/close

k8s-ci-robot commented 4 months ago

@fabriziopandini: Closing this issue.

kubernetes-sigs / cluster-api

clusterctl rollout #3439