Closed JTarasovic closed 3 years ago
This issue requires a certain degree of coordination across several components, so the first question in my mind is where to implement this logic. I don't think this should go at the Cluster level, because the Cluster's main responsibility is the cluster infrastructure. What about assuming this should be implemented in a separate extension (with its own CRD/controller)?
/milestone v0.4.0
We should revisit in v1alpha4 timeframe, probably needs a more detailed proposal
cc @rbitia
Ria, this might fit into your "cluster group" proposal?
We have a relatively small (but growing) number of clusters so we're currently doing upgrades sort of manually. Conceptually, we think about our clusters in 3 streams - alpha, beta and stable - and roll out upgrades and configuration changes according to stream.
Our plan right now is to have common configuration for a stream in a CR (`StreamConfig`) w/ a controller. The `StreamConfig` controller would reconcile to `ClusterConfig`s based on label / annotation, with its controller handling the actual cluster resource reconciliation (eg creation, k8s version upgrades, etc).[1]
I don't think that it's CAPI's responsibility to implement all of that (or any), but if we can do some of the common stuff (version upgrades) here, that seems like it would be super valuable for the whole community. It also seems like the logic would be broadly applicable - copy template, update `KCP`, rollout, copy template, update `MD`s, rollout, profit.[2]
[1] Names are illustrative and not definitive. Something, something hard problems in Computer Science.
[2] Grossly over-simplified here for effect.
Thanks for the extra context @JTarasovic, from everything I'm hearing here it might be worth considering some extra utilities/libraries/commands under `clusterctl` which could perform some variations of the concepts described above.
Ideally, I'd be able to declare my intent to upgrade the workload cluster and that would be reconciled and rolled out for me.
I find that if I change the "spec.version" field in an existing KubeadmControlPlane object and apply the change, usually the controllers will upgrade my control plane, without me introducing a new (AWS)MachineTemplate. It sounds like that's not supposed to work, and yet it does—most of the time. Why is that?
Does it actually change the version of the running cluster - eg `kubectl get no -o wide` shows the new version?
It did not in our experience. It would roll the control plane instances but they'd still be on the previous version.
This is how upgrading k8s version on control planes works currently: https://cluster-api.sigs.k8s.io/tasks/kubeadm-control-plane.html?highlight=rolling#how-to-upgrade-the-kubernetes-control-plane-version
Note that you might need to update the image as well if you are specifying the image to use in the machine template.
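To make that procedure concrete, here is a minimal sketch of the two steps (cluster name, template names, the AWS provider kind, and the AMI value are all illustrative placeholders, not taken from this thread):

```yaml
# Step 1: copy the existing machine template under a new name
# (infrastructure machine templates are immutable).
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AWSMachineTemplate
metadata:
  name: my-cluster-control-plane-v1-22-2   # new copy of the old template
  namespace: default
spec:
  template:
    spec:
      instanceType: t3.large
      ami:
        id: ami-0example               # update the image too, if you pin one
---
# Step 2: point the KubeadmControlPlane at the new template and bump the
# version; this triggers a control plane rollout.
apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  version: v1.22.2
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
      kind: AWSMachineTemplate
      name: my-cluster-control-plane-v1-22-2
```

The same copy-template-then-update pattern is then repeated for each MachineDeployment.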
> Does it actually change the version of the running cluster - eg `kubectl get no -o wide` shows the new version?
Yes, it shows the new version there.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Any updates or action items here?
I think the `clusterctl rollout` issue linked above is a good first approximation, but I agree w/ @detiber's comment there:

> propose support in upstream Kubernetes/kubectl/kubebuilder for a sub-resource type

as that should allow folks to build controllers on top of it.
I'm cool with closing this issue in favor of that.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
I think the `clusterctl rollout` feature doesn't solve the problem of having to update the image + k8s version for every machine deployment / machine pool / kubeadm control plane that you want to upgrade as a user, although it does give more control over the rollout of machines. It would still be nice to have some sort of higher-order "upgrade my cluster" automation.
@craiglpeters @devigned and I were discussing this earlier today, and one thing that came up was maybe having a way to tell your management cluster which image to use for which k8s version, and having the machine template look that up instead of having to individually update the image version on each cluster. This would also allow patching images across all your clusters if you have to rebuild an image for the same k8s version (e.g. because of a CVE).
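The lookup idea could be as simple as a version-to-image map maintained in one place on the management cluster. The sketch below is purely illustrative of that idea (the catalog contents, function name, and image names are all made up; no such API exists in CAPI):

```python
# Illustrative sketch of the "image lookup" idea discussed above: the
# management cluster holds one version -> image map, and template
# reconciliation resolves the image from it instead of each cluster
# pinning an image by hand. All names/values here are hypothetical.

IMAGE_CATALOG = {
    "v1.21.2": "ubuntu-2004-kube-v1.21.2-build3",  # rebuilt for a CVE
    "v1.22.2": "ubuntu-2004-kube-v1.22.2-build1",
}

def resolve_image(k8s_version: str) -> str:
    """Return the image to use for a Kubernetes version, or raise if unknown."""
    try:
        return IMAGE_CATALOG[k8s_version]
    except KeyError:
        raise ValueError(f"no image registered for {k8s_version}") from None

# Patching a catalog entry (e.g. after rebuilding an image for the same
# k8s version) would roll the new image out to every cluster on its next
# reconcile, without touching each cluster's templates individually.
```
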
/remove-lifecycle rotten
> We have a relatively small (but growing) number of clusters so we're currently doing upgrades sort of manually. Conceptually, we think about our clusters in 3 streams - alpha, beta and stable - and roll out upgrades and configuration changes according to stream.
>
> Our plan right now is to have common configuration for a stream in a CR (`StreamConfig`) w/ a controller. The `StreamConfig` controller would reconcile to `ClusterConfig`s based on label / annotation with its controller handling the actual cluster resource reconciliation (eg creation, k8s version upgrades, etc). I don't think that it's CAPI's responsibility to implement all of that (or any) but if we can do some of the common stuff (version upgrades) here, that seems like it would be super valuable for the whole community. It also seems like the logic would be broadly applicable - copy template, update `KCP`, rollout, copy template, update `MD`s, rollout, profit.
We are in a really similar situation, with a large number of clusters and three different pipelines/streams for development/staging/production clusters. We are starting the development of a new component to handle this in a similar fashion (copy template, update `KCP`, update `MachinePool`, etc), so it'd be great if we could share tooling. We were also interested in making this component capable of orchestrating this upgrade process so we could, for instance, decide to upgrade node pools one after the other, with some wait period in between, instead of all at once.
If I understand it correctly, this proposal adds `kubectl rollout`-like subcommands to `clusterctl`, but this wouldn't solve the use cases discussed above.
Should we submit a new CAEP proposal for discussion?
Same use case here: looping over scalable machine resources, e.g. MachineDeployments, to upgrade them one by one against the current control plane version.
For scenarios where more control is required, it might be good to have an `autoUpgrade: true/false` control per scalable machine resource. That way you can leverage a more controlled upgrade for a given machine pool, e.g. https://github.com/kubernetes-sigs/cluster-api/pull/4346
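As a sketch, such an opt-out could look like the following (the `autoUpgrade` field is hypothetical and does not exist in any CAPI API; the resource names are illustrative):

```yaml
# Hypothetical sketch only; "autoUpgrade" is not an existing CAPI field.
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
spec:
  clusterName: my-cluster
  autoUpgrade: false   # opt this machine pool out of orchestrated upgrades
```

An orchestrating controller would then skip any pool with `autoUpgrade: false` and leave its rollout to the operator.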
We have a similar use case: we are using GitOps + CAPI to upgrade our clusters. For now we have to create a new MachineTemplate, update the KCP, wait for that to finish, delete the old template, create a new MachineTemplate for the MachineDeployment, wait for the rollout, and delete the old MachineTemplate. An operator or additional feature/resource that could handle this lifecycle as a whole (declaratively) would be ideal for us, so we can update the KCP and MachineDeployment machine template references at the same time, let the cluster reconcile and upgrade the control plane and workers in the correct order, then purge unwanted MachineTemplates.
This relates to the ClusterClass discussion https://github.com/kubernetes-sigs/cluster-api/issues/4430. This will require a considerable amount of work and thinking to get it right. @vincepri is this work still intended to make it into v1alpha4, or can we move it to the next milestone?
/area upgrades
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
What about closing this given the ClusterClass work?
Agree. This will be 100% covered by what we want to do with ClusterClass.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/close
As per the comment above, this is part of ClusterClass; ongoing work in https://github.com/kubernetes-sigs/cluster-api/pull/5059
@fabriziopandini: Closing this issue.
User Story
As an operator, I would like to be able to easily update the Kubernetes version of my workload clusters to be able to stay on top of security patches and new features.
Detailed Description
The procedure for updating the k8s version currently* is to copy the `MachineTemplate` for KCP, then update KCP w/ the new version and a reference to the new `MachineTemplate`, which causes a rollout. Rinse and repeat for `MachineDeployments`.
Ideally, I'd be able to declare my intent to upgrade the workload cluster and that would be reconciled and rolled out for me.
Anything else you would like to add:
Discussed on 17 June 2020 weekly meeting.
/kind feature