Open fabriziopandini opened 4 years ago
Prior discussion from https://github.com/kubernetes/kubeadm/issues/1698
@timothysc
As a Kubernetes Operator I would like to be able to declaratively control configuration changes and upgrades in a systematic fashion.
@fabriziopandini
IMO the kubeadm operator should be responsible for two things
- In place mutations of kubeadm generated artifacts
- Orchestration of such mutations across nodes

Instead, I think we should consider out of scope everything that falls under the management of infrastructure or relates to the management of "immutable" nodes (where "immutable" = any operation performed by deploying a new node and removing the old one).
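To make the split between "in-place mutation" and "orchestration across nodes" concrete, here is a minimal sketch. All names (`Node`, `Operation`, `reconcile`) are illustrative, not the actual operator API; the point is that the orchestrator walks nodes one at a time and skips nodes already at the desired state, which is what makes the operation declarative and safely re-runnable:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model -- names are illustrative, not the real operator API.

@dataclass
class Node:
    name: str
    kubelet_version: str

@dataclass
class Operation:
    """A declarative request: bring every node to the desired kubelet version."""
    desired_kubelet_version: str
    applied_to: List[str] = field(default_factory=list)

def reconcile(op: Operation, nodes: List[Node],
              mutate: Callable[[Node], None]) -> None:
    """Orchestrate the in-place mutation across nodes, one node at a time,
    skipping nodes that already match the desired state (idempotency)."""
    for node in nodes:
        if node.kubelet_version == op.desired_kubelet_version:
            continue
        mutate(node)  # the in-place mutation of kubeadm-generated artifacts
        node.kubelet_version = op.desired_kubelet_version
        op.applied_to.append(node.name)
```

Running `reconcile` a second time is a no-op, which is the property a declarative controller needs when it is restarted mid-rollout.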
@neolit123
my other top question, this can end up being not-so-secure.
For the kubeadm operator, I think we should focus on the first use case, "declaratively control configuration changes", given that this is not supported by kubeadm now and it was a top priority in the recent survey.
Interesting proposal. What I've struggled with over the past year using kubeadm is specifically what I addressed with my own scripts that (procedurally) build clusters (on github: k8smaker). The best practices for configuring a bare-metal cluster are pretty complex; doing so with AWS is too. The preconditioning script depends strongly on the underlying OS involved. But having built it modularly, I can see how the cluster construction (init, join) can be made completely extensible while providing a fully declarative interface to the user.
It requires:
I offer an opinion: I realize this was specifically stated as out-of-scope for this proposal. I'm suggesting it should be the focus instead of day-2 operations. It seems like a lot of k8s admins have a procedure where upgrading an existing production cluster tends to be (much) more dangerous than building a new one. Automating more of the upgrades adds a "magicalness" to that process, which results in the inevitable breakage being more severe rather than less. Whereas automating the construction process drives towards a very desirable workflow for automating and simplifying the process: simply remove nodes from an existing production cluster description and add them to the new cluster.
Thanks for the consideration.
It seems like a lot of k8s admins have a procedure where upgrading an existing production cluster tends to be (much) more dangerous than building a new one.
i think no matter what we do with kubernetes upgrades we will not be able to fully guarantee zero failures to the users, unless this is fully managed by some high level tooling that understand everything that the user has and wants - including node host details, infrastructure availability and all caveats of the current and next k8s version.
kubeadm or the operator can encode some details about the next k8s version or the node host, but that's all.
the so-called "blue / green" cluster upgrades may seem like the better option in the eyes of the user, since the user has the control to scrap the old cluster only once the new cluster is fully working. but they also require infrastructure that some users on self-hosted bare metal simply don't have.
Whereas automating the construction process drives towards a very desirable workflow for automating and simplifying the process: simply remove nodes from an existing production cluster description and add them to the new cluster.
we call these node re-place upgrades and the Cluster API is doing them. your project may have recreated parts of Cluster API, kubespray or kops, which are tools that are higher level than kubeadm.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle stale
xref related discussion about cert rotation https://github.com/kubernetes/kubeadm/issues/2652
Not sure if this is the right place to discuss the kubeadm operator. There are some threads in https://github.com/kubernetes/enhancements/issues/2505.
I wrote a simple kubelet-reloader as a tool for the kubeadm operator:
- the reloader watches /usr/bin/kubelet-new;
- once a new kubelet is available there, it replaces /usr/bin/kubelet and restarts kubelet.

Currently kubeadm-operator v0.1.0 can support upgrades across versions, like v1.22 to v1.24:
- the operator upgrades kubectl/kubelet/kubeadm and runs the upgrade;
- it copies the new kubelet to /usr/bin/kubelet-new for the kubelet reloader.

See quick-start.
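The kubelet-reloader flow mentioned above can be sketched roughly as follows. This is a simplified illustration under my own assumptions (polling rather than inotify, paths parameterized for testing); the real tool's behavior may differ:

```python
import os
import subprocess
import time

def swap_kubelet(new_path: str, live_path: str) -> bool:
    """If a staged binary exists at new_path, atomically replace the live
    kubelet binary with it; return True when a restart is needed."""
    if not os.path.exists(new_path):
        return False
    os.replace(new_path, live_path)  # atomic rename on the same filesystem
    return True

def watch_loop(new_path: str = "/usr/bin/kubelet-new",
               live_path: str = "/usr/bin/kubelet",
               interval: float = 5.0) -> None:
    """Poll for a staged kubelet; after swapping, restart the service.
    (A real reloader might use inotify instead of polling.)"""
    while True:
        if swap_kubelet(new_path, live_path):
            subprocess.run(["systemctl", "restart", "kubelet"], check=True)
        time.sleep(interval)
```

The atomic `os.replace` matters here: kubelet must never observe a half-written binary at its path.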
Some thoughts on the next steps
Kubeadm, being a CLI, does not play well with declarative approaches/git-ops workflows.
Assuming that kubeadm is divided into two main parts:
This issue is about collecting ideas and defining a viable path for making 2 possible using declarative approaches, sometimes also referred to as in-place mutations.
For this first iteration, I consider 1 out of scope, mainly because bootstrapping nodes with a declarative approach is already covered by Cluster API, and it is clearly out of the scope of kubeadm.
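As a rough illustration of what a declarative, in-place mutation flow could look like (the flat-dict configuration and function name below are hypothetical, not kubeadm's actual API): a controller would diff the desired configuration against the observed one and derive only the changes that must be applied in place.

```python
from typing import Any, Dict, List, Tuple

def plan_mutations(current: Dict[str, Any],
                   desired: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Compare current vs. desired (flat) configuration and return the
    list of (key, old_value, new_value) changes to apply in place."""
    actions = []
    for key, want in desired.items():
        if current.get(key) != want:
            actions.append((key, current.get(key), want))
    return actions
```

A reconciler built this way naturally converges: once the mutations are applied, the next diff is empty and nothing further happens.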