mutable upgrades: upgrade the kube-proxy DS after all control-plane components

neolit123 commented 4 years ago

during mutable upgrades, upgrade kube-proxy only after the kube-apiserver on all nodes have been upgraded.

currently kubeadm does the following:

upgrades the control-plane on one node (via "apply")
upgrades kube-proxy and coredns
upgrades the rest of the control-plane nodes and worker nodes
upgrades kubelets everywhere

the problem with that is that it can create issues for a period of time when kube-proxy's version is ahead of a kube-apiserver version in an HA setup.

https://kubernetes.io/docs/setup/release/version-skew-policy/#kube-proxy

there is also another problem where kube-proxy would be updated before the kubelets on nodes and the policy suggests they should match.

this would require:

"apply" phases so that upgrades of the addons are performed last after kube-* control-plane components are upgraded: https://github.com/kubernetes/kubeadm/issues/1318
changes in the upgrade documentation: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

TODO List:

[x] introduce a new feature gate UpgradeAddonsBeforeControlPlane to fix a kube-proxy skew policy misalignment
https://github.com/kubernetes/kubernetes/pull/116570
https://github.com/kubernetes/kubernetes/pull/117660
[x] add e2e tests for latest k8s release
https://github.com/kubernetes/kubeadm/pull/2884
[x] update docs for alpha FG: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#feature-gates
https://github.com/kubernetes/website/pull/41606
[x] remove the deprecated featuregate - perhaps next release 1.31?
https://github.com/kubernetes/kubernetes/pull/124715
https://github.com/kubernetes/test-infra/pull/32599
https://github.com/kubernetes/website/pull/46274

neolit123 commented 4 years ago

cc @fabriziopandini

yvespp commented 4 years ago

Don't know how hard it would be but could the proxy maybe be converted to a static pod manifest to solve this? The update of it could then be moved from kubeadm upgrade apply <vesions> on the master to kubeadm upgrade node.

Remove manifest of proxy to stop it
Stop kubelet
Update config and manifest
Start kubelet

This should also play nicely with immutable nodes.

neolit123 commented 4 years ago

having kube-proxy as a static pod would definitely help around the fact kubelet and kube-proxy versions need to be in sync during upgrade.

there are pros on cons of both DS vs static pods. in general we are likely to see users preferring to run replicas of kube-proxy on all nodes (ala DS), and for multi-platform support variants of the DS can be deployed, which is what the kubeadm for Windows deployment is doing today.

this also opens some questions around instance specific configuration, and how to manage being possible to configure kube-proxy differently on different nodes: https://github.com/kubernetes/enhancements/pull/1439

at this point however it is quite the breaking change to user expectations - e.g. if they have patches for the podspec of the DS to pass extra flags or modify pod resources. wondering if we are pass the point of changing this.

This should also play nicely with immutable nodes.

cluster-api discussion from today that mentions nodeselectors / labels and multiple DSes as one way of solving the problems for immutable upgrades: https://kubernetes.slack.com/archives/C8TSNPY4T/p1604950077231800

fabriziopandini commented 4 years ago

@neolit123 thanks for raising this point. I think that we should focus on solutions that are not making the user experience more complex, by adding more steps to the upgrade sequence.

If I got this right v of kube-proxy is f(v of API server, v of Kubelet on the node), and so I'm kind of intrigued about the idea of transforming kube-proxy into a node-specific manifest because:

This allows having "self-consistent" control plane nodes, which is a requirement for mutable upgrades.
This makes the workers less sensible while upgrading (this can take some time in a big cluster).

This is something that should be discussed with focus to:

Transitioning current cluster to the new model
Possible additional maintenance burden due to the new model.

WRT to the latest point, the idea of an operator managing automatically the merge of cluster level and node level setting would be a perfect fit for preserving a nice user experience no matter of how kube-proxy is deployed....

neolit123 commented 4 years ago

Ok given there is interest in the ststic pod approach we should raise it for discussion in the next sig meeting. Its tuesday next week.

randomvariable commented 3 years ago

Just to return to this, have been reviewing the Draft STIG for Kubernetes, and CNTR-K8-000440 states the following:

Kubernetes kubelet static PodPath must not enable static pods. with check content: "On the Master and Worker nodes, change to the /etc/sysconfig/ directory and run the command: grep -i staticPodPath kubelet If any of the nodes return a value for staticPodPath, this is a finding."

FWIW, I have requested that we give feedback for the following:

Change the offensive language
Add a kubeadm compatible check that greps the /var/lib/kublet/config.yaml
Remove the text for control plane nodes, as other bits of the document explicitly expect the control plane components to be run as static pods.

However, I can understand that they will want to stand by not enabling static pods on nodes, which means we may not want to pursue converting the kube-proxy deployment to a static pod.

neolit123 commented 3 years ago

Just to return to this, have been reviewing the Draft STIG for Kubernetes, and CNTR-K8-000440 states the following:

not enabling static pods at all is disagreement with the k8s project maintainers basically. i'm not familiar with this website and their work.

Change the offensive language

Add a kubeadm compatible check that greps the /var/lib/kublet/config.yaml

Remove the text for control plane nodes, as other bits of the document explicitly expect the control plane components to be run as static pods.

are these requested changes for them to do?

However, I can understand that they will want to stand by not enabling static pods on nodes, which means we may not want to pursue converting the kube-proxy deployment to a static pod.

i don't like kube-proxy as a static pod mainly because it imposes tight coupling with the kubeadm binary. i.e. it get's promoted from a optional addon that may eventually get applied via kubectl and templated manifests to something that is hardcoded or available via an interface that requires root (i.e. an addon interface that deploys static pods).

randomvariable commented 3 years ago

i'm not familiar with this website and their work.

It's part of the the US DOD

are these requested changes for them to do?

Yes, I've requested the changes.

i don't like kube-proxy as a static pod mainly because it imposes tight coupling with the kubeadm binary.

Ah, ok. I thought that maybe work had started on converting it to a static pod.

pacoxu commented 3 years ago

How about creating a job to do the upgrade of kube-proxy?

The job will wait for all apiserver versions to the newer one, and then upgrade kube-proxy.

We have used similar logic in some cluster installation for special components like cni initial and storage components.

neolit123 commented 3 years ago

The job will wait for all apiserver versions to the newer one, and then upgrade kube-proxy.

but that would still happen before the kubelet upgrade on the node? according to the policy the versions of kube-proxy and kubelet should match.

at this point i'm -1 on moving kube-proxy to a static pod. perhaps an external kube-proxy operator can manage kube-proxy running as a static pod and also manage it's upgrade properly.

the k-sigs/cluster-addons had some ideas around a kube-proxy operator and we do like to move kube-proxy management outside of kubeadm eventually.

but also it seems the kubelet/kube-proxy skew does not matter for the time being: https://groups.google.com/g/kubernetes-sig-architecture/c/QX-4qq2krm0/m/bJwosPI_AAAJ

pacoxu commented 3 years ago

Agree, an external kube-proxy operator would be a better option.

neolit123 commented 3 years ago

i'm going to go ahead and close this after the discussion here.

once we support the kubeadm upgrade apply phases we can break down the upgrade process of doing CP upgrade first and then doing something like upgrade apply phase kube-proxy potentially. https://github.com/kubernetes/kubeadm/issues/1318
the operator can handle this better that the step-by-step docs. https://github.com/kubernetes/kubeadm/issues/2318 https://github.com/kubernetes/kubeadm/issues/1698

if kube-apiserver / kube-proxy start having more drift in features the skew can become a problem here...but the system should fix itself after some time. what should be a concern is potential downtime while unsupported skew is in effect during upgrade. let's log a separate issue if we see cases of that.

neolit123 commented 3 years ago

@pacoxu

How about creating a job to do the upgrade of kube-proxy? The job will wait for all apiserver versions to the newer one, and then upgrade kube-proxy.

that is not a bad idea. are you willing to draft this is a small proposal in a MD / Doc file so that we can review it?

we need to investigate the corner cases around it and how to detect the apiserver versions.

to me this feels safer:

once we support the kubeadm upgrade apply phases we can break down the upgrade process of doing CP upgrade first and then doing something like upgrade apply phase kube-proxy potentially.

pacoxu commented 3 years ago

Is this target of 1.22？

Add to my TODO list then. I'd like to write it next week.

neolit123 commented 3 years ago

given we haven't seen actual problems due to the proxy / server skew during upgrade it's not urgent. you can write your proposal later if you'd like.

ok, i'm going to re-open this and tag with 1.22 milestone.

pacoxu commented 3 years ago

@neolit123 I write an initial proposal on https://docs.google.com/document/d/1tjAp9QRJgwoyheMvgLjUHbD7MYTdDP2jSOlf6bAtuZM/edit?usp=sharing. It can be edited.

neolit123 commented 3 years ago

thank you @pacoxu !

i've added some comments and i like the idea. i think my biggest questions are around whether the Job needs a new container image and how we can avoid that.

cc @fabriziopandini @SataQiu please take a look.

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 commented 1 year ago

@pacoxu @SataQiu might be worth adding the docs (kubeadm init FG page) and e2e test sooner to check for signal and not forget the docs update.

note docs should be k/wesbite's dev-1.28 branch.

SataQiu commented 1 year ago

might be worth adding the docs (kubeadm init FG page) and e2e test sooner to check for signal and not forget the docs update.

Docs are needed. But this deprecated FG already takes effect directly in our current e2e tests. And it is planed to be removed in the next release(v1.29?), do we need to add some e2e tests to verify the old behavior(UpgradeAddonsBeforeControlPlane) ?

pacoxu commented 1 year ago

do we need to add some e2e tests to verify the old behavior(UpgradeAddonsBeforeControlPlane) ?

As we fix it as a bug, the bug behavior may not need e2e test in my opinion.

Docs are needed.

+1, BTW, as I know, we have never written a blog to introduce about kubeadm feature gates. Not sure if we can write one.

neolit123 commented 1 year ago

do we need to add some e2e tests to verify the old behavior(UpgradeAddonsBeforeControlPlane) ?

i think yes. ideally all users should just move to use the standard code path with the FG on, but for the rest of the users we should have signal that the old code path works.

we can clean this test in 1.29.

SataQiu commented 1 year ago

@neolit123 When I tried to add the e2e tests, I found that kubeadm disallowed the deprecated feature gate: https://github.com/kubernetes/kubernetes/blob/ef00ff21484336ec5207cee0c107acbe737b13af/cmd/kubeadm/app/features/features.go#L150-L152

                if featureSpec.PreRelease == featuregate.Deprecated {
                        return nil, errors.Errorf("feature-gate key is deprecated: %s", k)
                }

https://github.com/kubernetes/kubernetes/blob/ef00ff21484336ec5207cee0c107acbe737b13af/cmd/kubeadm/app/cmd/upgrade/common.go#L203-L209

        // Check if feature gate flags used in the cluster are consistent with the set of features currently supported by kubeadm
        if msg := features.CheckDeprecatedFlags(&features.InitFeatureGates, cfg.FeatureGates); len(msg) > 0 {
                for _, m := range msg {
                        printer.Printf(null, m)
                }
                return nil, nil, nil, errors.New("[upgrade/config] FATAL. Unable to upgrade a cluster using deprecated feature-gate flags. Please see the release notes")
        }

I haven't found this limitation in any code other than kubeadm. I'd like to remove this limit. What do you think? cc @pacoxu

neolit123 commented 1 year ago

maybe it can klog.Warning in those codepaths. the better option IMO would be to just remove this and klog.Warning on demand where needed.

+1

neolit123 commented 1 year ago

how do core components signal to the user an FG is deprecated; is it docs only?

if core (e.g. kubelet) prints warnings we probably should too.

pacoxu commented 1 year ago

https://github.com/kubernetes/kubernetes/blob/d8e9a7b33a25e97b0880558fc94318a5d2fb3664/staging/src/k8s.io/component-base/featuregate/feature_gate.go#L238-L242

        if featureSpec.PreRelease == Deprecated {
            klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
        } else if featureSpec.PreRelease == GA {
            klog.Warningf("Setting GA feature gate %s=%t. It will be removed in a future release.", k, v)
        }

+1 for a warning only.

BTW, there is no warning message for the GAed FG in kubeadm, and do we also need to add it like above?

pacoxu commented 1 year ago

Even we fix it as a bugfix with a deprecated FG UpgradeAddonsBeforeControlPlane, we may keep this FG for another release cycle before removing it. I prefer to remove it in v1.31 or later.

@SataQiu @neolit123 Do you think so? Then we can change the milestone to NEXT or a specific one.

Or we can just remove it in v1.29.

SataQiu commented 1 year ago

I'm also +1 to keep this FG for some release cycles before removing it. We need more user verification and feedback that it's working properly.

neolit123 commented 1 year ago

Or we can just remove it in v1.29.

i think we all agree that 1.29 is too soon. >= 1.30 seems safer

pacoxu commented 11 months ago

With https://github.com/kubernetes/enhancements/issues/4330 WIP, we may have a better solution for this scenario using the compatibility version flag.

neolit123 commented 11 months ago

With kubernetes/enhancements#4330 WIP, we may have a better solution for this scenario using the compatibility version flag.

unclear to me how we can leverage the new flags from that KEP yet.

pacoxu commented 11 months ago

You are right. This solution is still needed with the kep.

If the upgrade path changed to something like upgrading every control plane components to new version with '--compatibility-version=perious-version' and then after every control plane upgraded, then we change compatibility-version to the new one. As the kube-proxy skew policy is not changed, it should run with n or n+1 apiserver, we should upgrade it after all control plane are upgraded.

neolit123 commented 11 months ago

You are right. This solution is still needed with the kep.

If the upgrade path changed to something like upgrading every control plane components to new version with '--compatibility-version=perious-version' and then after every control plane upgraded, then we change compatibility-version to the new one. As the kube-proxy skew policy is not changed, it should run with n or n+1 apiserver, we should upgrade it after all control plane are upgraded.

yeah, but kubeadm has no way to automate the flag --compatibility-version change during and after upgrade, which means users will have to trigger it manually somehow. and to do that we need to add more kubeadm upgrade commands (a new phase maybe, but also "upgrade apply" doesn't have phases yet).

i think we much rather just solve the skew problem with lkube-proxy as proposed in this ticket.

neolit123 commented 9 months ago

https://github.com/kubernetes/kubeadm/issues/2346#issuecomment-1697029022

[ ] remove the deprecated featuregate - perhaps next release 1.31?

@SataQiu WDYT, should we remove it in 1.30, or 1.31? we got no feedback and the CI has been green.

it would require updating docs and e2e too: https://github.com/kubernetes/kubeadm/issues/2346

i can handle the docs for 1.30 as i have an open placeholder PR https://github.com/kubernetes/website/pull/45156

neolit123 commented 6 months ago

this has been completed.

kubernetes / kubeadm

mutable upgrades: upgrade the kube-proxy DS after all control-plane components #2346