kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

kubeadm upgrade apply phases #1318

Closed. MarkDeckert closed this issue 1 day ago.

MarkDeckert commented 5 years ago

FEATURE REQUEST

kubeadm init recently included a "--skip-phases" option for skipping a particular phase. However, kubeadm upgrade does not appear to have this option. This specifically causes issues when doing a kubeadm upgrade on a cluster that uses kube-router to replace the functionality of kube-proxy. Since upgrade will always redeploy kube-proxy, it will immediately begin altering iptables and/or ipvs underneath kube-router, and this can cause outages if the daemonset isn't deleted quickly enough.

Additionally (please excuse my ignorance if this is the case, as this is not the main issue), I should mention that the documentation is not clear on whether it's even possible to skip a particular addon. It appears --skip-phases addons will skip coredns as well, which would be undesirable.
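
For reference, a minimal sketch of the init-side skip, assuming the addon/kube-proxy phase name used by recent kubeadm versions (the config file name is a placeholder; skipping the parent addon phase would also skip coredns):

# kubeadm init: deploy everything except the kube-proxy addon
kubeadm init --config=kubeadm-config.yaml --skip-phases=addon/kube-proxy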

Versions

1.13.0


(EDIT by neolit123:)

SUMMARY:

proposal: https://github.com/kubernetes/enhancements/pull/1298 https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2501-kubeadm-phases-to-beta

"apply" phase naming and other ideas: https://docs.google.com/document/d/1Nicy27rt9jm9tOzZ_MEcQsnmyOFl2ZB6k110aRblv2M/edit?usp=sharing


TODO:

neolit123 commented 5 years ago

kubeadm init recently included a "--skip-phases" option for skipping a particular phase. However, kubeadm upgrade does not appear to have this option. This specifically causes issues when doing a kubeadm upgrade on a cluster that uses kube-router to replace the functionality of kube-proxy. Since upgrade will always redeploy kube-proxy, it will immediately begin altering iptables and/or ipvs underneath kube-router, and this can cause outages if the daemonset isn't deleted quickly enough.

yes, there is currently no way to skip phases during upgrade. kube-proxy is considered an "essential addon" and kubeadm assumes that all clusters use it. also, the daemonset is created in memory and passed to the client directly; it is not written to disk.

this feature needs discussion.

/assign @timothysc @fabriziopandini

Additionally (please excuse my ignorance if this is the case, as this is not the main issue), I should mention that the documentation is not clear on whether it's even possible to skip a particular addon. It appears --skip-phases addons will skip coredns as well, which would be undesirable.

you can skip both and then apply the dns addon manually.
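
a rough sketch of re-applying only the dns addon afterwards, assuming a config file that matches the cluster:

# deploy only the coredns addon against the running cluster
kubeadm init phase addon coredns --config=kubeadm-config.yaml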

MarkDeckert commented 5 years ago

Thanks for taking a look. Also, if I had a choice, phase control during an upgrade would be handled through the config file / kubeadm-config ConfigMap, so that it is inherent to the existing cluster rather than a flag that has to be remembered at upgrade time.

fabriziopandini commented 5 years ago

@MarkDeckert you can skip only the proxy addon by using --skip-phases addons/proxy

First considerations on this topic, which deserve discussion at SIG level and commitment for execution:

rmb938 commented 5 years ago

Having the ability to skip the kube-proxy re-install during an upgrade is required. I am using kube-router to manage my services and network policies. When running a kubeadm upgrade, once kube-proxy is reinstalled the cluster is unusable until kube-proxy --cleanup is run. This leads to downtime of services running on the cluster.
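
A minimal recovery sketch for this situation, assuming the default object names (not an official procedure):

# remove the re-deployed daemonset so kube-proxy stops reconciling rules
kubectl -n kube-system delete daemonset kube-proxy

# on each node, flush the iptables/ipvs rules kube-proxy already created
kube-proxy --cleanup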

timothysc commented 5 years ago

Let's discuss during the next call.
I do think we should allow folks to skip it, but the SLOs on the kubeadm side are gone then because of the permutations.

bincyber commented 5 years ago

There is a hackish workaround for kube-router users:

Create a kube-proxy daemonset and use nodeAffinity to ensure it cannot be scheduled on any node:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    k8s-app: kube-proxy
    purpose: upgrade-placeholder
spec:
  minReadySeconds: 10
  revisionHistoryLimit: 1
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      k8s-app: kube-proxy
      purpose: upgrade-placeholder
  template:
    metadata:
      labels:
        k8s-app: kube-proxy
        purpose: upgrade-placeholder
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-proxy
                operator: In
                values:
                - CantSchedule
      containers:
      - name: kube-proxy
        image: busybox:1.30
        command:
        - "/bin/sleep"
        - "300"
        resources:
          requests:
            cpu: 10m
            memory: 10Mi

During the upgrade of the first Kubernetes master node, it will fail on the final post-upgrade task in which the kube-proxy daemonset is updated, with this error:

[upgrade/postupgrade] FATAL post-upgrade error: unable to update daemonset: DaemonSet.apps "kube-proxy" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"k8s-app":"kube-proxy"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

If you are using automation to perform the upgrade, it will need to cope with this error.
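
A minimal sketch of tolerating that expected failure in a script; the target version, flags, and matched error string are assumptions to adapt to your environment:

# run the upgrade, but treat the known kube-proxy selector error as non-fatal
if ! out=$(kubeadm upgrade apply v1.14.0 --yes 2>&1); then
  echo "$out" | grep -q 'DaemonSet.apps "kube-proxy" is invalid.*field is immutable' \
    || { echo "$out"; exit 1; }
fi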

Upgrading additional Kubernetes master nodes with kubeadm upgrade node experimental-control-plane requires that the kube-proxy ConfigMap exist to avoid errors:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    app: kube-proxy
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.0.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: "https://URL-of-KUBEMASTERS"
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
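
A usage sketch, assuming both placeholder manifests above are saved locally (file names are arbitrary):

# create the placeholder objects before starting the upgrade
kubectl apply -f kube-proxy-placeholder-daemonset.yaml
kubectl apply -f kube-proxy-placeholder-configmap.yaml

# then upgrade the remaining control-plane nodes as described above
kubeadm upgrade node experimental-control-plane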

yagonobre commented 5 years ago

We should promote upgrade to phases so that the --skip-phases flag can be supported.

neolit123 commented 5 years ago

it's a matter of bandwidth for the release cycle and who can work on it. currently we are fixing some aspects of upgrades related to certs renewal, so it might be a better idea to leave it for the next cycle so that we don't have conflicting PRs. cc @fabriziopandini for an opinion.

yagonobre commented 5 years ago

I'll work on the reset phases this week; probably I'll be able to work on the upgrade phases next week.

timothysc commented 5 years ago

Current plan is to re-evaluate execution of this item in the 1.16 timeframe.

yvespp commented 5 years ago

Maybe another use case for upgrade phases with immutable infrastructure to consider: https://github.com/kubernetes/kubeadm/issues/1511#issuecomment-486103732

fabriziopandini commented 5 years ago

Ref https://github.com/kubernetes/kubeadm/issues/1658: we should ensure that kubeadm upgrade phases allow an atomic upgrade of etcd.

neolit123 commented 5 years ago

given the kubeadm upgrade apply phases probably need a KEP and given the deadline for KEPs for 1.16 has passed, punting to 1.17.

rosti commented 5 years ago

We need to reevaluate for 1.17 and get things going here. The proper fix of #1756 depends on this.

/assign

rosti commented 4 years ago

/assign @Klaven

Klaven commented 4 years ago

@neolit123 @rosti here is the initial document I'm working on; I will continue to work on it and will at some point try to draw out a dependency graph for each phase.

https://docs.google.com/document/d/1Nicy27rt9jm9tOzZ_MEcQsnmyOFl2ZB6k110aRblv2M/edit?usp=sharing

please feel free to update the document as you need/want.

neolit123 commented 4 years ago

thanks @Klaven i will have a look. we should probably comment on the doc and get agreement / voting on the phase separation during a kubeadm office hours.

neolit123 commented 4 years ago

shifted the milestone to 1.18.

Klaven commented 4 years ago

Yes, sorry this did not make 1.17; there was not a lot of time. I have started work on it though, and you should see PRs for it soon.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

neolit123 commented 4 years ago

/remove-lifecycle stale

asteven commented 4 years ago

How come a simple issue like this cannot be solved for more than about 2 years? The proposals in this doc make sense, no?

kube-proxy is an addon. If I don't want it, I should be able to specify that in a config file. Problem solved. Why is it so complicated to get this done?

neolit123 commented 4 years ago

it was scheduled for 1.18, but i assume @Klaven, who signed up to help with this, ran out of bandwidth.

the phase implementation itself is fairly straightforward; unfortunately the upgrade apply command has a lot of technical debt that had to be resolved first.

kube-proxy is an addon. If I don't want it, I should be able to specify that in a config file. Problem solved. Why is it so complicated to get this done?

the optional kube-proxy installation cannot be part of the kubeadm config, as long-term the kubeadm config should not know about the kube-proxy addon. this forced us to delegate the optional upgrade of kube-proxy to phases.

rmb938 commented 4 years ago

@asteven If you need a simple fix I put the following into my kubeadm config

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clientConnection:
  kubeconfig: invalid-kubeconfig.conf

This makes kube-proxy crash and not actually mess with anything. So a simple delete of the daemonset is all that is needed after an upgrade.

It's not pretty but it's worked for me.
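
A minimal sketch of the post-upgrade cleanup this implies, assuming the default daemonset name:

# after kubeadm upgrade apply, remove the (intentionally crashing) kube-proxy daemonset
kubectl -n kube-system delete daemonset kube-proxy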

asteven commented 4 years ago

it was scheduled for 1.18, but i assume @Klaven, who signed up to help with this, ran out of bandwidth.

the phase implementation itself is fairly straightforward; unfortunately the upgrade apply command has a lot of technical debt that had to be resolved first.

kube-proxy is an addon. If I don't want it, I should be able to specify that in a config file. Problem solved. Why is it so complicated to get this done?

the optional kube-proxy installation cannot be part of the kubeadm config, as long-term the kubeadm config should not know about the kube-proxy addon. this forced us to delegate the optional upgrade of kube-proxy to phases.

Not sure why kubeadm should not be allowed to know about kube-proxy. If it installed it during kubeadm init, then kubeadm upgrade should be able to take what init did and use that as the base to decide what 'upgrade' means. Not? Either this or kubeadm upgrade has to inspect what's there and only upgrade that. Anything else seems to be fundamentally broken.

Isn't this the same pattern that most things in kubernetes work with? Why not kubeadm?

asteven commented 4 years ago

@asteven If you need a simple fix I put the following into my kubeadm config

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clientConnection:
  kubeconfig: invalid-kubeconfig.conf

This makes kube-proxy crash and not actually mess with anything. So a simple delete of the daemonset is all that is needed after an upgrade.

It's not pretty but it's worked for me.

Thanks, will try this.

neolit123 commented 4 years ago

Not sure why kubeadm should not be allowed to know about kube-proxy. If it installed it during kubeadm init, then kubeadm upgrade should be able to take what init did and use that as the base to decide what 'upgrade' means. Not?

with some of the changes around CNI plugins implementing their own proxy, kube-proxy one day might not be deployed by default by kubeadm. it is not desired to add a related field to the API if the API has to change later.

for phases we have more flexibility.

Either this or kubeadm upgrade has to inspect what's there and only upgrade that. Anything else seems to be fundamentally broken.

this was proposed, but rejected as the upgrade apply phase solution would have explicitly allowed the user to skip the re-deployment of kube-proxy on upgrade.

asteven commented 4 years ago

@neolit123 Thanks for taking the time to answer. I think now I understand why it's taking so long ;-) I have my workaround, moving on with that.

neolit123 commented 4 years ago

this was proposed, but rejected as the upgrade apply phase solution would have explicitly allowed the user to skip the re-deployment of kube-proxy on upgrade.

PR for that: https://github.com/kubernetes/kubernetes/pull/89593

because we don't know when the phases will come around, this will be added to 1.19.0, but cannot be backported.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

neolit123 commented 4 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fabriziopandini commented 3 years ago

/remove-lifecycle stale

supra08 commented 3 years ago

Hi @neolit123, I would like to take on this issue. Reading through the comments, the final goals and deliverables are quite evident. I would like to ask about the current status of the implementation: is there existing work that I need to follow up on, or does it still need to be built from the ground up?

I've also applied to the LFX mentorship for this project and hope to get selected. For that, do we need a PoC roughly based on the design doc, or can we jump into development directly?

neolit123 commented 3 years ago

hi @supra08 i have added more details under SUMMARY: in the issue description here.

supra08 commented 3 years ago

Thanks, that clarifies!

alex-vmw commented 3 years ago

In case anyone needs a workaround for skipping CoreDNS installation during k8s upgrade, here is what has worked for me:

  1. Make sure that your non-kubeadm-deployed CoreDNS does NOT use the default ConfigMap named coredns.
  2. Delete the ConfigMap named coredns prior to running the k8s upgrade.
  3. Ignore the CoreDNS preflight warnings during the k8s upgrade by using the --ignore-preflight-errors=CoreDNSUnsupportedPlugins,CoreDNSMigration command line flag.
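
Putting those steps together, a rough sketch (the target version is a placeholder):

# remove the kubeadm-managed coredns ConfigMap so the upgrade leaves your own CoreDNS alone
kubectl -n kube-system delete configmap coredns

# upgrade while ignoring the CoreDNS-related preflight checks
kubeadm upgrade apply v1.21.0 --ignore-preflight-errors=CoreDNSUnsupportedPlugins,CoreDNSMigration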

rajansandeep commented 3 years ago

/assign

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pacoxu commented 2 years ago

/remove-lifecycle stale

Adding --skip-phases for kubeadm upgrade apply would be a feature request.

pacoxu commented 2 years ago

/assign

lvii commented 2 years ago

kubeadm upgrade apply does not support the --skip-phases option when upgrading a cluster that was installed without kube-proxy:

# ks get nodes
NAME              STATUS   ROLES                  AGE   VERSION
deb-k8s-main-01   Ready    control-plane,master   30d   v1.23.6
deb-k8s-node-01   Ready    <none>                 27d   v1.23.5
deb-k8s-node-02   Ready    <none>                 27d   v1.23.5

# kubeadm upgrade apply v1.23.6 --skip-phases=addon/kube-proxy --v=5
unknown flag: --skip-phases
To see the stack trace of this error execute with --v=5 or higher

MarkusTeufelberger commented 2 years ago

The only thing I'm aware of that has been implemented from this ticket is that you can delete the coredns or kube-proxy ConfigMap(s) and kubeadm will only complain a bit during upgrade, but has not re-created them for about half a dozen versions now. Making it intentional/explicit to kubeadm ("Hey, just skip this, I know what I'm doing!") is still not implemented.
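
A rough sketch of that implicit skip, assuming a kubeadm version recent enough to carry the behaviour (roughly 1.19+, per the PR linked earlier; the target version is a placeholder):

# delete the addon ConfigMaps first; kubeadm then skips re-deploying those addons during upgrade
kubectl -n kube-system delete configmap kube-proxy
kubectl -n kube-system delete configmap coredns

kubeadm upgrade apply v1.24.0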

chenk008 commented 1 year ago

@pacoxu I was just wondering if you plan to continue improving this PR: https://github.com/kubernetes/kubernetes/pull/108767? Is there anything else that I can do to help?

pacoxu commented 1 year ago

@pacoxu I was just wondering if you plan to continue improving this PR: kubernetes/kubernetes#108767? Is there anything else that I can do to help?

I may pick this up if I have time in v1.28.

bh185140 commented 3 months ago

Is this likely to go in for 1.31? I noticed the related PRs haven't had much activity for a long time. Happy to contribute if any help is needed.

neolit123 commented 3 months ago

Is this likely to go in for 1.31? I noticed the related PRs haven't had much activity for a long time. Happy to contribute if any help is needed.

as per the current kubeadm maintainer bandwidth, unlikely. i think the existing PR was closed. it requires a lot of work and the addition of unit and e2e tests, but if you think you can manage that, please open a new PR.