kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io

Upgrade is not updating KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion #11344

Open gandhisagar opened 4 days ago

gandhisagar commented 4 days ago

What steps did you take and what happened?

We are upgrading a Kubernetes cluster deployed with Cluster API (CAPV, on vSphere infrastructure).

As part of the upgrade, we are applying the following changes:

- Applying `clusterctl upgrade plan`
- Changing the pre/post kubeadm commands
- Changing `spec.version` (e.g. from 1.29.3 to 1.30.4; see the sketch below)

The cluster upgrades successfully and all nodes are at 1.30.4, but `KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion` is not updated automatically.
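The version bump itself is applied to the KubeadmControlPlane roughly like this (a sketch; the resource name and namespace are from our environment):

```sh
# Sketch of the version bump step (resource name/namespace from our environment)
kubectl patch kubeadmcontrolplane ssp-cluster -n ssp-cluster \
  --type merge -p '{"spec":{"version":"v1.30.4"}}'
```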

KubeadmControlPlane instance:

spec:
    kubeadmConfigSpec:
      clusterConfiguration:
        apiServer:
          extraArgs:
            cloud-provider: external
        controllerManager:
          extraArgs:
            cloud-provider: external
        dns: {}
        etcd:
          local:
            extraArgs:
              election-timeout: "2500"
              heartbeat-interval: "500"
        kubernetesVersion: v1.29.3  # <-- still the install-time value
        networking: {}
        scheduler: {}
      files:
      - content: |
          apiVersion: v1
          kind: Pod
          metadata:
            creationTimestamp: null
            name: kube-vip
            namespace: kube-system
          spec:
            containers:
              - args:
                  - manager
                env:
                  - name: vip_arp
                    value: "true"
                  - name: port
                    value: "6443"
                  - name: vip_interface
                    value: ""
                  - name: vip_cidr
                    value: "32"
                  - name: cp_enable
                    value: "true"
                  - name: cp_namespace
                    value: kube-system
                  - name: vip_ddns
                    value: "false"
                  - name: svc_enable
                    value: "false"
                  - name: svc_leasename
                    value: plndr-svcs-lock
                  - name: svc_election
                    value: "true"
                  - name: vip_leaderelection
                    value: "true"
                  - name: vip_leasename
                    value: plndr-cp-lock
                  - name: vip_leaseduration
                    value: "15"
                  - name: vip_renewdeadline
                    value: "10"
                  - name: vip_retryperiod
                    value: "2"
                  - name: address
                    value: 192.168.1.3
                  - name: prometheus_server
                    value: :2112
                image: sspi-test.broadcom.com/registry/kube-vip/kube-vip:v0.6.4
                imagePullPolicy: IfNotPresent
                name: kube-vip
                resources: {}
                securityContext:
                  capabilities:
                    add:
                      - NET_ADMIN
                      - NET_RAW
                volumeMounts:
                  - mountPath: /etc/kubernetes/admin.conf
                    name: kubeconfig
                  - mountPath: /etc/hosts
                    name: etchosts
            hostNetwork: true
            volumes:
              - hostPath:
                  path: /etc/kubernetes/admin.conf
                name: kubeconfig
              - hostPath:
                  path: /etc/kube-vip.hosts
                  type: File
                name: etchosts
          status: {}
        owner: root:root
        path: /etc/kubernetes/manifests/kube-vip.yaml
        permissions: "0644"
      - content: 127.0.0.1 localhost kubernetes
        owner: root:root
        path: /etc/kube-vip.hosts
        permissions: "0644"
      - content: |
         <removed>
        owner: root:root
        path: /etc/pre-kubeadm-commands/50-kube-vip-prepare.sh
        permissions: "0700"
      format: cloud-config
      initConfiguration:
        localAPIEndpoint: {}
        nodeRegistration:
          criSocket: /var/run/crio/crio.sock
          imagePullPolicy: IfNotPresent
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ local_hostname }}'
      joinConfiguration:
        discovery: {}
        nodeRegistration:
          criSocket: /var/run/crio/crio.sock
          imagePullPolicy: IfNotPresent
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ local_hostname }}'
      postKubeadmCommands:
      - removed
      preKubeadmCommands:
      - removed
      users:
      - name: capv
        sshAuthorizedKeys:
        -  removed
    machineTemplate:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: ssp-cluster
        namespace: ssp-cluster
      metadata: {}
    replicas: 1
    rolloutStrategy:
      rollingUpdate:
        maxSurge: 1
      type: RollingUpdate
    version: v1.30.4  # <-- updated by the upgrade

The Machine object's `spec.version` is also 1.30.4:

spec:
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfig
      name: ssp-cluster-rbjxg
      namespace: ssp-cluster
      uid: 2f9e1f34-c625-4b3d-a12d-1f2aa44ac084
    dataSecretName: ssp-cluster-rbjxg
  clusterName: ssp-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachine
    name: ssp-cluster-rbjxg
    namespace: ssp-cluster
    uid: fdf5bbf9-4fe9-4d58-9bad-d5efbb61326f
  nodeDeletionTimeout: 10s
  providerID: vsphere://42263451-3edc-5138-04d7-a7ea59b9946d
  version: v1.30.4

We are following this: https://cluster-api.sigs.k8s.io/tasks/upgrading-clusters#how-to-upgrade-the-kubernetes-control-plane-version

When we tried to update the field manually, it failed because modifying this field is forbidden.

Any suggestions, or is there a specific upgrade step we are missing?

So far we have tried the following (the first two attempts are sketched below):

  1. Manual update: FAILED with error: `spec.kubeadmConfigSpec.clusterConfiguration.kubernetesVersion: Forbidden: cannot be modified`
  2. Force-reconcile: added the annotation `cluster.x-k8s.io/force-reconcile: "true"` to the KubeadmControlPlane, no luck
  3. Restarted all pods on the management cluster
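Roughly, the first two attempts looked like this (a sketch; resource names are from our environment):

```sh
# Attempt 1: patch the field directly -- rejected by the KCP validating webhook
kubectl patch kubeadmcontrolplane ssp-cluster -n ssp-cluster --type merge \
  -p '{"spec":{"kubeadmConfigSpec":{"clusterConfiguration":{"kubernetesVersion":"v1.30.4"}}}}'

# Attempt 2: annotate the KubeadmControlPlane in the hope of forcing a reconcile
kubectl annotate kubeadmcontrolplane ssp-cluster -n ssp-cluster \
  cluster.x-k8s.io/force-reconcile="true" --overwrite
```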

What did you expect to happen?

We were expecting that, if `KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion` is not modifiable, it would automatically be updated to 1.30.4 after the upgrade.

Cluster API version

clusterctl version: &version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"965ffa1d94230b8127245df750a99f09eab9dd97", GitTreeState:"clean", BuildDate:"2024-03-12T17:15:08Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}

- bootstrap-kubeadm: v1.7.1
- cert-manager: v1.14.2
- cluster-api: v1.7.1
- control-plane-kubeadm: v1.7.1
- infrastructure-vsphere: v1.10.0
- ipam-incluster: v0.1.0

Kubernetes version

1.29.3 -> 1.30.4 Upgrade

Anything else you would like to add?

root@sspi-test:/image/VMware-SSP-Installer-5.0.0.0.0.80589143/phoenix# kubectl get kubeadmcontrolplane ssp-cluster -n ssp-cluster
NAME          CLUSTER       INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
ssp-cluster   ssp-cluster   true          true                   1          1       1         0             63m   v1.30.4

root@sspi-test:/image/VMware-SSP-Installer-5.0.0.0.0.80589143/phoenix# kubectl get cluster -A
NAMESPACE     NAME          CLUSTERCLASS   PHASE         AGE   VERSION
ssp-cluster   ssp-cluster                  Provisioned   63m

Label(s) to be applied

/kind bug

k8s-ci-robot commented 4 days ago

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
gandhisagar commented 4 days ago

Or will this field never get updated during an upgrade, since kubeadm init happened during deployment and this field is only used while initializing the cluster?

neolit123 commented 3 days ago

CAPI does have this: https://github.com/kubernetes-sigs/cluster-api/blob/cca7f8c142c7bf40592cd0b9460af8747367775c/bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller.go#L1191 but it's only updated if the value is "".

> Or will this field never get updated during an upgrade, since kubeadm init happened during deployment and this field is only used while initializing the cluster?

searching the code base, i'd say it's not updated continuously. what is your use case to track the kubeadm config version?
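For reference, what a generated KubeadmConfig ended up with can be checked with something like this (a sketch; the clusterConfiguration block is typically only present on the config used for kubeadm init, so it may be empty on configs for joining nodes):

```sh
# Sketch: inspect the kubernetesVersion stored in a generated KubeadmConfig
kubectl get kubeadmconfig ssp-cluster-rbjxg -n ssp-cluster \
  -o jsonpath='{.spec.clusterConfiguration.kubernetesVersion}'
```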

sbueringer commented 3 days ago

This question was also posted at least in 2 Slack channels. @gandhisagar can you please de-duplicate? It's not very efficient for folks trying to help.

gandhisagar commented 2 days ago

@neolit123 We are automating the upgrade using CAPI/CAPV in our enterprise product. During the upgrade we change `spec.version` as described in the documentation, but `KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion` remains the value used during installation. I am trying to understand the implication here: if it keeps the old value, will there be any effect in production?

gandhisagar commented 2 days ago

@sbueringer Sure, the response there was cold, so I will watch it for a day and then delete it from the Slack channel.

fabriziopandini commented 2 days ago

Echoing what I answered in the Slack channel (and please stop duplicating the request; it doesn't help you solve your problem and it makes everyone else's life more complicated):

You should not set the `KubeadmConfigSpec.ClusterConfiguration.KubernetesVersion` field; if you leave it empty, CABPK will use the top-level version and upgrades will just work.
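For example, the relevant part of a KubeadmControlPlane that relies only on the top-level version (a minimal sketch using values from this issue):

```yaml
spec:
  version: v1.30.4                # the only place the Kubernetes version is set
  kubeadmConfigSpec:
    clusterConfiguration:
      # kubernetesVersion intentionally left unset; CABPK fills it in from spec.version
      apiServer:
        extraArgs:
          cloud-provider: external
```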

Note: we should probably also remove this field from the API, but refactoring this API is currently blocked on a few other ongoing discussions.

gandhisagar commented 2 days ago

@fabriziopandini This is an upgrade, not a deployment. The field is populated during deployment but does not get changed during the upgrade. When I say upgrade, I mean we are upgrading the template from 1.29 to 1.30; it is an in-place upgrade, not blue-green. How can we make it blank during the upgrade?

As you may have noticed, I have already deleted the message.

sbueringer commented 2 days ago

I think the only way to unset this field on a KCP object that already has it is to disable the KCP validation webhook, unset the field and then enable the webhook again.
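Very roughly, something like this (a rough sketch only; the webhook configuration name may differ in your installation, and while it is removed all KCP validation is disabled, so back it up first and keep the window short):

```sh
# 1. Find and back up the KCP validating webhook configuration (name is installation-specific)
kubectl get validatingwebhookconfigurations | grep kubeadm-control-plane
kubectl get validatingwebhookconfiguration capi-kubeadm-control-plane-validating-webhook-configuration \
  -o yaml > kcp-webhook-backup.yaml

# 2. Temporarily delete it, then remove the field from the KCP object
kubectl delete validatingwebhookconfiguration capi-kubeadm-control-plane-validating-webhook-configuration
kubectl patch kubeadmcontrolplane ssp-cluster -n ssp-cluster --type json \
  -p '[{"op":"remove","path":"/spec/kubeadmConfigSpec/clusterConfiguration/kubernetesVersion"}]'

# 3. Restore the webhook configuration
kubectl apply -f kcp-webhook-backup.yaml
```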

gandhisagar commented 2 days ago

@sbueringer Is there a procedure I can follow, or do you recommend not doing that in production? We are fine keeping this field old; we are just trying to see if there is any impact of it being stale in production. Appreciate the help.

sbueringer commented 1 day ago

Not sure what the impact is. As far as I can tell, this kubernetesVersion gets passed from the KCP object to the KubeadmConfigs, from there onto Machines, and is then used by kubeadm (but maybe I'm misreading our code).

I would probably try to verify which version effectively ends up in the config file used by kubeadm when creating new Nodes with kubeadm join.
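One way to check might be to decode the bootstrap data generated for a newly created Machine and look at the kubeadm config files it contains, e.g. (a sketch; the secret name is the Machine's `dataSecretName` from above):

```sh
# Sketch: inspect the kubeadm config files CABPK generated for a Machine
kubectl -n ssp-cluster get secret ssp-cluster-rbjxg -o jsonpath='{.data.value}' \
  | base64 -d | grep -iA1 kubernetesVersion
```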

If the result is that kubeadm join gets a config file with the wrong Kubernetes version, the question is what impact that has on kubeadm behavior (I have no idea).

If the overall result of that investigation is that it's problematic, the only way to handle it right now is the one above: disable the KCP validation webhook, unset the field, and then enable the webhook again. We don't have any further documentation.

What we could maybe consider is allowing folks to unset the kubernetesVersion field within ClusterConfiguration (but this requires a code change). I assume the validating webhook on KCP blocks unsetting the version today? (based on what you wrote above)