k0sproject / k0smotron

k0smotron
https://docs.k0smotron.io/

k0smotron fails to patch machine template after changing `k0sConfigSpec` #780

Open eromanova opened 2 days ago

eromanova commented 2 days ago

Steps to reproduce

  1. Deploy k0smotron 1.1.2 (also reproduced on older k0smotron versions, e.g. 1.0.4).
  2. Deploy a cluster through Cluster API and k0smotron, with k0s as the bootstrap provider and Azure as the infrastructure provider. All cluster objects are provided below.
  3. Wait for the cluster to be provisioned.
  4. Update `k0sConfigSpec` in the `K0sControlPlane` object (I edited `spec.k0sConfigSpec.k0s.spec.extensions.helm.charts` and bumped the version of one of the charts).
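Judging by the initial and live `K0sControlPlane` manifests further down, the bumped chart appears to be `azuredisk-csi-driver` (1.30.3 in the initial manifest, 1.30.4 in the live object). Assuming that is the change made in step 4, the edited part of the spec looks roughly like this (the other chart entry and the `values` blocks are omitted):

```yaml
# K0sControlPlane ekaz-dev-cp, excerpt of spec.k0sConfigSpec.
# Assumed edit, inferred from the manifests in this issue; not an exact diff from the reporter.
spec:
  k0sConfigSpec:
    k0s:
      spec:
        extensions:
          helm:
            charts:
            - name: azuredisk-csi-driver
              namespace: kube-system
              chartname: azuredisk-csi-driver/azuredisk-csi-driver
              version: 1.30.4  # 1.30.3 in the initial manifest
              order: 2
```

This is a readable excerpt, not something to apply as-is: because `charts` is a list, applying it as a JSON merge patch would replace the entire `charts` list with this single entry.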

Expected result

  1. The k0s configuration is updated, and the k0smotron controller reports no errors.

Actual result

  1. The cluster is deployed successfully, but after the k0s config update the k0smotron control plane controller fails with `error creating machine from template: Apply failed with 1 conflict: conflict with "cluster-api-provider-azure-manager" using infrastructure.cluster.x-k8s.io/v1beta1: .spec.networkInterfaces`:
2024-10-16T11:35:58Z    ERROR   Reconciler error    {"controller": "k0scontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "K0sControlPlane", "K0sControlPlane": {"name":"ekaz-dev-cp","namespace":"default"}, "namespace": "default", "name": "ekaz-dev-cp", "reconcileID": "77c04dc2-4658-47ae-a14b-f4c7d765e978", "error": "error creating machine from template: Apply failed with 1 conflict: conflict with \"cluster-api-provider-azure-manager\" using infrastructure.cluster.x-k8s.io/v1beta1: .spec.networkInterfaces"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.5/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.5/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.5/pkg/internal/controller/controller.go:227
  2. The updated k0s config is not applied.
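The "machine template" in the error is presumably the one referenced by `spec.machineTemplate` of the K0sControlPlane. Excerpted from the live object below, it points at the `ekaz-dev-cp-mt` AzureMachineTemplate. The live object also carries `updateStrategy: InPlace`, which does not appear in the initial manifest and so was presumably filled in as a default:

```yaml
# K0sControlPlane ekaz-dev-cp (live object), excerpt
spec:
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: ekaz-dev-cp-mt
      namespace: default
  replicas: 1
  updateStrategy: InPlace  # absent from the initial Helm-rendered manifest
```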

The same kind of k0s config change works fine with the AWS infrastructure provider, where the update is applied. Please let me know if any extra information is needed.

Cluster objects (all sensitive information is hidden)

```
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  annotations:
    clusterctl.cluster.x-k8s.io/block-move: "true"
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
    sigs.k8s.io/cluster-api-provider-azure-last-applied-security-rules: '{"ekaz-dev-controlplane-nsg":{"allow_apiserver":"Allow K8s API Server","allow_ssh":"Allow SSH"}}'
  creationTimestamp: "2024-10-15T17:18:04Z"
  finalizers:
  - azurecluster.infrastructure.cluster.x-k8s.io
  generation: 6
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster.x-k8s.io/cluster-name: ekaz-dev
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
  name: ekaz-dev
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: ekaz-dev
    uid: 4490bcf0-9955-4973-b583-1b093ff9d00e
  resourceVersion: "113535"
  uid: 116f63a7-d533-4792-a489-c815738375c8
spec:
  azureEnvironment: AzurePublicCloud
  bastionSpec: {}
  controlPlaneEndpoint:
    host:
    port: 6443
  identityRef:
    kind: AzureClusterIdentity
    name: azure-cluster-identity
    namespace: default
  location: westus
  networkSpec:
    apiServerLB:
      backendPool:
        name: ekaz-dev-public-lb-backendPool
      frontendIPs:
      - name: ekaz-dev-public-lb-frontEnd
        publicIP:
          dnsName:
          name: pip-ekaz-dev-apiserver
      idleTimeoutInMinutes: 4
      name: ekaz-dev-public-lb
      sku: Standard
      type: Public
    subnets:
    - cidrBlocks:
      - 10.0.0.0/16
      id: /subscriptions//resourceGroups/ekaz-dev/providers/Microsoft.Network/virtualNetworks/ekaz-dev-vnet/subnets/ekaz-dev-controlplane-subnet
      name: ekaz-dev-controlplane-subnet
      natGateway:
        ip:
          name: ""
        name: ""
      role: control-plane
      routeTable:
        name: ""
      securityGroup:
        name: ekaz-dev-controlplane-nsg
        securityRules:
        - action: Allow
          description: Allow SSH
          destination: '*'
          destinationPorts: "22"
          direction: Inbound
          name: allow_ssh
          priority: 2200
          protocol: Tcp
          source: '*'
          sourcePorts: '*'
        - action: Allow
          description: Allow K8s API Server
          destination: '*'
          destinationPorts: "6443"
          direction: Inbound
          name: allow_apiserver
          priority: 2201
          protocol: Tcp
          source: '*'
          sourcePorts: '*'
    - cidrBlocks:
      - 10.1.0.0/16
      id: /subscriptions//resourceGroups/ekaz-dev/providers/Microsoft.Network/virtualNetworks/ekaz-dev-vnet/subnets/ekaz-dev-node-subnet
      name: ekaz-dev-node-subnet
      natGateway:
        id: /subscriptions//resourceGroups/ekaz-dev/providers/Microsoft.Network/natGateways/ekaz-dev-node-natgw
        ip:
          name: pip-ekaz-dev-node-natgw
        name: ekaz-dev-node-natgw
      role: node
      routeTable:
        name: ekaz-dev-node-routetable
      securityGroup:
        name: ekaz-dev-node-nsg
    vnet:
      cidrBlocks:
      - 10.0.0.0/8
      id:
      name: ekaz-dev-vnet
      resourceGroup: ekaz-dev
      tags:
        Name: ekaz-dev-vnet
        sigs.k8s.io_cluster-api-provider-azure_cluster_ekaz-dev: owned
        sigs.k8s.io_cluster-api-provider-azure_role: common
  resourceGroup: ekaz-dev
  subscriptionID:
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  annotations:
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-15T17:18:04Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
  name: ekaz-dev-cp-mt
  namespace: default
  resourceVersion: "73239"
  uid: 01877a31-96e9-4697-88cf-8a3471ca6d1a
spec:
  template:
    metadata: {}
    spec:
      identity: None
      image:
        marketplace:
          offer: capi
          publisher: cncf-upstream
          sku: ubuntu-2204-gen1
          thirdPartyImage: false
          version: 130.3.20240717
      networkInterfaces:
      - privateIPConfigs: 1
      osDisk:
        cachingType: None
        diskSizeGB: 30
        osType: Linux
      sshPublicKey:
      vmSize: Standard_A4_v2
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  annotations:
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-15T17:18:04Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
  name: ekaz-dev-worker-mt
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: ekaz-dev
    uid: 4490bcf0-9955-4973-b583-1b093ff9d00e
  resourceVersion: "73248"
  uid: 37dca521-3bbb-4002-9d54-69acfd5998e6
spec:
  template:
    metadata: {}
    spec:
      identity: None
      image:
        marketplace:
          offer: capi
          publisher: cncf-upstream
          sku: ubuntu-2204-gen1
          thirdPartyImage: false
          version: 130.3.20240717
      networkInterfaces:
      - privateIPConfigs: 1
      osDisk:
        cachingType: None
        diskSizeGB: 30
        osType: Linux
      sshPublicKey:
      vmSize: Standard_A4_v2
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-15T17:18:04Z"
  finalizers:
  - cluster.cluster.x-k8s.io
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
    sveltos-agent: present
  name: ekaz-dev
  namespace: default
  resourceVersion: "113536"
  uid: 4490bcf0-9955-4973-b583-1b093ff9d00e
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.244.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneEndpoint:
    host:
    port: 6443
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: K0sControlPlane
    name: ekaz-dev-cp
    namespace: default
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: ekaz-dev
    namespace: default
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0sControlPlane
metadata:
  annotations:
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-15T17:18:05Z"
  generation: 3
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster.x-k8s.io/cluster-name: ekaz-dev
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
  name: ekaz-dev-cp
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: ekaz-dev
    uid: 4490bcf0-9955-4973-b583-1b093ff9d00e
  resourceVersion: "80181"
  uid: 2959f4ef-7b80-41a1-86c7-e21b303507f6
spec:
  k0sConfigSpec:
    args:
    - --enable-worker
    - --enable-cloud-provider
    - --kubelet-extra-args="--cloud-provider=external"
    - --disable-components=konnectivity-server
    files:
    - contentFrom:
        secretRef:
          key: control-plane-azure.json
          name: ekaz-dev-cp-0-azure-json
      path: /etc/kubernetes/azure.json
      permissions: "0644"
    k0s:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: k0s
      spec:
        api:
          extraArgs:
            anonymous-auth: "true"
        extensions:
          helm:
            charts:
            - chartname: cloud-provider-azure/cloud-provider-azure
              name: cloud-provider-azure
              namespace: kube-system
              order: 1
              values: |
                cloudControllerManager:
                  nodeSelector:
                    node-role.kubernetes.io/control-plane: "true"
              version: 1.30.4
            - chartname: azuredisk-csi-driver/azuredisk-csi-driver
              name: azuredisk-csi-driver
              namespace: kube-system
              order: 2
              values: |
                linux:
                  kubelet: "/var/lib/k0s/kubelet"
              version: 1.30.4
            repositories:
            - name: cloud-provider-azure
              url: https://raw.githubusercontent.com/kubernetes-sigs/cloud-provider-azure/master/helm/repo
            - name: azuredisk-csi-driver
              url: https://raw.githubusercontent.com/kubernetes-sigs/azuredisk-csi-driver/master/charts
        network:
          calico:
            mode: vxlan
          provider: calico
    useSystemHostname: false
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: ekaz-dev-cp-mt
      namespace: default
  replicas: 1
  updateStrategy: InPlace
  version: v1.30.4+k0s.0
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: K0sWorkerConfigTemplate
metadata:
  annotations:
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-15T17:18:05Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
  name: ekaz-dev-machine-config
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: ekaz-dev
    uid: 4490bcf0-9955-4973-b583-1b093ff9d00e
  resourceVersion: "73252"
  uid: cade797a-c36f-4588-ad42-486694d3a10d
spec:
  template:
    spec:
      args:
      - --enable-cloud-provider
      - --kubelet-extra-args="--cloud-provider=external"
      files:
      - contentFrom:
          secretRef:
            key: worker-node-azure.json
            name: ekaz-dev-worker-mt-azure-json
        path: /etc/kubernetes/azure.json
        permissions: "0644"
      useSystemHostname: false
      version: v1.30.4+k0s.0
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  annotations:
    machinedeployment.clusters.x-k8s.io/revision: "1"
    meta.helm.sh/release-name: ekaz-dev
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-15T17:18:05Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster.x-k8s.io/cluster-name: ekaz-dev
    helm.toolkit.fluxcd.io/name: ekaz-dev
    helm.toolkit.fluxcd.io/namespace: default
  name: ekaz-dev-md
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: ekaz-dev
    uid: 4490bcf0-9955-4973-b583-1b093ff9d00e
  resourceVersion: "101338"
  uid: d7e273c6-ec89-4375-9caf-cd60e897e81e
spec:
  clusterName: ekaz-dev
  minReadySeconds: 0
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: ekaz-dev
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: ekaz-dev
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: K0sWorkerConfigTemplate
          name: ekaz-dev-machine-config
      clusterName: ekaz-dev
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ekaz-dev-worker-mt
      version: v1.30.4
```

The Azure provider has already populated some defaults and other fields in the objects above. The initial cluster objects, which can be used for testing, are below:

Initial cluster objects (all sensitive information is hidden)

```
---
# Source: azure-standalone-cp/templates/azurecluster.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: ekaz-dev
spec:
  identityRef:
    kind: AzureClusterIdentity
    name: azure-cluster-identity
    namespace: default
  location: westus
  subscriptionID: ${AZURE_SUBSCRIPTION_ID}
---
# Source: azure-standalone-cp/templates/azuremachinetemplate-controlplane.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: ekaz-dev-cp-mt
spec:
  template:
    spec:
      osDisk:
        diskSizeGB: 30
        osType: Linux
      sshPublicKey:
      vmSize: Standard_A4_v2
      image:
        marketplace:
          offer: capi
          publisher: cncf-upstream
          sku: ubuntu-2204-gen1
          version: 130.3.20240717
---
# Source: azure-standalone-cp/templates/azuremachinetemplate-worker.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: ekaz-dev-worker-mt
spec:
  template:
    spec:
      osDisk:
        diskSizeGB: 30
        osType: Linux
      sshPublicKey:
      vmSize: Standard_A4_v2
      image:
        marketplace:
          offer: capi
          publisher: cncf-upstream
          sku: ubuntu-2204-gen1
          version: 130.3.20240717
---
# Source: azure-standalone-cp/templates/cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ekaz-dev
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.244.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: K0sControlPlane
    name: ekaz-dev-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: ekaz-dev
---
# Source: azure-standalone-cp/templates/k0scontrolplane.yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0sControlPlane
metadata:
  name: ekaz-dev-cp
spec:
  replicas: 1
  version: v1.30.4+k0s.0
  k0sConfigSpec:
    args:
    - --enable-worker
    - --enable-cloud-provider
    - --kubelet-extra-args="--cloud-provider=external"
    - --disable-components=konnectivity-server
    files:
    - path: "/etc/kubernetes/azure.json"
      permissions: "0644"
      contentFrom:
        secretRef:
          key: control-plane-azure.json
          name: ekaz-dev-cp-0-azure-json
    k0s:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: k0s
      spec:
        api:
          extraArgs:
            anonymous-auth: "true"
        network:
          provider: calico
          calico:
            mode: vxlan
        extensions:
          helm:
            repositories:
            - name: cloud-provider-azure
              url: https://raw.githubusercontent.com/kubernetes-sigs/cloud-provider-azure/master/helm/repo
            - name: azuredisk-csi-driver
              url: https://raw.githubusercontent.com/kubernetes-sigs/azuredisk-csi-driver/master/charts
            charts:
            - name: cloud-provider-azure
              namespace: kube-system
              chartname: cloud-provider-azure/cloud-provider-azure
              version: 1.30.4
              order: 1
              values: |
                cloudControllerManager:
                  nodeSelector:
                    node-role.kubernetes.io/control-plane: "true"
            - name: azuredisk-csi-driver
              namespace: kube-system
              chartname: azuredisk-csi-driver/azuredisk-csi-driver
              version: 1.30.3
              order: 2
              values: |
                linux:
                  kubelet: "/var/lib/k0s/kubelet"
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: ekaz-dev-cp-mt
      namespace: default
---
# Source: azure-standalone-cp/templates/k0sworkerconfigtemplate.yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: K0sWorkerConfigTemplate
metadata:
  name: ekaz-dev-machine-config
spec:
  template:
    spec:
      version: v1.30.4+k0s.0
      args:
      - --enable-cloud-provider
      - --kubelet-extra-args="--cloud-provider=external"
      files:
      - path: "/etc/kubernetes/azure.json"
        permissions: "0644"
        contentFrom:
          secretRef:
            key: worker-node-azure.json
            name: ekaz-dev-worker-mt-azure-json
---
# Source: azure-standalone-cp/templates/machinedeployment.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ekaz-dev-md
spec:
  clusterName: ekaz-dev
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: ekaz-dev
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: ekaz-dev
    spec:
      version: v1.30.4
      clusterName: ekaz-dev
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: K0sWorkerConfigTemplate
          name: ekaz-dev-machine-config
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: ekaz-dev-worker-mt
```
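Comparing the live `ekaz-dev-cp-mt` AzureMachineTemplate with its initial manifest, the fields below exist only in the live object, so they are presumably the Azure provider defaults mentioned above. `networkInterfaces` is the field named in the apply conflict. The same fields appear on `ekaz-dev-worker-mt`.

```yaml
# AzureMachineTemplate ekaz-dev-cp-mt: fields present in the live object
# but absent from the initial Helm-rendered manifest
spec:
  template:
    metadata: {}
    spec:
      identity: None
      image:
        marketplace:
          thirdPartyImage: false
      networkInterfaces:
      - privateIPConfigs: 1
      osDisk:
        cachingType: None
```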
makhov commented 1 day ago

@eromanova, could you also provide the existing AzureMachine that k0smotron is trying to update?

eromanova commented 1 day ago

Hey @makhov, here is the AzureMachine object (my fault for not providing it initially; I thought the AzureMachineTemplate would be enough). The original cluster was removed, so I redeployed it with the same configuration.
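In the AzureMachine object below, `.spec.networkInterfaces` carries a `subnetName` that the template does not. One plausible reading, not confirmed in this thread, is that `cluster-api-provider-azure-manager` set that value and therefore owns `.spec.networkInterfaces` under server-side apply, so when k0smotron re-applies the template-derived spec under its own field manager, the API server reports a conflict instead of overwriting the field. Note that the template shown earlier comes from the first deployment (`ekaz-dev-cp-mt`), while this machine was cloned from `ekaz-dev-az-cp-mt`, which per the comment above has the same configuration.

```yaml
# AzureMachineTemplate ekaz-dev-cp-mt (live, first deployment)
# .spec.template.spec.networkInterfaces
networkInterfaces:
- privateIPConfigs: 1
---
# AzureMachine ekaz-dev-az-cp-0 (redeployed cluster)
# .spec.networkInterfaces
networkInterfaces:
- privateIPConfigs: 1
  subnetName: ekaz-dev-az-controlplane-subnet
```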

AzureMachine (all sensitive information is hidden)

```
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachine
metadata:
  annotations:
    cluster-api-provider-azure: "true"
    cluster.x-k8s.io/cloned-from-groupkind: AzureMachineTemplate.infrastructure.cluster.x-k8s.io
    cluster.x-k8s.io/cloned-from-name: ekaz-dev-az-cp-mt
    clusterctl.cluster.x-k8s.io/block-move: "true"
    meta.helm.sh/release-name: ekaz-dev-az
    meta.helm.sh/release-namespace: default
    sigs.k8s.io/cluster-api-provider-azure-last-applied-tags-vm: '{"kubernetes.io_cluster_ekaz-dev-az":"owned"}'
  creationTimestamp: "2024-10-17T12:49:36Z"
  finalizers:
  - azuremachine.infrastructure.cluster.x-k8s.io
  generation: 3
  labels:
    cluster.x-k8s.io/cluster-name: ekaz-dev-az
    cluster.x-k8s.io/control-plane: ""
    cluster.x-k8s.io/control-plane-name: ekaz-dev-az-cp
  name: ekaz-dev-az-cp-0
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: ekaz-dev-az-cp-0
    uid: 67943352-0420-44df-8e57-0c5467921052
  resourceVersion: "125001"
  uid: 7f832ec4-fe81-4b31-80c3-fb2540601b92
spec:
  diagnostics:
    boot:
      storageAccountType: Managed
  identity: None
  image:
    marketplace:
      offer: capi
      publisher: cncf-upstream
      sku: ubuntu-2204-gen1
      thirdPartyImage: false
      version: 130.3.20240717
  networkInterfaces:
  - privateIPConfigs: 1
    subnetName: ekaz-dev-az-controlplane-subnet
  osDisk:
    cachingType: None
    diskSizeGB: 30
    osType: Linux
  providerID: azure:///subscriptions//resourceGroups/ekaz-dev-az/providers/Microsoft.Compute/virtualMachines/ekaz-dev-az-cp-0
  sshPublicKey:
  vmSize: Standard_A4_v2
status:
  addresses:
  - address: ekaz-dev-az-cp-0
    type: InternalDNS
  - address: 10.0.0.4
    type: InternalIP
  conditions:
  - lastTransitionTime: "2024-10-17T12:58:46Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-10-17T12:57:21Z"
    status: "True"
    type: AvailabilitySetReady
  - lastTransitionTime: "2024-10-17T12:58:46Z"
    status: "True"
    type: BootstrapSucceeded
  - lastTransitionTime: "2024-10-17T12:58:15Z"
    status: "True"
    type: DisksReady
  - lastTransitionTime: "2024-10-17T12:56:57Z"
    status: "True"
    type: InboundNATRulesReady
  - lastTransitionTime: "2024-10-17T12:57:28Z"
    status: "True"
    type: NetworkInterfacesReady
  - lastTransitionTime: "2024-10-17T12:58:15Z"
    status: "True"
    type: VMRunning
  ready: true
  vmState: Succeeded
```