rancher / aks-operator

Azure Kubernetes Service operator for Rancher
Apache License 2.0

Changing multiple fields of cluster while update is not working #667

Open cpinjani opened 1 week ago

cpinjani commented 1 week ago

Rancher version: v2.9-f1b43d2568d7c53c3adf45d9ffd74a04ea65fc22-head, aks-operator: v1.9.2-rc.2

Cluster Type: Downstream AKS cluster

Describe the bug:

The spec is updated with all the requested changes, but only the HTTP application routing change is applied. The node pool is left unchanged after the update.

aksConfig:
    authBaseUrl: https://login.microsoftonline.com/
    authorizedIpRanges: null
    azureCredentialSecret: cattle-global-data:cc-j7sg7
    baseUrl: https://management.azure.com/
    clusterName: cpinjani-aks
    dnsPrefix: cpinjani-aks
    dnsServiceIp: 10.0.0.10
    dockerBridgeCidr: null
    httpApplicationRouting: true
    imported: false
    kubernetesVersion: 1.29.0
    linuxAdminUsername: azureuser
    loadBalancerSku: Standard
    logAnalyticsWorkspaceGroup: null
    logAnalyticsWorkspaceName: null
    managedIdentity: null
    monitoring: null
    networkPlugin: kubenet
    networkPolicy: null
    nodePools:
      - availabilityZones:
          - '1'
          - '2'
          - '3'
        count: 1
        maxPods: 110
        maxSurge: '1'
        mode: System
        name: np1
        orchestratorVersion: 1.29.0
        osDiskSizeGB: 128
        osDiskType: Managed
        osType: Linux
        vmSize: Standard_DS2_v2
    outboundType: loadBalancer
    podCidr: 10.244.0.0/16
    privateCluster: false
    privateDnsZone: null
    resourceGroup: cpinjani-aks
    resourceLocation: eastus
    serviceCidr: 10.0.0.0/16
    subnet: null
    tags:
      Account Type: group
    userAssignedIdentity: null
    virtualNetwork: null
    virtualNetworkResourceGroup: null
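To make it easier to see which parts of a multi-field update are still outstanding, the desired `aksConfig` can be compared with the upstream state field by field. The following is only a minimal sketch: `pending_fields`, the dictionary shapes, and the `np2` pool are hypothetical and are not taken from the operator code.

```python
def pending_fields(desired, upstream):
    """Return the sorted top-level keys whose desired value differs from upstream."""
    return sorted(key for key, value in desired.items() if upstream.get(key) != value)


# Hypothetical values: the spec enables routing and drops a second pool,
# while upstream still has the old state for both fields.
desired = {
    "httpApplicationRouting": True,
    "kubernetesVersion": "1.29.0",
    "nodePools": [{"name": "np1", "count": 1}],
}
upstream = {
    "httpApplicationRouting": False,
    "kubernetesVersion": "1.29.0",
    "nodePools": [{"name": "np1", "count": 1}, {"name": "np2", "count": 1}],
}

print(pending_fields(desired, upstream))
```

If both fields are listed before the update and `nodePools` is still listed after the operator reports the configuration as verified, that matches the behaviour described here.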

Logs:

2.9:

time="2024-09-16T11:03:50Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:03:50Z" level=info msg="Updating HTTP application routing to true for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:27Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty"
time="2024-09-16T11:06:28Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:28Z" level=info msg="Configuration for cluster [cpinjani-aks (id: c-jm2vn)] was verified"
time="2024-09-16T11:06:29Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:30Z" level=info msg="Configuration for cluster [cpinjani-aks (id: c-jm2vn)] was verified"

2.8:

time="2024-09-16T10:42:39Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:42:40Z" level=info msg="Updating HTTP application routing for cluster [cpinjani-aks28]"
time="2024-09-16T10:42:45Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:43:15Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:43:46Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:44:17Z" level=info msg="Waiting for cluster [c-zw6hc] to finish updating"
time="2024-09-16T10:44:48Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:44:49Z" level=info msg="Removing node pool [pool3] from cluster [cpinjani-aks28]"
time="2024-09-16T10:44:51Z" level=info msg="Waiting for cluster [c-zw6hc] to delete node pool [pool3]"
time="2024-09-16T10:45:22Z" level=info msg="Waiting for cluster [c-zw6hc] to delete node pool [pool3]"
time="2024-09-16T10:45:52Z" level=info msg="Waiting for cluster [c-zw6hc] to delete node pool [pool3]"
time="2024-09-16T10:46:23Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:46:24Z" level=info msg="Cluster [c-zw6hc] finished updating"
time="2024-09-16T10:46:25Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:46:25Z" level=info msg="Configuration for cluster [cpinjani-aks28] was verified"
time="2024-09-16T10:46:40Z" level=info msg="Checking configuration for cluster [cpinjani-aks28]"
time="2024-09-16T10:46:41Z" level=info msg="Configuration for cluster [cpinjani-aks28] was verified"
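
To compare the two runs more easily, the log lines can be reduced to the messages that show the operator starting a change. This is a rough sketch, assuming the `key="value"` text layout seen in the excerpts above; the prefix list in `ACTION_PREFIXES` is an assumption based only on the messages in this issue, and the sample logs are shortened copies of the ones above.

```python
import re

# Matches the key="value" / key=value pairs in one text log line.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Message prefixes that indicate a change was started.
# Assumption: derived from the log excerpts in this issue only.
ACTION_PREFIXES = ("Updating ", "Adding node pool", "Removing node pool")


def parse_line(line):
    """Return the fields of one log line as a dict (time, level, msg)."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        if value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        fields[key] = value
    return fields


def actions(log_text):
    """List the messages that show the operator applying a change."""
    found = []
    for line in log_text.splitlines():
        msg = parse_line(line).get("msg", "")
        if msg.startswith(ACTION_PREFIXES):
            found.append(msg)
    return found


log_29 = '''time="2024-09-16T11:03:50Z" level=info msg="Checking configuration for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:03:50Z" level=info msg="Updating HTTP application routing to true for cluster [cpinjani-aks (id: c-jm2vn)]"
time="2024-09-16T11:06:27Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty"
time="2024-09-16T11:06:28Z" level=info msg="Configuration for cluster [cpinjani-aks (id: c-jm2vn)] was verified"'''

log_28 = '''time="2024-09-16T10:42:40Z" level=info msg="Updating HTTP application routing for cluster [cpinjani-aks28]"
time="2024-09-16T10:44:49Z" level=info msg="Removing node pool [pool3] from cluster [cpinjani-aks28]"
time="2024-09-16T10:46:24Z" level=info msg="Cluster [c-zw6hc] finished updating"'''

for version, text in (("2.9", log_29), ("2.8", log_28)):
    print(version, actions(text))
```

On these excerpts, 2.9 yields only the routing update, while 2.8 yields the routing update followed by the node pool removal.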

valaparthvi commented 5 days ago

Along the same lines, it is not possible to add more than one node pool to the cluster via the API. One of the test cases updates a cluster while it is still provisioning. I tested two updates:

1) K8s upgrade plus node pool addition: only the k8s version was upgraded.
2) Adding more than one node pool: only one node pool was added.

When I tested the same things via UI, it seemed to be working as expected.

AKSConfig is updated when the request is first sent, but then it seems to revert.
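
A test could catch the partial update and the later revert by diffing node pools by name between the requested spec and what exists upstream. This is a minimal sketch; `diff_node_pools` and the pool names `np2` and `np3` are hypothetical, and only the `name` key is assumed from the `nodePools` entries in the config above.

```python
def diff_node_pools(desired, upstream):
    """Compare node pools by name; return (to_add, to_remove) as sorted lists."""
    desired_names = {pool["name"] for pool in desired}
    upstream_names = {pool["name"] for pool in upstream}
    to_add = sorted(desired_names - upstream_names)
    to_remove = sorted(upstream_names - desired_names)
    return to_add, to_remove


# Hypothetical case: the spec asks for two new pools, upstream only got one.
desired = [{"name": "np1"}, {"name": "np2"}, {"name": "np3"}]
upstream = [{"name": "np1"}, {"name": "np2"}]

to_add, to_remove = diff_node_pools(desired, upstream)
print("still to add:", to_add)
print("still to remove:", to_remove)
```

Both lists being empty would mean the spec and upstream agree. A non-empty `to_add` after the operator logs the configuration as verified would correspond to case 2 above.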

Logs

```
time="2024-09-23T13:52:39Z" level=info msg="Checking if cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] exists"
time="2024-09-23T13:52:40Z" level=info msg="Checking if resource group [auto-aks-pvala-hp-ci-cqsni] exists"
time="2024-09-23T13:52:40Z" level=info msg="Creating resource group [auto-aks-pvala-hp-ci-cqsni] for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:52:41Z" level=info msg="Resource group [auto-aks-pvala-hp-ci-cqsni] created successfully"
time="2024-09-23T13:52:41Z" level=info msg="Creating AKS cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:52:49Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:13Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:19Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:20Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:53:50Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:54:22Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:54:54Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:55:25Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:55:56Z" level=info msg="Waiting for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] to finish creating, cluster state: Creating"
time="2024-09-23T13:56:27Z" level=info msg="Cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] created successfully"
time="2024-09-23T13:56:30Z" level=info msg="Checking configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:56:31Z" level=info msg="Updating tags for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:56:33Z" level=info msg="Tags were not updated for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)], config map[janitor-ignore:true owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], upstream map[Account Owner:mlatimer@suse.com Account Type:group Cost Center:211799999 Department:ecm Environment:development Finance Business Partner:geoff.guest@suse.com General Ledger Code:200000119 Stakeholder:jeff.hobbs@suse.com Team:container-es janitor-ignore:true owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], moving on"
time="2024-09-23T13:56:33Z" level=info msg="Updating kubernetes version to 1.30.3 for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:59:11Z" level=error msg="Error recording akscc [ (id: )] failure message: resource name may not be empty"
time="2024-09-23T13:59:12Z" level=info msg="Checking configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:59:13Z" level=info msg="Configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] was verified"
time="2024-09-23T13:59:15Z" level=info msg="Checking configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T13:59:16Z" level=info msg="Configuration for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] was verified"
time="2024-09-23T13:59:24Z" level=info msg="Removing cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)]"
time="2024-09-23T14:01:18Z" level=info msg="Creating cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:01:18Z" level=info msg="Checking if cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] exists"
time="2024-09-23T14:01:19Z" level=info msg="Checking if resource group [auto-aks-hp-ci-vgcla] exists"
time="2024-09-23T14:01:19Z" level=info msg="Creating resource group [auto-aks-hp-ci-vgcla] for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:01:20Z" level=info msg="Resource group [auto-aks-hp-ci-vgcla] created successfully"
time="2024-09-23T14:01:20Z" level=info msg="Creating AKS cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:01:32Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:01:44Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:02:03Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:02:34Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:02:57Z" level=info msg="Cluster auto-aks-pvala-hp-ci-cqsni removed successfully"
time="2024-09-23T14:02:57Z" level=info msg="Cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] was removed successfully"
time="2024-09-23T14:02:57Z" level=info msg="Resource group [auto-aks-pvala-hp-ci-cqsni] for cluster [auto-aks-pvala-hp-ci-cqsni (id: c-p98qf)] still exists, please remove it if needed"
time="2024-09-23T14:03:06Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:03:37Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:04:08Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to finish creating, cluster state: Creating"
time="2024-09-23T14:04:40Z" level=info msg="Cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] created successfully"
time="2024-09-23T14:04:43Z" level=info msg="Checking configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:04:44Z" level=info msg="Updating tags for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:04:47Z" level=info msg="Tags were not updated for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)], config map[owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], upstream map[Account Owner:mlatimer@suse.com Account Type:group Cost Center:211799999 Department:ecm Environment:development Finance Business Partner:geoff.guest@suse.com General Ledger Code:200000119 Stakeholder:jeff.hobbs@suse.com Team:container-es owner:hosted-providers-qa-ci-pvala testfilenumber:line125_p1_provisioning_test], moving on"
time="2024-09-23T14:04:47Z" level=info msg="Adding node pool [ggygl] for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:04:52Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:05:23Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:05:55Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:06:26Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:06:39Z" level=info msg="Waiting for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] to update node pool [ggygl]"
time="2024-09-23T14:06:57Z" level=info msg="Checking configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:06:59Z" level=info msg="Cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] finished updating"
time="2024-09-23T14:07:00Z" level=info msg="Checking configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
time="2024-09-23T14:07:01Z" level=info msg="Configuration for cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)] was verified"
time="2024-09-23T14:07:50Z" level=info msg="Removing cluster [auto-aks-hp-ci-vgcla (id: c-gjz7d)]"
```

valaparthvi commented 4 days ago

This was also seen when deleting a node pool and adding a new one in the same update. The new node pool remained, but the deleted node pool was re-added after a few minutes. AKSConfig keeps the desired state until that restore happens.