vexxhost / magnum-cluster-api

Cluster API driver for OpenStack Magnum
Apache License 2.0

bug: nodegroup delete is failing when there are more than 2 nodegroups #335

Closed okozachenko1203 closed 6 months ago

okozachenko1203 commented 6 months ago

Context

I created two additional nodegroups, worker1 and worker2. When I then delete nodegroup worker1, the operation fails with a `MachineDeployment with name "worker2" is defined more than once` error:

pykube.exceptions.HTTPError: admission webhook "validation.cluster.cluster.x-k8s.io" denied the request: Cluster.cluster.x-k8s.io "kube-9jx4a" is invalid: spec.topology.workers.machineDeployments[3].name: Invalid value: "worker2": name must be unique. MachineDeployment with name "worker2" is defined more than once
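The rejection comes from the cluster-api validating admission webhook, which requires machine deployment names to be unique. A minimal sketch of that check, reconstructed from the error message (assumed logic, not the actual webhook code):

```python
# Hypothetical reconstruction of the uniqueness check performed by the
# "validation.cluster.cluster.x-k8s.io" webhook (assumed, based on the error).
def check_unique_md_names(machine_deployments):
    seen = set()
    for i, md in enumerate(machine_deployments):
        name = md["name"]
        if name in seen:
            raise ValueError(
                f"spec.topology.workers.machineDeployments[{i}].name: "
                f'Invalid value: "{name}": name must be unique'
            )
        seen.add(name)

# With worker2 present twice, the check fails:
mds = [{"name": n} for n in ["default-worker", "worker1", "worker2", "worker2"]]
```

Running `check_unique_md_names(mds)` on the duplicated list raises the same `name must be unique` complaint seen in the traceback.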

Reason

Nodegroups are reflected as an array under the CAPI cluster's spec.topology.workers.machineDeployments, as follows:

    workers:
      machineDeployments:
      - class: default-worker
        name: default-worker
        replicas: 1
        ...
      - class: default-worker
        name: worker1
        replicas: 1
        ...
      - class: default-worker
        name: worker2
        replicas: 1
        ...
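Deleting a nodegroup should simply shrink this array. A sketch of what the intended result looks like, using a hypothetical helper (not the driver's actual code):

```python
# Hypothetical helper: drop one nodegroup from the CAPI cluster spec by name.
def remove_nodegroup(cluster: dict, name: str) -> dict:
    workers = cluster["spec"]["topology"]["workers"]
    workers["machineDeployments"] = [
        md for md in workers["machineDeployments"] if md["name"] != name
    ]
    return cluster

cluster = {"spec": {"topology": {"workers": {"machineDeployments": [
    {"class": "default-worker", "name": "default-worker", "replicas": 1},
    {"class": "default-worker", "name": "worker1", "replicas": 1},
    {"class": "default-worker", "name": "worker2", "replicas": 1},
]}}}}

# After remove_nodegroup(cluster, "worker1"), the array should contain only
# default-worker and worker2 -- worker2 exactly once.
```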

When we delete a nodegroup, we generate a new cluster object and apply it: https://github.com/vexxhost/magnum-cluster-api/blob/b2f23e1eba5e7017ed5404a6056d114c8e78b672/magnum_cluster_api/driver.py#L469-L495

This results in a jsonpatch on the kube-apiserver side, and it tries to add the worker2 element to the array again instead of leaving it untouched. We used to use kube_obj.apply() but now use kube_obj.update(), which could be the reason.
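The difference can be illustrated with plain lists (assumed patch semantics for illustration only, not the real kube-apiserver patch code): replacing the whole array cannot produce duplicates, while an index-based "add" op can re-insert an element the server already has.

```python
# Current server-side state of spec.topology.workers.machineDeployments.
server = [
    {"class": "default-worker", "name": "default-worker", "replicas": 1},
    {"class": "default-worker", "name": "worker1", "replicas": 1},
    {"class": "default-worker", "name": "worker2", "replicas": 1},
]

# Whole-list replacement (apply-style semantics): the desired list simply
# omits worker1, so worker2 can never appear twice.
replaced = [md for md in server if md["name"] != "worker1"]

# Index-based "add" op (what the report describes): the worker2 element is
# added to the array again even though the server copy already contains it.
re_added = server + [server[2]]

names = [md["name"] for md in re_added]
# names.count("worker2") == 2 -> the uniqueness webhook rejects the update
```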