vexxhost / magnum-cluster-api

Cluster API driver for OpenStack Magnum
Apache License 2.0

Cluster deletion stuck in DELETE_IN_PROGRESS #63

okozachenko1203 closed this issue 5 months ago

okozachenko1203 commented 1 year ago

Context

A cluster (3 masters and 1 worker) failed to create due to a lack of resources: 2 masters and the worker were created, but creation of the 3rd master failed. I then deleted the cluster, but it hangs in DELETE_IN_PROGRESS status.
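For reference, the deletion went through the usual Magnum CLI flow; a minimal sketch (the exact Magnum-side cluster name is an assumption, so a placeholder is used):

# Delete the failed cluster via Magnum (cluster name assumed):
$ openstack coe cluster delete <cluster-name>
# The cluster then stays in DELETE_IN_PROGRESS indefinitely:
$ openstack coe cluster list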

$ kubectl get clusters
NAMESPACE       NAME                     PHASE      AGE    VERSION
magnum-system   k8s-v1-25-3-8mic9qdzdl   Deleting   160m   v1.25.3

$ kubectl describe openstackclusters -n magnum-system
...
Events:
  Type     Reason                            Age                From                  Message
  ----     ------                            ----               ----                  -------
  Normal   Successfuldisassociatefloatingip  23m                openstack-controller  Disassociated floating IP 172.24.4.74
  Normal   Successfuldeletefloatingip        23m                openstack-controller  Deleted floating IP 172.24.4.74
  Normal   Successfuldeleteloadbalancer      23m                openstack-controller  Deleted load balancer k8s-clusterapi-cluster-magnum-system-k8s-v1-25-3-8mic9qdzdl-kubeapi with id edc6de48-71e3-4da4-b98d-15ad258ba319
  Warning  Faileddeleteloadbalancer          23m (x5 over 23m)  openstack-controller  Failed to delete load balancer k8s-clusterapi-cluster-magnum-system-k8s-v1-25-3-8mic9qdzdl-kubeapi with id edc6de48-71e3-4da4-b98d-15ad258ba319: Expected HTTP response code [202 204] when accessing [DELETE http://38.108.68.181/load-balancer/v2.0/lbaas/loadbalancers/edc6de48-71e3-4da4-b98d-15ad258ba319?cascade=true], but got 409 instead
{"faultcode": "Client", "faultstring": "Invalid state PENDING_DELETE of loadbalancer resource edc6de48-71e3-4da4-b98d-15ad258ba319", "debuginfo": null}
  Warning  Faileddeletesecuritygroup  101s (x14 over 23m)  openstack-controller  Failed to delete security group k8s-cluster-magnum-system-k8s-v1-25-3-8mic9qdzdl-secgroup-controlplane with id 56f209e7-3ed4-4880-83b0-5a52284c9e8d: Expected HTTP response code [202 204] when accessing [DELETE http://38.108.68.181:9696/networking/v2.0/security-groups/56f209e7-3ed4-4880-83b0-5a52284c9e8d], but got 409 instead
{"NeutronError": {"type": "SecurityGroupInUse", "message": "Security Group 56f209e7-3ed4-4880-83b0-5a52284c9e8d in use.", "detail": ""}}
ubuntu@magnum-capi-driver:~$ source /opt/stack/openrc admin admin
WARNING: setting legacy OS_TENANT_NAME to support cli tools.

The load balancer was eventually deleted, but the security group is not deleted because it is still in use by an undeleted port.
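For context, the listing below was presumably produced with the standard Neutron port listing (the default columns match; exact filters, if any, are unknown):

$ openstack port list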

+--------------------------------------+----------------------------------------------------+-------------------+----------------------------------------------------------------------------------------------------+--------+
| ID                                   | Name                                               | MAC Address       | Fixed IP Addresses                                                                                 | Status |
+--------------------------------------+----------------------------------------------------+-------------------+----------------------------------------------------------------------------------------------------+--------+
| 94720b71-091a-42aa-b758-b0675feee028 | k8s-v1-25-3-8mic9qdzdl-control-plane-8xbtc-vwrjx-0 | fa:16:3e:e6:1e:99 | ip_address='10.6.0.95', subnet_id='33540296-1071-4227-8cb7-fab4d042ea5e'                           | DOWN   |
+--------------------------------------+----------------------------------------------------+-------------------+----------------------------------------------------------------------------------------------------+--------+

Only one port remains undeleted. I guess this is the port that was bound to the 3rd master node.

Workaround

Manually delete the dangling port so the security group can be deleted by CAPO; a sketch follows.
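A minimal sketch of the manual cleanup, using the port and security group IDs from the output above; once the port is gone, CAPO's next reconcile can delete the security group:

# Delete the dangling port left over from the failed 3rd master:
$ openstack port delete 94720b71-091a-42aa-b758-b0675feee028
# Verify the security group disappears once CAPO reconciles:
$ openstack security group list | grep 56f209e7-3ed4-4880-83b0-5a52284c9e8d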

mnaser commented 1 year ago

@okozachenko1203 I feel like this is a bug that should be reported in Cluster API.

fnpanic commented 1 year ago

Has this been reported to Cluster API upstream? Can you link the issue here so it can be tracked? @okozachenko1203

okozachenko1203 commented 1 year ago

Sorry guys, I was missing this: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1596

mnaser commented 1 year ago

which is more like https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1404 :P

okozachenko1203 commented 1 year ago

The fix has been released with https://github.com/kubernetes-sigs/cluster-api-provider-openstack/releases/tag/v0.8.0-beta.0

mnaser commented 5 months ago

Since we now use CAPO 0.9.0, we can close this.