kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0
289 stars 253 forks source link

🐛 Fix port name after port creation failure #1941

Closed mdbooth closed 6 months ago

mdbooth commented 6 months ago

CreatePorts was creating a port name based on the index of the port in the current reconcile. This could be different to the absolute index of the port if ports had been partially created in a previous reconcile.

We fix this by passing all current state into CreatePorts so it can create an absolute index. This also ensures that partially created ports will be persisted on failure so we don't have to rely on adoption in the next reconcile.

Fixes: #1940

This PR includes a second commit which refactors and simplifies how we initialises Bastion resources. The motivation for it was that first commit introduces a bug which causes a panic due to an assumption about state having been initialised. The new commit makes this mistake harder to make. It's worth looking at separately.

k8s-ci-robot commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/OWNERS)~~ [mdbooth] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
netlify[bot] commented 6 months ago

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
Latest commit 801f5ef2ff819d9f34e9e561d2c8c0044a7688d6
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/65f49cc79b5aab00084bea2a
Deploy Preview https://deploy-preview-1941--kubernetes-sigs-cluster-api-openstack.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

EmilienM commented 6 months ago

/test pull-cluster-api-provider-openstack-e2e-full-test /lgtm /hold

Feel free to hold cancel when the upgrade passed and when you think it's ready to be merged. Thanks for that work Matt, sorry I missed it in the initial iteration.

mdbooth commented 6 months ago

It looks like the full test failed to clean up a couple of ports on deletion. I'm not convinced that was a flake, but running it again to get another data point:

/test pull-cluster-api-provider-openstack-e2e-full-test

Unfortunately, if this is a bug on deletion we don't have controller logs from CI because it happens after the logs are collected.

mdbooth commented 6 months ago

Oh, we do have logs. The controller paniced: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/1941/pull-cluster-api-provider-openstack-e2e-full-test/1768199842766524416/artifacts/clusters/clusterctl-upgrade-y11roz/logs/capo-system/capo-controller-manager/capo-controller-manager-66896777f9-bnz58/manager.log

mdbooth commented 6 months ago

1949 should fix the panic seen in the upgrade test. I don't understand what's triggering it, though.

mdbooth commented 6 months ago

/test pull-cluster-api-provider-openstack-e2e-full-test

mdbooth commented 6 months ago

/test pull-cluster-api-provider-openstack-e2e-full-test

mdbooth commented 6 months ago

This PR broke cluster deletion when the cluster had a bastion, and the only test which caught it was the upgrade test. This is a bit worrying.

/test pull-cluster-api-provider-openstack-e2e-full-test

mdbooth commented 6 months ago

/test pull-cluster-api-provider-openstack-e2e-full-test

mdbooth commented 6 months ago

Note that https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1951 fixes this much more comprehensively by pre-resolving everything about the port prior to creation, including its name.

That PR is based on this one, though, so we still need to merge this first.

mdbooth commented 6 months ago

/hold cancel

EmilienM commented 6 months ago

Thanks for this work, I'm glad you sorted this out. /lgtm