I0316 18:33:56.607682 1 port.go:599] "Adopted previously created port which was not in status" controller="openstackcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackCluster" OpenStackCluster="self-hosted-xudrlo/self-hosted-0wwq91" namespace="self-hosted-xudrlo" name="self-hosted-0wwq91" reconcileID="16545edd-8004-4a0d-b4ef-7a73fcf69301" cluster="self-hosted-0wwq91" port index=0 portID="779b61f4-3ce4-4319-9ece-213726d71430"
and then nothing further from the cluster controller. The above log message is always associated with an update to the status, so however the reconcile ended we should have been reconciled again.
The root cause was this predicate, which explicitly filters updates which only change the status:
We should remove this predicate in both OpenStackCluster and OpenStackMachine, as it results in a race when exiting the reconcile after a status update without returning an error.
/kind bug
I've seen this several times recently, normally on the upgrade test. An example:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/1951/pull-cluster-api-provider-openstack-e2e-full-test/1769057441917440000
In the logs we see:
and then nothing further from the cluster controller. The above log message is always associated with an update to the status, so however the reconcile ended we should have been reconciled again.
The root cause was this predicate, which explicitly filters updates which only change the status:
https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/c1fc60d989552246c230eeb0cde1862137280a44/controllers/openstackcluster_controller.go#L829-L838
This predicate was originally added in commit 4b368a68676c766c4f39a7eab89fd60a754c2201, which notes that it was copying a pattern from CAPA.
FWIW, CAPA also still has this predicate for both AWSMachine and AWSCluster: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/068aba046a2cae6c7a6de6c295604d5cb863d5ce/controllers/awsmachine_controller.go#L260-L277
We should remove this predicate in both OpenStackCluster and OpenStackMachine, as it results in a race when exiting the reconcile after a status update without returning an error.