kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0
289 stars 253 forks source link

Errors returned by GetInstanceStatusByName should be handled in machine controller. #2085

Closed zioc closed 4 months ago

zioc commented 4 months ago

/kind bug

What steps did you take and what happened:

In a deployment that was running a capo controller with that fix, we've observed the bug that was supposed to be fixed

The controller was attempting to recreate an existing instance, whereas the openstackMachine had a valid instanceID in its spec:

I0513 08:01:02.327471       1 openstackmachine_controller.go:473] "Machine does not exist, creating Machine" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="wc-rim5/rim5-cluster-md-md0-2fd17ee6b9-qd9kl" namespace="wc-rim5" reconcileID="b5aca506-9b68-4bae-af47-3666f3de3784" openStackMachine="rim5-cluster-md-md0-2fd17ee6b9-qd9kl" machine="rim5-cluster-md0-x2q8n-85dzt" cluster="rim5-cluster" openStackCluster="rim5-cluster" name="free5g6-cluster-md-md0-7f29028a04-2khtx"

It seems to happen in specific conditions, where nova API always returns 404 errors (because of a mis-configured revese proxy by example)

In that case GetInstanceStatus will return (nil, nil) here

And the error returned by GetInstanceStatusByName will not be handled here (my bad)

Consequently controller will attempts to create a new instance... Errors returned by GetInstanceStatusByName should be checked at that point.

Environment:

zioc commented 4 months ago

I'll submit a PR to fix that issue promptly

EmilienM commented 4 months ago

I realized that too when I worked on https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2067 - and I also think we should return an error because the instance wasn't found. Thanks for working on that.

@mdbooth for a second opinion.

/assign zioc