kubernetes / cloud-provider-openstack

Apache License 2.0

LBaaS: Retry, when loadbalancer resource returns 409 response code #821

Closed — kayrus closed this issue 4 years ago

kayrus commented 5 years ago

/kind bug

What happened:

In some rare cases LB removal can return a 409 response code. Subsequent LB removal retries then fail, because all LB components have already been removed and only the bare LB itself is left.

If the listener and FIPs were already removed, you can see the following logs:

failed to check if load balancer exists for service default/svc: error getting floating ip for port fe535f42-768d-452b-85d1-f7f17ba20a6c: failed to find object
failed to update load balancer hosts for default/svc: loadbalancer 7f1a3f45-662d-4d4f-a02c-0ed1e5fe2dbc does not contain required listener for port 443 and protocol TCP

What you expected to happen:

There can be rare cases when LB ACTIVE status checks are not enough, so there should be retry logic for 409 responses, e.g. https://github.com/terraform-providers/terraform-provider-openstack/blob/b3cae569191019b129cc89abd4c3ae2870c55d5d/openstack/util.go#L72..L83
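For illustration, here is a minimal sketch of such retry logic in Go, assuming gophercloud is used for the Octavia calls; `retryOn409`, its parameters, and the stand-in operation in `main` are hypothetical names for this example, not part of the provider code:

```go
package main

import (
	"fmt"
	"time"

	"github.com/gophercloud/gophercloud"
)

// retryOn409 is a hypothetical helper: it retries fn while the OpenStack API
// keeps answering 409 Conflict (e.g. the loadbalancer is still in a PENDING_*
// provisioning state), and gives up immediately on any other error.
func retryOn409(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if _, ok := err.(gophercloud.ErrDefault409); !ok {
			return err // not a conflict, don't retry
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("still getting 409 after %d attempts: %v", attempts, err)
}

func main() {
	// Usage example with a stand-in operation; a real caller would wrap the
	// actual delete call (e.g. loadbalancers.Delete(...).ExtractErr()) here.
	err := retryOn409(5, 2*time.Second, func() error {
		return nil // pretend the delete succeeded
	})
	fmt.Println("result:", err)
}
```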

Environment:

/cc @rfranzke

rfranzke commented 5 years ago

I'm seeing a problem where the load balancer Service is already marked for deletion (the deletion timestamp is set), but the CCM keeps producing the following logs:

E1106 14:05:29.834583       1 service_controller.go:255] error processing service default/my-lb-svc1 (will retry): failed to check if load balancer exists before cleanup: error getting floating ip for port 80257ba8-6da8-4afd-a94b-115d63f1ccbb: failed to find object
I1106 14:05:29.834942       1 event.go:255] Event(v1.ObjectReference{Kind:"Service", Namespace:"default", Name:"my-lb-svc1", UID:"f71dc75f-d49f-4cf6-96b5-986fe2b83a67", APIVersion:"v1", ResourceVersion:"6059", FieldPath:""}): type: 'Warning' reason: 'SyncLoadBalancerFailed' Error syncing load balancer: failed to check if load balancer exists before cleanup: error getting floating ip for port 80257ba8-6da8-4afd-a94b-115d63f1ccbb: failed to find object
E1106 14:05:30.329483       1 service_controller.go:255] error processing service default/my-lb-svc2 (will retry): failed to check if load balancer exists before cleanup: error getting floating ip for port 6d21e0c8-7b7c-4981-b856-bb32447e2ee4: failed to find object
I1106 14:05:30.329668       1 event.go:255] Event(v1.ObjectReference{Kind:"Service", Namespace:"default", Name:"my-lb-svc2", UID:"16e6d61c-76fb-40cb-b88d-98bc5fb3b43e", APIVersion:"v1", ResourceVersion:"6090", FieldPath:""}): type: 'Warning' reason: 'SyncLoadBalancerFailed' Error syncing load balancer: failed to check if load balancer exists before cleanup: error getting floating ip for port 6d21e0c8-7b7c-4981-b856-bb32447e2ee4: failed to find object
kayrus commented 4 years ago

@rfranzke @afritzler looks like the issue is solved in #797, can you check?

afritzler commented 4 years ago

@kayrus do you know if this fix is part of the 1.16 release of the OpenStack CCM?

kayrus commented 4 years ago

@afritzler no, it is only in the master branch so far. @adisky, is there a policy to release minor CCM versions with bugfixes?

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

ramineni commented 4 years ago

/remove-lifecycle stale

ramineni commented 4 years ago

This should be part of the last release, 1.17.

/close

k8s-ci-robot commented 4 years ago

@ramineni: Closing this issue.

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/821#issuecomment-590691689):

> This should be part of last release 1.17
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.