After loadBalancerService client init fails, we should report the failure message in the OpenStackCluster status

kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack

https://cluster-api-openstack.sigs.k8s.io/

Apache License 2.0

After loadBalancerService client init fails, we should report the failure message in the OpenStackCluster status #1950

Open Goend opened 6 months ago

Goend commented 6 months ago

/kind bug

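What steps did you take and what happened: Initializing the load balancer service client fails, for example with: failed to create load balancer service client: No suitable endpoint could be found in the service catalog. The failure is not reported in the OpenStackCluster status.

What did you expect to happen: The OpenStackCluster should report the error in its status when the error cannot be recovered automatically.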

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): latest
Cluster-API version: latest
OpenStack version:
Minikube/KIND version:
Kubernetes version (use kubectl version): 1.20.14
OS (e.g. from /etc/os-release):

Goend commented 6 months ago

The code is at https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/4ab8b3a34ebc54036f62ff5fdaf1dc39c2fa33ba/controllers/openstackcluster_controller.go#L726-L729. Maybe we should add something like this:

handleUpdateOSCError(openStackCluster, errors.Errorf("failed to reconcile load balancer: %v", err))
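To illustrate the idea, here is a minimal, self-contained sketch of what recording the failure in the status could look like. The `Cluster` and `ClusterStatus` types, `recordFailure`, and `newLoadBalancerClient` are simplified stand-ins invented for this example; they are not the provider's real types or the actual body of `handleUpdateOSCError`, and the `"UpdateError"` reason string is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterStatus is a simplified stand-in for the OpenStackCluster status
// (assumption: the real status carries a failure reason and message).
type ClusterStatus struct {
	FailureReason  *string
	FailureMessage *string
}

// Cluster is a simplified stand-in for the OpenStackCluster object.
type Cluster struct {
	Status ClusterStatus
}

// recordFailure is a hypothetical counterpart of handleUpdateOSCError:
// it copies the error into the status so users can see it on the object.
func recordFailure(c *Cluster, err error) {
	reason := "UpdateError" // assumed reason string, for illustration only
	msg := err.Error()
	c.Status.FailureReason = &reason
	c.Status.FailureMessage = &msg
}

// newLoadBalancerClient simulates the client init failure from this issue.
func newLoadBalancerClient() error {
	return errors.New("No suitable endpoint could be found in the service catalog")
}

// reconcileLoadBalancer records the failure in the status before returning it.
func reconcileLoadBalancer(c *Cluster) error {
	if err := newLoadBalancerClient(); err != nil {
		wrapped := fmt.Errorf("failed to create load balancer service client: %w", err)
		recordFailure(c, fmt.Errorf("failed to reconcile load balancer: %w", wrapped))
		return wrapped
	}
	return nil
}

func main() {
	c := &Cluster{}
	if err := reconcileLoadBalancer(c); err != nil {
		fmt.Println("reconcile error:", err)
	}
	if c.Status.FailureMessage != nil {
		fmt.Println("status message:", *c.Status.FailureMessage)
	}
}
```

The point of the sketch is only the ordering: the status is updated at the place where the client init error is detected, so the message is visible on the object and not just in the controller logs.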

dulek commented 6 months ago

Makes sense to me. @Goend, will you propose a PR fixing this?

Goend commented 6 months ago

But first, we need to confirm that this error is a terminal failure. If it is, I can submit a PR. @dulek
So we first need someone to confirm whether this issue is a bug. We may need more input from the community.

dulek commented 6 months ago

> But first, we need to confirm that this error is a terminal failure. If it is, I can submit a PR. @dulek So we first need someone to confirm whether this issue is a bug. We may need more input from the community.

Alright, let's try to analyze this here. The problem is that the OpenStackCluster has a load balancer enabled, but the cloud doesn't have an Octavia endpoint, so it's impossible to fulfill this obligation. We could silently ignore that and just go on without creating the LB, but that would mean we're implicitly ignoring the user's request. That's not really something I'd do; it's better to explicitly tell the user that something doesn't work.

Given these assumptions, this feels like a pretty terminal failure, unless we'd like to wait until the cloud is updated with an Octavia installation. Getting Octavia installed doesn't exactly sound like something that happens within a cluster installation timeout, so I'd say it's terminal.

@mdbooth?

Goend commented 6 months ago

Fine, I will propose a PR to fix it.

Goend commented 6 months ago

@dulek Can you help me review this PR? Thank you.

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

EmilienM commented 3 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 week ago

/lifecycle stale