kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

After loadBalancerService client init fails, we should report a failure message in the OpenStackCluster status #1950

Open Goend opened 6 months ago

Goend commented 6 months ago

/kind bug

What steps did you take and what happened: Creating the load balancer service client fails with an error such as: failed to create load balancer service client: No suitable endpoint could be found in the service catalog.

What did you expect to happen: The OpenStackCluster should report an error in its status when the failure cannot be recovered automatically.


Goend commented 6 months ago

The code is at https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/4ab8b3a34ebc54036f62ff5fdaf1dc39c2fa33ba/controllers/openstackcluster_controller.go#L726-L729. Maybe we should add some code like this:

handleUpdateOSCError(openStackCluster, errors.Errorf("failed to reconcile load balancer: %v", err))

dulek commented 6 months ago

Makes sense to me. @Goend, will you propose a PR fixing this?

Goend commented 6 months ago

But first, we need to confirm that this error is a terminal failure. If it is, I can submit a PR. @dulek
So we first need someone to confirm whether this issue is a bug. We may need more input from the community.

dulek commented 6 months ago

> But first, we need to confirm that this error is a terminal failure. If it is, I can submit a PR. @dulek So we first need someone to confirm whether this issue is a bug. We may need more input from the community.

Alright, let's try to analyze this here. The problem is that the OpenStackCluster has a load balancer enabled, but the cloud doesn't have an Octavia endpoint, so it's impossible to fulfill this request. We could silently ignore that and just go on without creating the LB, but that would mean we're implicitly ignoring the user's request. That's not really something I'd do; it's better to explicitly tell the user that something doesn't work.

Given these assumptions, this feels like a pretty terminal failure, unless we'd like to wait until the cloud is updated with an Octavia installation. Getting Octavia installed doesn't exactly sound like something that happens within the cluster installation timeout, so I'd say it's terminal.

@mdbooth?

Goend commented 6 months ago

Fine, I will propose a PR to fix it.

Goend commented 6 months ago

@dulek Can you help me review this PR? Thank you.

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

EmilienM commented 3 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 week ago

/lifecycle stale