solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Gloo Fed: unable to delete an invalid FailoverScheme whose cluster name doesn't exist #7091

Open bcollard opened 2 years ago

bcollard commented 2 years ago

Gloo Edge Version

1.12.x (latest stable)

Kubernetes Version

No response

Describe the bug

Given a FailoverScheme whose primary clusterName (the local cluster) names a cluster that doesn't exist:

apiVersion: fed.solo.io/v1
kind: FailoverScheme
metadata:
  name: failover-scheme
  namespace: gloo-system
spec:
  failoverGroups:
  - priorityGroup:
    - cluster: remote
      upstreams:
      - name: default-service-green-10000
        namespace: gloo-system
  primary:
    clusterName: foo
    name: default-service-blue-10000
    namespace: gloo-system

The gloo-fed pod then logs reconciler errors, and I can't delete the FailoverScheme CR: the deletion never completes because the finalizer can't get a client for cluster foo.

Bouncing pods doesn't help.

Steps to reproduce the bug

Apply the FailoverScheme above, then try to delete it. The deletion hangs and the gloo-fed pod logs:

{"level":"error","ts":1662370907.1137114,"logger":"gloo-fed.controller.failover-scheme","msg":"Reconciler error","version":"1.12.9","name":"failover-scheme","namespace":"gloo-system","error":"failed to get client for cluster foo: Failed to get manager for cluster foo","errorVerbose":"failed to get client for cluster foo
  multicluster.(*mcClient).Cluster:/go/pkg/mod/github.com/solo-io/skv2@v0.21.7/pkg/multicluster/client.go:30
Failed to get manager for cluster foo
  controller.(*Controller).processNextWorkItem:/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:253
  controller.(*Controller).reconcileHandler:/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:298
  reconcile.(*runnerReconciler).Reconcile:/go/pkg/mod/github.com/solo-io/skv2@v0.21.7/pkg/reconcile/runner.go:186
  controller.genericFailoverSchemeFinalizer.Finalize:/workspace/solo-projects/projects/gloo-fed/pkg/api/fed.solo.io/v1/controller/reconcilers.go:251
  failover.(*failoverSchemeReconcilerImpl).FinalizeFailoverScheme:/workspace/solo-projects/projects/gloo-fed/pkg/routing/failover/reconciler.go:79
  failover.(*failoverProcessorImpl).ProcessFailoverDelete:/workspace/solo-projects/projects/gloo-fed/pkg/routing/failover/processor.go:292
  v1.(*multiclusterClientset).Cluster:/go/pkg/mod/github.com/solo-io/solo-apis@v0.0.0-20220824115509-3024f1ee9cd7/pkg/api/gloo.solo.io/v1/clients.go:33
  multicluster.(*mcClient).Cluster:/go/pkg/mod/github.com/solo-io/skv2@v0.21.7/pkg/multicluster/client.go:30
  multicluster.(*mcClient).Cluster:/go/pkg/mod/github.com/solo-io/skv2@v0.21.7/pkg/multicluster/client.go:28
  watch.(*clusterWatcher).Cluster:/go/pkg/mod/github.com/solo-io/skv2@v0.21.7/pkg/multicluster/watch/watcher.go:82
  watch.(*managerSet).get:/go/pkg/mod/github.com/solo-io/skv2@v0.21.7/pkg/multicluster/watch/watcher.go:157","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
  /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:214"}

Expected Behavior

The controller should just log the error or record it in the FailoverScheme status, and deleting the CR should still succeed instead of hanging.

Additional Context

No response

github-actions[bot] commented 4 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.