solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Failoverscheme never gets to accepted state and can't be deleted #7309

Open huzlak opened 1 year ago

huzlak commented 1 year ago

Gloo Edge Version

1.12.x (latest stable)

Kubernetes Version

1.22.x

Describe the bug

The FailoverScheme never reaches the Accepted state, and it cannot be deleted once the gloo-fed pod has been restarted.

Steps to reproduce the bug

  1. Install two clusters ready for failover, as described in the docs.
  2. Deploy the blue service to cluster1 and the green service to cluster2.
  3. Create static upstreams for the services with health checks enabled.
  4. Create a FailoverScheme pointing to the created upstreams.
  5. Get the status message for the FailoverScheme:
    status:
      message: 'Operation cannot be fulfilled on upstreams.gloo.solo.io "default-service-blue-10000":
        the object has been modified; please apply your changes to the latest version
        and try again

    and the logs for gloo-fed pod:

    reflector.go:138] pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Upstream: failed to list *v1.Upstream: v1.UpstreamList.Items: []v1.Upstream: v1.Upstream.v1.Upstream.Status: unmarshalerDecoder: unknown field "statuses" in gloo.solo.io.UpstreamStatus, error found in #10 byte of ...|arning"}}}},{"apiVer|..., bigger context ...|over\n\n","reportedBy":"gloo","state":"Warning"}}}},{"apiVersion":"gloo.solo.io/v1","kind":"Upstream|...

    The FailoverScheme itself seems to work, but this error is always reported in its status, and gloo-fed keeps logging the message above. After restarting the gloo-fed pod, the FailoverScheme status becomes:

    status:
      message: dependent has been updated

    The failoverscheme now can't even be deleted and the errors are still logged.
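
For illustration only: the `unknown field "statuses"` message in the log above has the shape of a strict decoder meeting a JSON field that its target type does not declare. The Go sketch below reproduces that class of error with the standard library. It is not gloo-fed code. `legacyUpstreamStatus` and `decodeStrict` are hypothetical names, and both the flat status layout and the `gloo-system` key under `statuses` are assumptions inferred from the log excerpt, not confirmed schemas.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// legacyUpstreamStatus is a HYPOTHETICAL stand-in for a status type that only
// declares flat fields. It is not the real gloo.solo.io UpstreamStatus.
type legacyUpstreamStatus struct {
	State      string `json:"state,omitempty"`
	ReportedBy string `json:"reportedBy,omitempty"`
}

// decodeStrict decodes raw JSON and rejects any field the struct does not declare.
func decodeStrict(raw []byte) (legacyUpstreamStatus, error) {
	var st legacyUpstreamStatus
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&st)
	return st, err
}

func main() {
	// Flat status: every field is declared, so decoding succeeds.
	_, err := decodeStrict([]byte(`{"reportedBy":"gloo","state":"Warning"}`))
	fmt.Println("flat status error:", err)

	// Status nested under "statuses" (layout assumed from the log excerpt):
	// the strict decoder reports an unknown-field error.
	_, err = decodeStrict([]byte(`{"statuses":{"gloo-system":{"reportedBy":"gloo","state":"Warning"}}}`))
	fmt.Println("namespaced status error:", err)
}
```

If the same kind of mismatch is behind the log line, one thing to check would be whether the gloo-fed version and the Gloo Edge version writing the Upstream status agree on the status schema.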

Expected Behavior

I expect the FailoverScheme to reach the ACCEPTED state, with no errors logged by the gloo-fed pods.

Additional Context

Tried in 1.12.18 as well as 1.12.28; both give the same result. Failover itself works, but the errors are logged and the FailoverScheme can't be deleted after a gloo-fed restart.

chrisgaun commented 1 year ago

Customer says it works in 1.12.15

github-actions[bot] commented 3 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.