solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Gloo gateway can get stuck and stop propagating status #2498

Closed by rickducott 4 years ago

rickducott commented 4 years ago

While giving a demo, I ran into an issue where my virtual service status would not update. I looked at the gateway pod logs and saw this:

{"level":"error","ts":"2020-02-25T15:58:07.311Z","logger":"gateway.v1.event_loop.gateway.v1.event_loop.translatorSyncer","caller":"syncer/translator_syncer.go:131","msg":"err: updating dependent statuses: 1 error occurred:\n\t* failed to write status {Accepted  gateway map[*v1.Proxy.gloo-system.gateway-proxy:] nil {} [] 0} for resource petclinic: invalid resource version gloo-system.petclinic given , expected 158559\n\n","version":"1.2.18","stacktrace":"github.com/solo-io/gloo/projects/gateway/pkg/syncer.(*translatorSyncer).propagateProxyStatus.func1\n\t/workspace/gopath/src/github.com/solo-io/gloo/projects/gateway/pkg/syncer/translator_syncer.go:131"}

Once I bounced the gateway pod, the issue went away and statuses started updating again.

Version:

+-------------+--------------------+------------------------------+
|  NAMESPACE  |  DEPLOYMENT-TYPE   |          CONTAINERS          |
+-------------+--------------------+------------------------------+
| gloo-system | Gateway Enterprise | grpcserver-ui: 1.2.6         |
|             |                    | grpcserver-ee: 1.2.6         |
|             |                    | grpcserver-envoy: 1.2.6      |
|             |                    | discovery: 1.2.18            |
|             |                    | extauth-ee: 1.2.6            |
|             |                    | gateway: 1.2.18              |
|             |                    | gloo-ee-envoy-wrapper: 1.2.6 |
|             |                    | gloo-ee: 1.2.6               |
|             |                    | observability-ee: 1.2.6      |
|             |                    | rate-limit-ee: 1.2.6         |
|             |                    | redis: 5                     |
+-------------+--------------------+------------------------------+

kdorosh commented 4 years ago

I think cherry-picking this fix into the solo-kit branch used by Gloo 1.2 will help: https://github.com/solo-io/solo-kit/pull/344

kdorosh commented 4 years ago

Cherry-pick to the v0.11.x branch of solo-kit, to be picked up by Gloo 1.2.x: https://github.com/solo-io/solo-kit/pull/347