solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

upstream status is (sometimes) never updated, preventing linked VS creation in the presence of validation #6530

Open jdef opened 2 years ago

jdef commented 2 years ago

Gloo Edge Version

1.10.x

Kubernetes Version

1.21.x

Describe the bug

Flaky behavior: we create a namespace, deployment, service, upstream, and VS, orchestrated by kapp so that the VS is not created until the upstream reports status 1 (Accepted). This still fails intermittently because the upstream's status is sometimes never updated.

This also seems to be visible in the logs (hand-typed, source system is air-gapped):

{"level":"info","ts":"2022-06-02T20:37:22.687Z","logger":"gloo.v1.event_loop.setup.v1.event_loop.envoyTranslator","caller":"syncer/envoy_translator_syncer.go:78","msg":"begin sync 10358675414714027320 (0 proxies, 0 upstreams, 4 endpoints, 15 secrets, 78 artifacts, 0 auth configs, 0 rate limit configs, 0 graphql schemas)","version":"1.10.17"}
{"level":"info","ts":"2022-06-02T20:37:22.687Z","logger":"gloo.v1.event_loop.setup.v1.event_loop.envoyTranslator","caller":"syncer/envoy_translator_syncer.go:184","msg":"end sync 10358675414714027320","version":"1.10.17"}

^ This pattern repeats continuously. The upstreams DO exist, and gloo is configured to watch all namespaces, but they are never updated with a status and the syncer never sees them (note the "0 upstreams" in the sync message). Since they are never assigned a status, the downstream VS creation fails because the upstream is not ready.
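To check a longer log for this pattern, the resource counts can be pulled out of each "begin sync" line. This is a minimal sketch, assuming every log line is a JSON object whose `msg` field has the same shape as the hand-typed lines above; `sync_counts` is a hypothetical helper name, not part of gloo.

```python
import json
import re

# Matches one "<count> <kind>" item, e.g. "0 upstreams" or "0 rate limit configs".
COUNT_RE = re.compile(r"(\d+) ([a-z][a-z ]*)")


def sync_counts(log_line):
    """Return {resource kind: count} for a "begin sync" log line, else None.

    Assumes the message format shown in the issue; other gloo versions
    may log a different shape.
    """
    entry = json.loads(log_line)
    msg = entry.get("msg", "")
    if not msg.startswith("begin sync"):
        return None
    start, end = msg.find("("), msg.rfind(")")
    if start == -1 or end == -1:
        return None
    counts = {}
    for part in msg[start + 1:end].split(","):
        match = COUNT_RE.fullmatch(part.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts
```

A snapshot that reports zero upstreams while upstream resources exist in the cluster would match the behavior described here.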

NOTE: We're using the latest kapp release (0.48) because it lets us define custom wait rules for bespoke status APIs. kapp has been configured to wait for the upstream status to be "accepted" before attempting VS creation (as per the original Slack thread here: https://solo-io.slack.com/archives/C9L6VPAUW/p1649174542255109).

Steps to reproduce the bug

  1. Create the ns, then apply the deployment, service, upstream, and VS, in that order.
  2. Don't create the VS until the upstream has status "accepted" (1).
  3. Step (2) never completes because the upstream status is not updated (kapp waits for 15m, and the upstream still has no .status).
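The wait in step (2) can be sketched outside kapp as a plain polling loop, which may help reproduce the hang without the orchestrator. This is an illustration under assumptions, not kapp's or gloo's implementation: `fetch` is a hypothetical callable that returns the upstream resource as a dict (for example, parsed from `kubectl get upstream -o json`), and the status layout (`status.state`, or `status.statuses.<namespace>.state`) is assumed and may differ between gloo versions. Only the value 1 for "Accepted" and the 15m timeout come from this report.

```python
import time

ACCEPTED = 1  # the report describes state 1 as "Accepted"


def upstream_state(resource, namespace="gloo-system"):
    """Return the reported state, or None if no status has been written.

    Assumed layouts: status.state, or status.statuses[namespace].state.
    """
    status = resource.get("status") or {}
    if "state" in status:
        return status["state"]
    per_namespace = (status.get("statuses") or {}).get(namespace) or {}
    return per_namespace.get("state")


def wait_for_accepted(fetch, timeout_s=900.0, interval_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll fetch() until the upstream is Accepted; False on timeout.

    The default timeout of 900s mirrors the 15m wait described above.
    """
    deadline = clock() + timeout_s
    while True:
        if upstream_state(fetch()) == ACCEPTED:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the failing case described in step (3), `upstream_state` would keep returning `None` and `wait_for_accepted` would return `False` once the timeout elapses.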

Expected Behavior

I expect the upstream status to be updated within a reasonable timeframe after creation.

Additional Context

gloo 1.10.17, k8s 1.21.12, kapp 0.48

jdef commented 2 years ago

maybe related to #5554

github-actions[bot] commented 3 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.