solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

helm upgrade fails if last-applied-configuration contains a resourceVersion #6732

Open jenshu opened 2 years ago

jenshu commented 2 years ago

Gloo Edge Version

1.11.x (latest stable)

Kubernetes Version

No response

Describe the bug

If a Gateway or Upstream custom resource has a resourceVersion inside its kubectl.kubernetes.io/last-applied-configuration annotation, helm upgrade fails or hangs.

Steps to reproduce the bug

I'm not sure how often this occurs in normal operation, but it can be reproduced as follows: save the YAML of a Gateway resource (including the metadata.resourceVersion field), then kubectl apply that YAML. The Gateway ends up with a last-applied-configuration annotation that contains a resourceVersion, for example:

    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.solo.io/v1","kind":"Gateway","metadata":{"annotations":{},"creationTimestamp":"2022-07-14T11:32:43Z","generation":2,"labels":{"app":"gloo"},"name":"gateway-proxy","namespace":"gloo-system","resourceVersion":"16226","uid":"28cbb332-8092-4b0e-89d6-29ae812b2173"},"spec":{"bindAddress":"::","bindPort":8080,"httpGateway":{},"proxyNames":["gateway-proxy"],"useProxyProto":false},"status":{"statuses":{"gloo-system":{"reportedBy":"gateway","state":1}}}}

Then, when you run helm upgrade, a job tries to kubectl apply the Gateway from the Helm chart (which does not include resourceVersion or the other metadata fields shown above), and the apply fails with this error:

The gateways "gateway-proxy" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
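The first reproduction step (getting a resourceVersion into the annotation) can be sketched as a small shell helper. This is only an illustration for a disposable cluster: the function name `seed_resource_version` is hypothetical, and it assumes `kubectl` is already pointed at the cluster where Gloo Edge is installed.

```shell
# Hypothetical helper, not part of Gloo Edge or kubectl.
# Usage: seed_resource_version <kind> <name> <namespace>
seed_resource_version() {
  local kind="$1" name="$2" ns="$3" dump status
  dump="$(mktemp)"
  # The live object includes metadata such as resourceVersion and uid.
  kubectl get "$kind" "$name" -n "$ns" -o yaml > "$dump"
  # Applying the dump unchanged records that metadata in the
  # kubectl.kubernetes.io/last-applied-configuration annotation.
  kubectl apply -f "$dump"
  status=$?
  rm -f "$dump"
  return "$status"
}
```

For the resource in this report, the call would be `seed_resource_version gateway gateway-proxy gloo-system`, followed by the usual helm upgrade.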

Expected Behavior

helm upgrade should succeed and be able to apply the gloo custom resources

Additional Context

Workaround: delete the last-applied-configuration annotation before attempting the helm upgrade, e.g.

    kubectl annotate gateway -n gloo-system gateway-proxy kubectl.kubernetes.io/last-applied-configuration-
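If several resources might be affected, the workaround could be wrapped so that the annotation is removed only when it actually contains a resourceVersion. The sketch below is an assumption about how one might script this, not an official procedure: both function names are hypothetical, and the check is a plain text match rather than a JSON parse. It relies on `kubectl apply view-last-applied` to read the annotation.

```shell
# Sketch only: helper names are hypothetical, not part of Gloo Edge or kubectl.

# Reads a last-applied-configuration JSON document on stdin and succeeds
# (exit 0) if it mentions a "resourceVersion" key. Heuristic text match.
has_stale_resource_version() {
  grep -q '"resourceVersion"[[:space:]]*:'
}

# Removes the last-applied-configuration annotation from one resource,
# but only when the annotation contains a resourceVersion.
# Usage: strip_stale_last_applied <kind> <name> <namespace>
strip_stale_last_applied() {
  local kind="$1" name="$2" ns="$3" last_applied
  # view-last-applied fails when the annotation is absent; treat that as empty.
  last_applied="$(kubectl apply view-last-applied "$kind" "$name" -n "$ns" -o json 2>/dev/null || true)"
  if printf '%s' "$last_applied" | has_stale_resource_version; then
    # The trailing "-" on the key tells kubectl annotate to delete it.
    kubectl annotate "$kind" "$name" -n "$ns" kubectl.kubernetes.io/last-applied-configuration-
  else
    echo "$kind/$name in $ns: nothing to remove"
  fi
}
```

For the Gateway in this report, that would be `strip_stale_last_applied gateway gateway-proxy gloo-system`, run before the helm upgrade.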

jenshu commented 1 year ago

As this appears to be an uncommon issue caused by users manually editing the CRs, we do not plan to fix it at this time. Instead, we will add a note to the upgrade docs explaining how to recover from this situation if it occurs.

github-actions[bot] commented 1 month ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.