solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Helm rollback from versions >= 1.11.20 to versions <= 1.11.19 does not end successfully. #8036

Open tiberiuac opened 1 year ago

tiberiuac commented 1 year ago

Gloo Edge Version

1.11.x

Kubernetes Version

1.24.x

Describe the bug

This appears to be related to a change made here.

Let's take for example the rollback from 1.11.55 to 1.10.45.

Initial steps.

  1. Prepare a K8S cluster. (I've used kind)
  2. Prepare helm charts for gloo v1.11.55 and v1.10.45.
  3. Modify the default values.yaml to change the gatewayProxy service type from LoadBalancer to NodePort:

      gatewayProxies:
        gatewayProxy:
          service:
            type: NodePort
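The initial steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's exact commands: the cluster name `gloo-rollback`, the `charts/<version>` directory layout, and the file name `values-nodeport.yaml` are assumptions made for illustration, and the chart repository URL is the public Gloo Edge Helm repository. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write the values override that switches the proxy service to NodePort.
write_values() {
  cat > "$1" <<'EOF'
gatewayProxies:
  gatewayProxy:
    service:
      type: NodePort
EOF
}

# Create a kind cluster and unpack both chart versions side by side.
prepare() {
  run kind create cluster --name gloo-rollback   # cluster name is hypothetical
  run helm repo add gloo https://storage.googleapis.com/solo-public-helm
  run helm repo update
  for v in 1.10.45 1.11.55; do
    run helm pull gloo/gloo --version "$v" --untar --untardir "charts/$v"
  done
}

write_values "${VALUES_FILE:-values-nodeport.yaml}"
prepare
```

Unpacking each version into its own directory keeps the 1.11.55 CRDs and both charts available locally for the later install, upgrade, and rollback steps.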

Steps to reproduce the bug

  1. Deploy the helm chart of gloo v1.10.45 in the cluster, in "test" release and "test" namespace, using the modified values.yaml file.
  2. Apply the gloo CRDs for v1.11.55, to prepare the upgrade to gloo v1.11.55.
  3. Run the upgrade to gloo v1.11.55.
  4. There is recommended guidance for doing a rollback, described here (1st paragraph). I applied it, but only to the gateways, because no ExtAuth Upstreams or RateLimit Upstreams had been created. So, for the two gateways that had been created, I manually added back:

      ...
      metadata:
        annotations:
          meta.helm.sh/release-name: test
          meta.helm.sh/release-namespace: test
          ...
        labels:
          app.kubernetes.io/managed-by: Helm
          ...

  5. Run the helm rollback with "helm -n test rollback test 1". It fails with a validation error: "Error: no Gateway with the name "gateway-proxy" found".
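For reference, the reproduction steps can be sketched as a script. Several details here are assumptions rather than the reporter's exact commands: the charts are assumed to be unpacked under `charts/<version>/gloo`, the modified values are assumed to live in `values-nodeport.yaml`, `--create-namespace` is assumed for the first install, and the second gateway is assumed to be the chart's default `gateway-proxy-ssl` (only `gateway-proxy` is named in the error message). Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

RELEASE="test"
NAMESPACE="test"
VALUES="values-nodeport.yaml"   # hypothetical file name
OLD_CHART="charts/1.10.45/gloo" # hypothetical local chart paths
NEW_CHART="charts/1.11.55/gloo"

reproduce() {
  # Step 1: install the old version (revision 1 of the release).
  run helm install "$RELEASE" "$OLD_CHART" --namespace "$NAMESPACE" \
    --create-namespace --values "$VALUES"
  # Step 2: apply the CRDs of the new version before upgrading.
  run kubectl apply -f "$NEW_CHART/crds"
  # Step 3: upgrade to the new version (revision 2).
  run helm upgrade "$RELEASE" "$NEW_CHART" --namespace "$NAMESPACE" --values "$VALUES"
  # Step 4: restore the Helm ownership metadata on both Gateway resources.
  for gw in gateway-proxy gateway-proxy-ssl; do   # second name is an assumption
    run kubectl --namespace "$NAMESPACE" annotate gateways.gateway.solo.io "$gw" --overwrite \
      "meta.helm.sh/release-name=$RELEASE" "meta.helm.sh/release-namespace=$NAMESPACE"
    run kubectl --namespace "$NAMESPACE" label gateways.gateway.solo.io "$gw" --overwrite \
      app.kubernetes.io/managed-by=Helm
  done
  # Step 5: roll back to revision 1; this is the command reported to fail.
  run helm --namespace "$NAMESPACE" rollback "$RELEASE" 1
}

reproduce
```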

If a helm upgrade to a lower 1.10.x version, such as 1.10.45, is run instead of the rollback, it succeeds, which serves as a workaround for this bug.
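A minimal sketch of that workaround, assuming the 1.10.45 chart is unpacked at `charts/1.10.45/gloo` and the modified values are in `values-nodeport.yaml` (both hypothetical paths). Instead of `helm rollback`, the old chart is applied as a new revision with `helm upgrade`. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

downgrade_via_upgrade() {
  # Apply the older chart as a new release revision instead of rolling back.
  run helm upgrade test charts/1.10.45/gloo --namespace test --values values-nodeport.yaml
  # List the release revisions to confirm which chart version is now deployed.
  run helm history test --namespace test
}

downgrade_via_upgrade
```

Note that this creates a new revision on top of the 1.11.55 one rather than restoring revision 1, so the release history differs from what a successful rollback would produce.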

Expected Behavior

I expected the helm rollback command to complete successfully.

The helm rollback command and a helm upgrade to the original, older version behave inconsistently: the upgrade succeeds while the rollback fails.

Additional Context

No response

github-actions[bot] commented 3 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.