solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

CRD versioning (API versioning) #5663

Open antonioberben opened 2 years ago

antonioberben commented 2 years ago

Often, when a user has to downgrade Gloo, they must remove the CRDs (the API contract) so that the downgraded release ends up with the matching CRDs.

The approach so far has been to delete the CRDs and then install the target release.

Now, with canary deployments, we will have two versions of Gloo in the system but only one set of CRDs. Here are some breaking changes in the API:

image

For reusable resources like VS, RT, AuthConfig, etc., how can having two different versions deployed affect the system?

This issue/question is related to: https://github.com/solo-io/gloo/issues/5499 https://github.com/solo-io/gloo/issues/5466

sam-heilbron commented 2 years ago

This is an interesting case, and I think there are a couple of aspects to it:

  1. Successful downgrades
  2. Canary deployments
  3. True API versioning

I think we can solve 1 and 2, using a single version of the API, as long as the API respects the following criteria:

Assuming that the Gloo Edge API abides by these criteria (the work we can do is to codify these rules, add tests for them, and introduce canary upgrade/downgrade test suites), I think we can actually perform downgrades with a single version of the Gloo API (i.e., Gloo 1.11.1 and Gloo 1.12.1 both using the v1 API).
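As a rough illustration of what "codifying a rule and adding a test for it" could look like, here is a minimal Python sketch. The criteria themselves are not listed in this thread, so the rule checked here (fields may be added, but existing fields must not be removed or change type) is purely an assumption, and `breaking_changes` and the flat `{field_path: type_name}` maps are hypothetical helpers, not part of Gloo.

```python
def breaking_changes(old_fields, new_fields):
    """List changes that would break readers of the old schema.

    Both arguments are hypothetical {field_path: type_name} maps describing
    one API version as shipped by two different releases.
    """
    problems = []
    for path, old_type in sorted(old_fields.items()):
        if path not in new_fields:
            # Assumed rule: an existing field must never disappear.
            problems.append(f"removed: {path}")
        elif new_fields[path] != old_type:
            # Assumed rule: an existing field must keep its type.
            problems.append(
                f"retyped: {path} ({old_type} -> {new_fields[path]})"
            )
    # Fields that exist only in new_fields are treated as additive and allowed.
    return problems


# Hypothetical field maps for the same v1 API in two releases.
release_a = {"spec.virtualHost.domains": "list", "spec.sslConfig": "object"}
release_b = {"spec.virtualHost.domains": "string", "spec.displayName": "string"}

for problem in breaking_changes(release_a, release_b):
    print(problem)
```

A check like this could run in CI against the schemas of consecutive releases, alongside the canary upgrade/downgrade suites mentioned above.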

For 3, we'd need to introduce a mutation webhook to convert between API versions. This would be more involved.
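For reference, the Kubernetes mechanism for converting custom resources between served versions is a conversion webhook: the API server sends a `ConversionReview` request and expects the converted objects back. The sketch below shows only the shape of such a handler in Python. The `example.com` group, the `Widget` kind, and the `host` to `hosts` field change are hypothetical and do not correspond to any real Gloo API.

```python
import copy


def convert_object(obj, desired_api_version):
    """Convert one custom object between hypothetical v1 and v2 schemas."""
    converted = copy.deepcopy(obj)
    current = obj["apiVersion"]
    if current == desired_api_version:
        return converted
    spec = converted.setdefault("spec", {})
    if current.endswith("/v1") and desired_api_version.endswith("/v2"):
        # Hypothetical change: v1 `host` (string) became v2 `hosts` (list).
        if "host" in spec:
            spec["hosts"] = [spec.pop("host")]
    elif current.endswith("/v2") and desired_api_version.endswith("/v1"):
        # Lossy in this sketch: only the first host survives the downgrade.
        hosts = spec.pop("hosts", [])
        if hosts:
            spec["host"] = hosts[0]
    else:
        raise ValueError(
            f"cannot convert {current} to {desired_api_version}"
        )
    converted["apiVersion"] = desired_api_version
    return converted


def handle_conversion_review(review):
    """Build a ConversionReview response for a ConversionReview request."""
    request = review["request"]
    try:
        converted = [
            convert_object(obj, request["desiredAPIVersion"])
            for obj in request["objects"]
        ]
        result = {"status": "Success"}
    except ValueError as err:
        converted = []
        result = {"status": "Failed", "message": str(err)}
    return {
        "apiVersion": review["apiVersion"],
        "kind": "ConversionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "convertedObjects": converted,
            "result": result,
        },
    }
```

A real webhook would also need to be served over HTTPS and referenced from the CRD's `spec.conversion` section. The v2 to v1 branch shows why round-tripping is the hard part: any information that only the newer version can express has to be preserved somewhere or it is lost on downgrade.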

jenshu commented 2 years ago

Related issue: https://github.com/solo-io/solo-projects/issues/4196

Info on CRD versioning: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/

We might want to come up with a deprecation policy similar to https://kubernetes.io/docs/reference/using-api/deprecation-policy/

ianmacclancy commented 2 years ago

This blog has a good summary of versioning that covers a lot of what Sam has said and also includes information from the CRD versioning links that Jenny has added.

It looks like we might be able to solve this through versioning with Kubernetes choosing the correct version of the API based on the version names we use.
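To make the version-name point concrete: as I understand the CRD versioning docs linked above, Kubernetes ranks the versions of a CRD by name, with GA names (`v1`, `v2`) ahead of beta, beta ahead of alpha, higher numbers first within each group, and names that do not follow the pattern last in alphabetical order. The highest-priority version is the one clients such as kubectl use by default. A minimal Python sketch of that ordering follows; `priority_key` and `sort_by_priority` are hypothetical helper names, not Kubernetes APIs.

```python
import re

# Matches Kubernetes-style version names such as v1, v2beta1, v1alpha3.
_VERSION = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")

# Lower rank sorts first: GA, then beta, then alpha.
_STAGE_RANK = {None: 0, "beta": 1, "alpha": 2}


def priority_key(name):
    """Sort key that places higher-priority version names first."""
    match = _VERSION.match(name)
    if match is None:
        # Names outside the pattern go last, ordered alphabetically.
        return (1, 0, 0, 0, name)
    major = int(match.group(1))
    stage = match.group(2)
    minor = int(match.group(3)) if match.group(3) else 0
    # Negate the numbers so that larger versions sort earlier.
    return (0, _STAGE_RANK[stage], -major, -minor, "")


def sort_by_priority(names):
    """Return version names ordered from highest to lowest priority."""
    return sorted(names, key=priority_key)


print(sort_by_priority(["v1beta1", "v2alpha1", "v1", "v2"]))
# ['v2', 'v1', 'v1beta1', 'v2alpha1']
```

Note that this ordering only decides which served version is preferred. It does not convert stored objects, so breaking changes between versions would still need a conversion step.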

As Sam says, this looks to be a multi-pronged issue. I think the answer is a combination of versioning, correct use of deprecation, and strict API change rules.

Do we have a good idea why customers are downgrading often enough for us to consider long-term solutions?

Adding on to upgrade/downgrade testing

github-actions[bot] commented 4 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.