kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Discussion: Upgrade dependencies version policy #11644

Open tico88612 opened 3 weeks ago

tico88612 commented 3 weeks ago

I would like to discuss the future dependency upgrade strategy. In #11555, it was mentioned that some users were confused by the issue of dependency downgrades after an upgrade.

Kubespray’s version number is bumped alongside K8s upgrades (e.g., K8s 1.28 corresponds to Kubespray 2.24, K8s 1.29 corresponds to Kubespray 2.25, K8s 1.30 corresponds to Kubespray 2.26, and so on). After each release, a release-2.xx branch is created. If release-2.24 does not include K8s 1.29, the dependency versions on release-2.24 should not exceed those shipped in Kubespray 2.25.0.

I have two potential approaches to avoid the downgrade issue after upgrades:

  1. Avoid proactively upgrading the dependencies of the release-2.xx branch unless a critical issue requires an urgent fix (with minimal upgrade impact). Otherwise, do not upgrade the versions after the branch is cut from the release. The advantage of this approach is that it maintains the original upgrade path, which is more intuitive for users. Any minor issues can be pointed out in the release notes under “action required.”
  2. Proactively upgrade the dependencies of the release-2.xx branch, but ensure the upgrade path is clearly defined to prevent downgrades (e.g., 2.24.3 -> 2.25.1 rather than 2.24.3 -> 2.25.0 -> 2.25.1).
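To make the downgrade concern in both options concrete, here is a minimal sketch of a check that compares the component versions pinned by two releases and picks the first target that downgrades nothing. The `RELEASES` table, the component versions in it, and the helper names are hypothetical illustrations, not actual Kubespray data or tooling.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data: component versions pinned by each Kubespray release.
# These numbers are made up for illustration only.
RELEASES: Dict[str, Dict[str, str]] = {
    "2.24.3": {"etcd": "3.5.12", "containerd": "1.7.16"},
    "2.25.0": {"etcd": "3.5.10", "containerd": "1.7.13"},
    "2.25.1": {"etcd": "3.5.12", "containerd": "1.7.16"},
}


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.7.16' into (1, 7, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def downgrades(source: str, target: str) -> Dict[str, Tuple[str, str]]:
    """Return components whose version would go down when moving source -> target."""
    old, new = RELEASES[source], RELEASES[target]
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and parse(new[name]) < parse(old[name])
    }


def first_safe_target(source: str, candidates: List[str]) -> Optional[str]:
    """Pick the earliest candidate release that downgrades no component."""
    for target in sorted(candidates, key=parse):
        if not downgrades(source, target):
            return target
    return None


if __name__ == "__main__":
    print(downgrades("2.24.3", "2.25.0"))
    print(first_safe_target("2.24.3", ["2.25.0", "2.25.1"]))
```

With this made-up table, moving from 2.24.3 to 2.25.0 would downgrade both components, so the sketch skips 2.25.0 and selects 2.25.1, which mirrors the 2.24.3 -> 2.25.1 path described in option 2.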

If you have any thoughts, feel free to discuss!

@yankay @mzaian @VannTen @ant31

VannTen commented 3 weeks ago

Option 1 is simply not acceptable. Release branches should only receive patch version upgrades (= bugfixes) for components, but they should absolutely receive them.

IMO, the only policy we need is: all supported branches should receive patch version upgrades. The problem is that we lack the tooling to do this automatically, so it comes down to time allocation. This is why I started to work on download_hash.py, and there is a big PR, which I still need to review, that proposes such a change.

Once we have a scheduled job that regularly creates a PR with new component versions, this will become a non-issue. (And frankly, I'd rather spend time on this than on drafting a policy.)
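As an illustration of the "patch versions only" rule, here is a minimal sketch of how a job could pick the newest release within a component's current major.minor series. It is not the logic of download_hash.py or of any PR mentioned in this thread; `parse_semver`, `latest_patch`, and the list of available versions are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.7.16' or '1.7.16' into (1, 7, 16)."""
    major, minor, patch = (int(p) for p in version.lstrip("v").split("."))
    return major, minor, patch


def latest_patch(pinned: str, available: List[str]) -> Optional[str]:
    """Return the newest version sharing pinned's major.minor, or None if none is newer."""
    current = parse_semver(pinned)
    same_series = [
        v for v in available
        if parse_semver(v)[:2] == current[:2] and parse_semver(v) > current
    ]
    return max(same_series, key=parse_semver, default=None)


if __name__ == "__main__":
    # Hypothetical list of upstream versions for one component.
    upstream = ["1.7.13", "1.7.20", "1.7.22", "2.0.0"]
    print(latest_patch("1.7.16", upstream))
```

Under this rule a release branch pinned to 1.7.16 would be offered 1.7.22, while 2.0.0 is ignored because it changes the major version.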

tico88612 commented 3 weeks ago

I have no objection if you want to use an automation tool to update the content. (I'll assume you prefer option 2.)

I'll be honest: this issue exists because there are concerns (otherwise, #11555 wouldn't have appeared, right?). Also, there haven't been many minor updates in the past (especially for 2.20 and 2.21, which had only one release each), which in practice is very much like option 1.

At least #10681 will be open for about a year. If you've started working on download_hash.py, are there any plans or ideas for small tasks that other contributors can do to help automate the process faster?

I think the larger PR you mention should be #11557 (and have POC). I'll have a look at the content later. I'm not sure if that would conflict with what you said about download_hash.py.

VannTen commented 3 weeks ago

> At least #10681 will be open for about a year. If you've started working on download_hash.py, are there any plans or ideas for small tasks that other contributors can do to help automate the process faster?

Yep, totally.

I started the work on download_hash.py in #11513.

There are a bunch of TODOs in the script itself (basically, add support for the rest of the components). If we want to use that, it needs to:

Once the first two items are done, we can already think about automation.

> I think the larger PR you mention should be #11557 (and have POC). I'll have a look at the content later. I'm not sure if that would conflict with what you said about download_hash.py.

Yep, that's the one. It does not need to conflict, but it's a bit hard to review because it's big.

I've only started to look at the Python code in the PR, not the automation itself.

In any case, IMO the first step is to have a ubiquitous script. Adding automation on top is not completely trivial (handling labels, etc.), but it's not the most complicated part.

VannTen commented 3 weeks ago

I'm hesitant to require that newer release branches already have the PR, for two reasons:

tico88612 commented 3 weeks ago

Downgrading is not a bug, I agree, but some software does not behave predictably when downgraded. (Especially since etcd is a very important database for K8s, we need to be careful about upgrade behavior, rather than blindly upgrading to the latest version.)

etcd 3.5 downgrade to 3.4: https://github.com/etcd-io/etcd/issues/15878

Or, if containerd 2.0.0 (a major update) is released in the foreseeable future, the same thing could happen if it is applied to the existing maintenance release-2.xx branches (e.g., 2.25.3 ships containerd 2.0.0 but 2.26.0 ships containerd 1.7.22, so the upgrade would move K8s forward but downgrade other packages). We are not sure whether a downgrade from containerd 2.0.0 behaves as expected.

VannTen commented 3 weeks ago

This is a different beast altogether. Major and minor version upgrades, yes, we only do those in master, because they add features.

But we were talking about patch version upgrades specifically.

tico88612 commented 3 weeks ago

Although I don’t think it’s a different issue, I believe it would be acceptable if there is a compatibility downgrade check (#11530) in the future.

VannTen commented 3 weeks ago

I'm not sure of the relevance of #11530 here; did you mean to link something else?

By compatibility downgrade check, what do you mean exactly? I think the upgrade tests already test the 2.(X-1).Y -> 2.X.Z path on release branches (and master).