Open tpepper opened 5 years ago
+1000
I'm now more sad I couldn't make the meeting today
On Tue, Apr 23, 2019, 17:32 Tim Pepper notifications@github.com wrote:
There is an ongoing need for better managing external dependencies.
The release team regularly scrambles to collect the current preferred dependency versions. These are inconsistently articulated in multiple files across multiple repos in non-machine-readable ways, and some are tracked only as anecdotal lore.
Various prior issues have been opened, for example #400 https://github.com/kubernetes/sig-release/issues/400 and this regularly comes up in release retrospectives.
SIG Release needs to draft a KEP, for implementation by the release team, that outlines the problem space and possible solutions. We need a machine-readable, structured, single source of truth. It should have a broad OWNERS set so changes get wide review and are not blocked on a small set of reviewers. Code in the project that needs to get “etcd” should get the version specified in this file. Release notes should draw from this file and its changelog. A PR changing a dependency in this file might get a special label, ensured release notes inclusion, and special review. Special review can be needed to ensure one group doesn't upgrade for a fix, introduce a regression in some other code, have those owners revert the upgrade, and thereby re-introduce the prior bug (this has actually happened multiple times).
One potential problem with this approach, which has been a past blocker, is that this could mean work in a sub-project repo requires checking out some other repo in order to get this hypothetical yaml saying what are the preferred versions.
/area release-eng /priority important-soon
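For concreteness, here is a minimal sketch of what such a machine-readable, structured source of truth could look like, parsed by a small Go program. The file layout, field names, versions, and paths below are illustrative assumptions, not the format the project ultimately adopted.

```go
package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v2"
)

// Hypothetical schema for a single source of truth. The field names are
// illustrative assumptions, not the format any repo settled on.
type refPath struct {
	Path  string `yaml:"path"`  // file that embeds the version
	Match string `yaml:"match"` // pattern identifying the line(s) to check
}

type dependency struct {
	Name     string    `yaml:"name"`
	Version  string    `yaml:"version"`
	RefPaths []refPath `yaml:"refPaths"`
}

type dependencies struct {
	Dependencies []dependency `yaml:"dependencies"`
}

// Inline example of what the proposed dependencies.yaml could contain.
// Versions and paths are placeholders.
const example = `
dependencies:
  - name: etcd
    version: 3.3.10
    refPaths:
      - path: cluster/images/etcd/Makefile
        match: 'REGISTRY_TAG\?='
  - name: golang
    version: 1.12.5
    refPaths:
      - path: build/build-image/cross/VERSION
        match: 'v?\d+\.\d+\.\d+'
`

func main() {
	var deps dependencies
	if err := yaml.Unmarshal([]byte(example), &deps); err != nil {
		log.Fatal(err)
	}
	for _, d := range deps.Dependencies {
		fmt.Printf("%s pinned at %s, referenced in %d file(s)\n",
			d.Name, d.Version, len(d.RefPaths))
	}
}
```

Release notes tooling and in-tree consumers of the etcd version could then read this one file instead of grepping scattered Makefiles and scripts.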
Circling back to this item: I expressed interest in helping out here. For 1.15, I mostly care about defining the list of dependencies that we need to care about.
I can start a draft of a KEP for what those dependencies are.
/assign
/cc
Initial PR for discussion here: https://github.com/kubernetes/kubernetes/pull/79366
Notice sent to k-dev, @kubernetes/sig-release, @kubernetes/release-team, and @kubernetes/release-engineering regarding the merged changes in https://github.com/kubernetes/kubernetes/pull/79366: https://groups.google.com/d/topic/kubernetes-dev/cTaYyb1a18I/discussion
also error message improvements here: kubernetes/kubernetes#80060
build/external: Move dependencies.yaml and update OWNERS - https://github.com/kubernetes/kubernetes/pull/80799
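As a rough sketch of the kind of check such a verifier performs (this is not the actual verifydependencies.go, and the types and sample input below are hypothetical): for every declared dependency, open each referenced file and flag lines that match the declared pattern but do not contain the pinned version.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

type refPath struct {
	path  string
	match string
}

type dependency struct {
	name     string
	version  string
	refPaths []refPath
}

// verify returns one message per reference that matched the declared pattern
// but did not contain the expected version string.
func verify(deps []dependency) []string {
	var problems []string
	for _, d := range deps {
		for _, ref := range d.refPaths {
			re := regexp.MustCompile(ref.match)
			f, err := os.Open(ref.path)
			if err != nil {
				problems = append(problems, fmt.Sprintf("%s: %v", d.name, err))
				continue
			}
			scanner := bufio.NewScanner(f)
			for scanner.Scan() {
				line := scanner.Text()
				if re.MatchString(line) && !strings.Contains(line, d.version) {
					problems = append(problems, fmt.Sprintf(
						"%s: %s pins something other than %s: %q",
						d.name, ref.path, d.version, line))
				}
			}
			f.Close()
		}
	}
	return problems
}

func main() {
	// Hypothetical input; in practice this would be loaded from dependencies.yaml.
	deps := []dependency{{
		name:    "etcd",
		version: "3.3.10",
		refPaths: []refPath{{path: "cluster/images/etcd/Makefile", match: `REGISTRY_TAG\?=`}},
	}}
	for _, p := range verify(deps) {
		fmt.Println(p)
	}
}
```

A check like this can run in CI so that a PR bumping a version in one place without updating dependencies.yaml (or vice versa) fails fast with a readable error.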
I propose we remove this umbrella issue from the 1.16 milestone and drop the release-team area, assuming that the release notes team for 1.16 (@saschagrunert @onyiny-ang @cartyc @kcmartin @paulbouwer) has the dependencies.yaml file documented and codified as the source of info for the dependencies section https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#dependencies
It may also be about time to go ahead and close this as complete for the first pass, and move from the giant long-lived umbrella issue to smaller point issues (as is already happening above) for incremental improvement.
Thanks for the hint; I assume we still update the release notes dependency section manually for 1.16. :)
That may be a bit out of scope, but I wrote a tool some time ago to diff go modules between git releases automatically: https://github.com/saschagrunert/go-modiff
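Not go-modiff itself, but a hedged sketch of the underlying approach: read go.mod as it existed at two git refs, parse the require directives, and report modules that were added, removed, or changed. The tags below are placeholders.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"

	"golang.org/x/mod/modfile"
)

// requiresAt parses go.mod as it existed at the given git ref and returns a
// map of module path -> required version.
func requiresAt(ref string) (map[string]string, error) {
	data, err := exec.Command("git", "show", ref+":go.mod").Output()
	if err != nil {
		return nil, err
	}
	f, err := modfile.Parse("go.mod", data, nil)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, r := range f.Require {
		out[r.Mod.Path] = r.Mod.Version
	}
	return out, nil
}

func main() {
	// Refs are placeholders; pass real tags such as release version tags.
	before, err := requiresAt("v1.15.0")
	if err != nil {
		log.Fatal(err)
	}
	after, err := requiresAt("v1.16.0")
	if err != nil {
		log.Fatal(err)
	}
	for path, newVersion := range after {
		oldVersion, ok := before[path]
		switch {
		case !ok:
			fmt.Printf("added:   %s %s\n", path, newVersion)
		case oldVersion != newVersion:
			fmt.Printf("changed: %s %s -> %s\n", path, oldVersion, newVersion)
		}
	}
	for path, oldVersion := range before {
		if _, ok := after[path]; !ok {
			fmt.Printf("removed: %s %s\n", path, oldVersion)
		}
	}
}
```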
/remove-area area/release-team
@lachie83: Those labels are not set on the issue: area/area/release-team
/remove-area release-team
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/lifecycle frozen /unassign @claurence
@Pluies reached out to me before the holidays with this:
Hi Stephen!
Florent here, I wanted to get in touch to thank you for sharing this – I've been thinking about the "pinned infra dependencies" problem for a long time, and really enjoyed reading about the way Kubernetes deals with this!
I've used it as an inspiration to write Zeitgeist, a language-agnostic dependency checker: https://github.com/Pluies/zeitgeist
It includes the "dependencies declaration in yaml" and "checking all occurrences of dependencies are in sync" features of verifydependencies.go, and extends them with a way to check whether the current version is up-to-date with its upstream (which could be releases in a GitHub repo, a Helm chart...). Upstreams are based on a plugin system, so more types of upstreams can be added as desired.
Let me know if this is something that could be of interest for the k8s project, I'd be happy to help with the integration (which should be pretty much drop-in). :)
Cheers, Florent Delannoy
What do we think about using zeitgeist?
cc: @dims @cblecker @BenTheElder @liggitt
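To make the upstream-checking idea concrete (this is not zeitgeist's actual code or configuration format, just a sketch under assumptions): compare a locally pinned version against the latest release tag reported by GitHub's releases API. The repository and pinned version below are illustrative, and the comparison is a plain string match rather than a semver-aware one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type latestRelease struct {
	TagName string `json:"tag_name"`
}

// latestTag fetches the most recent release tag for a GitHub repository
// ("owner/repo") via the public releases API.
func latestTag(ownerRepo string) (string, error) {
	url := "https://api.github.com/repos/" + ownerRepo + "/releases/latest"
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	var rel latestRelease
	if err := json.NewDecoder(resp.Body).Decode(&rel); err != nil {
		return "", err
	}
	return rel.TagName, nil
}

func main() {
	// Pinned version and repo are illustrative placeholders.
	pinned := "v3.3.10"
	tag, err := latestTag("etcd-io/etcd")
	if err != nil {
		log.Fatal(err)
	}
	// Simple string comparison; a real checker would compare semver ordering.
	if tag != pinned {
		fmt.Printf("pinned %s, upstream latest is %s: consider an update\n", pinned, tag)
	} else {
		fmt.Println("pinned version matches the latest upstream release")
	}
}
```

Whether a check like this belongs in CI or in a periodic report is a separate question; the check itself is cheap.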
Other than vendor/, I don't think the original post makes much sense with code like kubeadm going out of tree.
etcd is not specified in tree by anything other than cluster provisioning tooling, which we have issues open about removing from the tree.
vendor/ already has an established dependency review system, and I don't think it needs any new tooling.
what other dependencies are we talking about?
I would also note that in order to maintain a tool that brings up clusters you pretty much need the freedom to update dependencies at will. We do not force all cluster tools to synchronize on some specific version of e.g. containerd today, and I would not be in favor of doing so in the future.
I agree with @BenTheElder that users (and vendors) need the ability to override project preferred defaults, if that's what was stated ;) My primary point is we need a stronger definition of "project preferred defaults". We do have these sprinkled around the code. We do bring up clusters, intentionally with certain components and component versions, and run tests with the intention of proving specific combinations. We observe and fix real bugs relative to specific external non-golang dependency name/version/release tuples.
At some point we, the collective us as a community, need to understand what we're engineering, coding and testing against, and giving "support". IMO we should do that more strongly. (I'm also open to a conversation around whether we could not do that, and instead assume vendors will manage it in a sufficiently coherent way, or expect that the dependencies don't have incompatible skews.)
From the older example linked above, there was a time where we tried to track more: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#dependencies
Since then, the list of non-go-modules dependencies that are tracked is down to golang and etcd: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#dependencies If I were to read the difference between the 1.15 and 1.17 lists, might I infer that Kubernetes 1.17 and higher now runs fine with any cri-tools, cluster autoscaler, cadvisor, CNI, CSI, klog, etc.? I'd love for the ecosystem of projects to be stable enough that we don't need to actively track it in detail. Yet patches to some of those dependencies' in-use versions are frequently proposed for cherry-pick on release branches, which I take as evidence that we do seem to track them.
Another point that changelog shows is that we don't have a canonical source of truth. The user-facing message there comes from the long series of commits and is simply the sum of those (in arbitrary order?):
Update etcd client side to v3.4.3 (#83987, @wenjiaswe)
Kubernetes now requires go1.13.4+ to build (#82809, @liggitt)
Update to use go1.12.12 (#84064, @cblecker)
Update to go 1.12.10 (#83139, @cblecker)
Update default etcd server version to 3.4.3 (#84329, @jingyih)
Also interesting to me: K3s proves one may not even need etcd at all (or via a small patch anyway, and with a set of caveats on runtime robustness).
The kubeadm departure from k/k is an interesting case. Since they intend to branch and version with k/k, are they implicitly following any of k/k's implicit dependencies? They (and other installers) do actively track some dependencies.
I'm open to arguments that it's not on us to manage. As the monolith splits and we move toward a more loosely coupled future, can we argue there is no longer a need for common base expectations? To me the split feels like it worsens the potential for unmanaged risk from implicit dependencies and end-user confusion.
The kubeadm departure from k/k is an interesting case. Since they intend to branch and version with k/k, are they implicitly following any of k/k's implicit dependencies?
yes, unless something is broken.
They (and other installers) do actively track some dependencies.
most installers usually trail behind.
+1 @neolit123
@tpepper :
My primary point is we need a stronger definition of "project preferred defaults". We do have these sprinkled around the code. We do bring up clusters, intentionally with certain components and component versions, and run tests with the intention of proving specific combinations. We observe and fix real bugs relative to specific external non-golang dependency name/version/release tuples.
IMO project preferred defaults is a problematic topic for political rather than technical reasons.
Are we going to start advertising preferred CRI and CNI ...?
At some point we, the collective us as a community, need to understand what we're engineering, coding and testing against, and giving "support". IMO we should do that more strongly.
We don't provide support for external tools. Doing so is perhaps not the best idea.
Complete solutions like kops, minikube, kind etc. do package some external tools necessarily and provide their own support there, but for kubernetes to do so seems like a mis-step unless we're prepared to pick a favorite for each option...
If I were to read the difference between the 1.15 and 1.17 lists, might I infer that Kubernetes 1.17 and higher now runs fine with any cri-tools, cluster autoscaler, cadvisor, CNI, CSI, klog, etc.? I'd love for the ecosystem of projects to be stable enough that we don't need to actively track it in detail. Yet patches to some of those dependencies' in-use versions are frequently proposed for cherry-pick on release branches, which I take as evidence that we do seem to track them.
Cluster autoscaler should advertise its own compatibility with kubernetes and not vice versa, as should CNI implementations and CSI implementations etc. klog is ??? not an issue??
Also interesting to me: K3s proves one may not even need etcd at all (or via a small patch anyway, and with a set of caveats on runtime robustness).
Zero patches away, you can "simply" implement the etcd wire protocol but there are some problems there that I'd rather discuss in another forum :+)
IMO project preferred defaults is a problematic topic for political rather than technical reasons.
Are we going to start advertising preferred CRI and CNI ...?
Inasmuch as there are classes of interfaces or providers, as an open source project with limited resources I feel like we have a few paths:
Seriously though, the latter is obviously highly unlikely to happen. The middle is where we are now. The first is simpler, but for the choice of which.
At some point we, the collective us as a community, need to understand what we're engineering, coding and testing against, and giving "support". IMO we should do that more strongly.
We don't provide support for external tools. Doing so is perhaps not the best idea.
We don't support external tools, but we do debug problem reports. We support our code running in conjunction with external components in CI, and we welcome end users' problem reports. That requires our finite resources to have an understanding of, and the ability to debug, a not-very-finite set of runtime combinations. Can we actively manage that complexity, or must it be a free-for-all?
I feel like if we declare the things we run in test, do that in common (across the org?), and reduce the size of the test matrix, then we can have more realistic conversations about what is containable beyond a simple, common short list of variations, and at what cost. We feel out of balance and unsustainable where we are today.
Relates IMO partly to the conversation in https://github.com/kubernetes/test-infra/issues/18551 and https://github.com/kubernetes/sig-release/issues/966 around establishing a cleaner test plan.
/unassign /help
@justaugustus: This request has been marked as needing help from a contributor.
Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.