google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Automated releases #2834

Open iwankgb opened 3 years ago

iwankgb commented 3 years ago

To increase the frequency of releases, and to allow us to fix monitoring bugs without affecting Kubernetes, we need an automated way of releasing code. A release must consist of:

CC: @bobbypage

dims commented 3 years ago

Could we explore a CI job that would update k8s (locally) and run a bunch of things to get some confidence that we don't totally break k8s?

iwankgb commented 3 years ago

@dims sure, can you suggest what "bunch of things" could be?

dims commented 3 years ago

Let's look here? https://cs.k8s.io/?q=cadvisor&i=nope&files=%5Etest%2Fe2e.*&excludeFiles=&repos=kubernetes/kubernetes

iwankgb commented 3 years ago

Some time ago I had a chat with @bobbypage about cAdvisor affecting Kubernetes stability, and I suggested the following approach:

With decent release automation:

[Image: Screenshot 2021-03-04 at 16:02:07, a sketch of the proposed release flow]

I don't like the idea of making cAdvisor builds dependent on Kubernetes tests because of the substantial number of flakes we face there.

iwankgb commented 3 years ago

Excuse my mad Photoshop skillz.

bobbypage commented 3 years ago

Thanks @iwankgb for putting together a sketch of the proposal. I agree: a more automated and easier way to release will definitely help streamline the process, especially for cherry-picking fixes into an existing release branch.

I think there are a few separate things here, so it's worth splitting them:

  1. Making releases generally more automated, instead of the current manual steps defined in https://github.com/google/cadvisor/blob/master/docs/development/releasing.md
  2. Changing the release cadence, i.e. changing the existing model of keeping cAdvisor releases in sync with k8s
  3. aarch64 images / binaries

Overall, #1 and #3 above will clearly help, so no questions there :)

Regarding #2, as I understand it, the main change would be replacing the existing schedule of a single release timed in sync with the k8s release. Instead, we'll have two "active" releases: one used for k8s and one that can be used standalone, so that we can cherry-pick changes into the appropriate version as needed. I think that makes sense, especially with easier release automation as you mentioned, which should hopefully keep things straightforward.

Regarding automated k8s tests as @dims mentioned, I agree it would be nice to have, just to get confidence that cAdvisor is not causing some obvious kubelet breakage. We currently do have the prow cAdvisor e2e test; perhaps something like the summary test (https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go) would be a good candidate, as most of those metrics originate from cAdvisor. I'm not clear on how we could easily hook something up to run that test (say, on PRs or releases), though. Any ideas?

iwankgb commented 3 years ago

Perhaps we could run these tests whenever we merge a PR into a branch used for a Kubernetes-focused release, and as part of every release process? I'm not sure how tricky it would be to integrate the test @bobbypage mentioned into the pipeline. Do you think that we can:

A release pipeline is relatively straightforward, but running K8s tests is something I will have to dig into.
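The trigger rule proposed above (run the Kubernetes tests on merges into a Kubernetes-focused branch, and as part of every release) can be sketched as a small Go predicate that a CI job might evaluate. The `event` type, its `Kind` values, and the `release-k8s-` branch prefix are hypothetical; cAdvisor does not define them.

```go
package main

import "strings"

// event is a hypothetical CI event: a merged PR or a release.
type event struct {
	Kind   string // "merge" or "release" (hypothetical values)
	Branch string // target branch of the merge or release
}

// isKubernetesBranch reports whether branch is a Kubernetes-focused
// release branch, assuming a hypothetical "release-k8s-" prefix.
func isKubernetesBranch(branch string) bool {
	return strings.HasPrefix(branch, "release-k8s-")
}

// shouldRunK8sTests runs the Kubernetes tests for every release, and for
// merges only when they target a Kubernetes-focused branch.
func shouldRunK8sTests(e event) bool {
	switch e.Kind {
	case "release":
		return true
	case "merge":
		return isKubernetesBranch(e.Branch)
	default:
		return false
	}
}
```

Keeping merges to the standalone branch out of this gate matches the concern about Kubernetes test flakes: only the Kubernetes-facing release line pays that cost.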

bobbypage commented 3 years ago

That sounds like a plan. Having a k8s node test would be great, but it is a separate topic from automating the release process.