kubernetes / community

Kubernetes community content
Apache License 2.0

Strengthen Cherry Pick Guidance #7634

Closed jeremyrickard closed 1 week ago

jeremyrickard commented 10 months ago

Describe the issue

The current cherry pick guidance has fairly clear criteria for what kinds of PRs are good candidates for cherry picks, but it seems like we could enhance our guidance for release managers reviewing cherry picks, for contributors opening cherry picks, and for SIG leads reviewing cherry picks.

At the wg-lts meeting on November 21st, @liggitt presented a pretty thorough analysis of regressions introduced into patch releases through cherry picks, which is available here:

Kubernetes patch release regression/bugfix rate

Analysis of Kubernetes regression rates, patterns, examples

There was a pretty important takeaway: every single minor version has had a backport cause a regression.

In the wg-lts meeting, we discussed a few concrete things:

The first bullet point is basically already expressed in our existing guidelines, so perhaps we should investigate a new PR template for cherry picks that includes a self-attestation from the person opening the cherry pick. The template could also include a section indicating when the bug was introduced, to help determine whether a fix should actually be cherry picked back to a release if the bug already existed prior to the .0 of that minor release.
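Purely as a sketch of what such a template might look like (the filename and every field below are hypothetical, not an agreed design):

```markdown
<!-- Hypothetical .github/PULL_REQUEST_TEMPLATE/cherry-pick.md -->
#### What class of fix is this?
- [ ] Security fix
- [ ] Data-loss fix
- [ ] Regression fix (the bug was introduced in this minor version)
- [ ] Other (explain below why this change needs to be backported)

#### When was the bug introduced?
<!-- Link the PR or commit that introduced the bug. If it predates the .0
     release of this minor version, explain why a backport is still
     warranted. -->

#### Self-attestation
- [ ] I affirm this change meets the cherry pick criteria and has soaked on
      the master branch / newer release branches without reported issues.
```

The checkboxes give release managers and SIG leads something concrete to verify, and the "when was the bug introduced" section directly addresses the pre-existing-bug case above.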

/sig release /kind documentation

sdodson commented 9 months ago

I spent some time looking to see if there were industry-standard defect classes like data loss, security, and performance, hoping we could put together a bit of a taxonomy and decide which classes are valid for cherry picks. However, it seems like most alignment is around severity (critical, major, minor, etc.), where the highest-severity issues generally include security, data loss, and often performance regressions. I think the recommendations from the first bullet point are a great place to start.

Assuming we want labels for these, area/security and kind/regression are available today, but data loss doesn't seem to have a relevant label; perhaps kind/data-loss?

YuikoTakada commented 9 months ago

> Assuming we want labels for these, area/security and kind/regression are available today, but data loss doesn't seem to have a relevant label; perhaps kind/data-loss?

Does this mean that unless labels (like area/security) are added, the PR is not allowed to be cherry-picked?

sdodson commented 9 months ago

> Does this mean that unless labels (like area/security) are added, the PR is not allowed to be cherry-picked?

I'm not sure it becomes a requirement, initially, but I believe having the labels available to track the criteria that factor into cherry-pick approval would be helpful. As written, the description just says that the author would self-affirm that the fix is one of the approved classes, or provide other reasoning for why the change needs to be backported, so that the release team can make more informed decisions.

neolit123 commented 9 months ago

note that sometimes a fix for a user-blocking bug could be eligible for a backport, but today that is just a kind/bug. at the end of the day, everything that is backported is a bug fix.

instead of working with the current set of labels, we could create a new family of labels foo/bar, where bar must always be the reason and classification provided by the backport author and foo will be the family specific to backports. these labels could be applied by bots, and blocking if not present (a la needs-foo).

at the same time, ETOOMANYLABELS.
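As a rough illustration of the kind of bot check being floated here (a minimal sketch; every label name below is hypothetical, and the real family would need to be decided by SIG Release):

```python
# Sketch of a "needs-backport-reason" style check, a la needs-sig.
# All label names here are hypothetical stand-ins for the foo/bar family.
APPROVED_BACKPORT_REASONS = {
    "backport/security",
    "backport/data-loss",
    "backport/regression",
    "backport/other",  # requires written justification in the PR body
}

BLOCKING_LABEL = "do-not-merge/needs-backport-reason"


def backport_reason_present(labels: set[str]) -> bool:
    """Return True if the PR carries at least one approved backport-reason label."""
    return bool(labels & APPROVED_BACKPORT_REASONS)


def triage(labels: set[str]) -> set[str]:
    """Add or remove the blocking label, mirroring how needs-sig works."""
    labels = set(labels)  # don't mutate the caller's set
    if backport_reason_present(labels):
        labels.discard(BLOCKING_LABEL)
    else:
        labels.add(BLOCKING_LABEL)
    return labels
```

The point of the sketch is only the shape of the mechanism: the author supplies the classification, and a bot makes its absence merge-blocking rather than a hard prohibition.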

liggitt commented 9 months ago

one other category of changes that seems acceptable for backports came up: 1.29.0 accidentally enabled an alpha capability by not applying the feature gate correctly... the fix to apply the gate correctly is in https://github.com/kubernetes/kubernetes/pull/122343

That's sort of the opposite of a regression... it's an accidental progression of functionality that was intended to be gated so it could roll out in a controlled way.

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

jeremyrickard commented 5 months ago

/remove-lifecycle rotten

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 week ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/community/issues/7634#issuecomment-2364027904):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.