kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Clearly define the CI team's scope #9849

Closed: cahillsf closed 1 month ago

cahillsf commented 10 months ago

Related discussion: https://github.com/kubernetes-sigs/cluster-api/issues/9735

Carryover from: https://github.com/kubernetes-sigs/cluster-api/issues/9104


  1. Clarify our team documentation for the CI team to:
    • remove any misleading tasks that are currently documented under CI team scope (i.e. general "bug triage")
    • remove automation tooling from the CI team name (this should be documented as falling under the general responsibility of the whole release team and assigned to specific teams as applicable)
    • indicate that the desired state of CI team responsibility for CI-related flakes/bugs is not only to identify the issue, but also to find the root cause and introduce a fix. This also means that maintainers and other invested community members should be available to help/assist/educate the CI team when they need an escalation path. This effectively folds the concept of CI reliability (as mentioned here by Fabrizio) into the existing CI team's responsibility
  2. Take the steps necessary to bring reality closer to the ideal state, in which the CI team is empowered/capable/responsible for complete resolution of CI issues as quickly as possible. Some ideas:
    • increase the size of the CI team for the release-1.8 cycle
    • define what the CI team's "escalation path" should look like when they have reached the limits of their understanding/troubleshooting paths

cahillsf commented 10 months ago

@nawazkh i see your name on this one from last release, would you like to assign this issue to yourself?

cahillsf commented 10 months ago

/area release

sbueringer commented 10 months ago

/triage accepted

fabriziopandini commented 9 months ago

can we dedup by closing one of this or https://github.com/kubernetes-sigs/cluster-api/issues/9735 (probably this one, since the entire discussion is on the other issue)?

cahillsf commented 6 months ago

reopening as we do have some work for the docs that came out of this: https://github.com/kubernetes-sigs/cluster-api/issues/9735#issuecomment-1969592360

fabriziopandini commented 6 months ago

/kind documentation
/priority important-soon

k8s-triage-robot commented 2 months ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

sbueringer commented 1 month ago

Is this still something we want to work on?

cahillsf commented 1 month ago

i think it would be helpful, looks like it was added to improvement tasks for last cycle but no one picked it up.

for part 2, i did "overweight" the CI team for release-1.8, and the release-1.9 CI team also appears to be overweight compared to the other teams. i can update the release docs to prefer this approach going forward


happy to grab this and open a PR, i think the only piece that is hazy is:

> define what the CI team's "escalation path" should look like when they have reached the limits of their understanding/troubleshooting paths

this ambiguity is apparent in the CI team docs too https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/ci-signal#continuously-bug-triage:

> We probably have to figure out some details about the overlap between the bug triage task here, release leads and Cluster API maintainers.

not sure if you have thoughts for a more "formal" approach here @sbueringer?

i can just make this reflective of what our current "process" here is. something along the lines of "post in #cluster-api slack channel to increase visibility. continue discussion in thread there. bring up in weekly meeting and/or spin off a dedicated zoom chat for a focused session if helpful..."

/assign cahillsf

sbueringer commented 1 month ago

I don't know if we want or need a more "formal" approach. Documenting the current state sounds reasonable.