kubernetes / test-infra

Test infrastructure for the Kubernetes project.
Apache License 2.0

Kubernetes CI Policy: remove egregiously perma-failing jobs #18600

Open spiffxp opened 4 years ago

spiffxp commented 4 years ago

Part of https://github.com/kubernetes/test-infra/issues/18551

Why this is important:

http://storage.googleapis.com/k8s-metrics/failures-latest.json provides a list of jobs that have been failing continuously based on results stored in GCS. Note that not everything stored in GCS comes from prow.k8s.io; we allow for federated test results via https://github.com/kubernetes/test-infra/blob/master/kettle/buckets.yaml

Good candidates for removal include:

Make sure to include either @spiffxp or @BenTheElder on PRs for these. Not all of these are clear cut removals and we may want to make efforts to find a job owner or otherwise find a way to mitigate.

We should close this issue once we agree on a formal definition of "egregious" and verify that every job meeting it has been handled. We should then feed what we learn here into a policy for maintaining job health going forward, which is also the end goal of https://github.com/kubernetes/test-infra/issues/18599.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

BenTheElder commented 3 years ago

/remove-lifecycle rotten

spiffxp commented 3 years ago

We still have egregiously perma-failing jobs. For example, the top 3 from http://storage.googleapis.com/k8s-metrics/failures-latest.json

  "ci-kubernetes-node-kubelet-serial": {
    "failing_days": 1098
  },
  "ci-kubernetes-e2enode-ubuntu2-k8sstable3-gkespec": {
    "failing_days": 1021
  },
  "ci-kubernetes-e2e-gci-gce-statefulset": {
    "failing_days": 969
  },
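As a rough illustration of how a list like this could be pulled out of the failures file, here is a minimal Python sketch. It assumes the file is a single JSON object mapping each job name to an object with a `failing_days` field, as the excerpt above suggests; the real file may carry more fields. The `longest_failing` helper and its `min_days` cutoff are hypothetical and do not represent an agreed definition of "egregious".

```python
import json

# Sample shaped like the excerpt above (assumption: the real file is one
# JSON object keyed by job name and holds many more entries).
SAMPLE = """
{
  "ci-kubernetes-node-kubelet-serial": {"failing_days": 1098},
  "ci-kubernetes-e2enode-ubuntu2-k8sstable3-gkespec": {"failing_days": 1021},
  "ci-kubernetes-e2e-gci-gce-statefulset": {"failing_days": 969}
}
"""


def longest_failing(jobs, min_days=0, limit=None):
    """Return (job_name, failing_days) pairs, longest-failing first.

    min_days is a hypothetical cutoff; limit caps how many pairs come back.
    """
    ranked = sorted(
        ((name, info.get("failing_days", 0)) for name, info in jobs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    ranked = [pair for pair in ranked if pair[1] >= min_days]
    return ranked if limit is None else ranked[:limit]


jobs = json.loads(SAMPLE)
for name, days in longest_failing(jobs, min_days=1000):
    print(f"{days:5d}  {name}")
```

To run this against live data, `SAMPLE` would be replaced with the downloaded contents of failures-latest.json.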

spiffxp commented 3 years ago

https://github.com/kubernetes/test-infra/pull/21141 removed one

Need to refresh where we're at here.

liggitt commented 3 years ago

Jobs whose Up or Test step fails 100% of the time are good candidates for removal: https://storage.googleapis.com/k8s-gubernator/triage/index.html?test=%5E(Up%7CTest)%24
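For readers unfamiliar with the percent-encoding in that link, the `test` query parameter decodes to a regular expression. The sketch below decodes it with the Python standard library and shows which test names it would select. How the triage page itself applies the filter is an assumption here; the sample names other than `Up` and `Test` are made up for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

TRIAGE_URL = (
    "https://storage.googleapis.com/k8s-gubernator/triage/index.html"
    "?test=%5E(Up%7CTest)%24"
)

# parse_qs percent-decodes values: %5E is "^", %7C is "|", %24 is "$".
query = parse_qs(urlsplit(TRIAGE_URL).query)
pattern = query["test"][0]
print(pattern)

# Assumption: the filter is an anchored regex over the test name, so only
# the exact names "Up" and "Test" match, not names that merely start with them.
test_filter = re.compile(pattern)
for name in ["Up", "Test", "Upgrade", "TestGrid", "Down"]:
    print(name, bool(test_filter.search(name)))
```

The anchors matter: without `^` and `$`, the filter would also pick up any test whose name merely contains "Up" or "Test".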

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 3 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes/test-infra/issues/18600#issuecomment-892079291):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen`.
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>
>Send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

dims commented 3 years ago

/reopen
/remove-lifecycle rotten

k8s-ci-robot commented 3 years ago

@dims: Reopened this issue.

In response to [this](https://github.com/kubernetes/test-infra/issues/18600#issuecomment-892082782):

>/reopen
>/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

spiffxp commented 3 years ago

/milestone v1.23

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

BenTheElder commented 2 years ago

/remove-lifecycle stale
/lifecycle frozen

These jobs aren't going anywhere, and this has to be dealt with someday.

dims commented 2 years ago

xref: https://github.com/kubernetes/kubernetes/issues/109521

dims commented 2 years ago

/assign