NO-JIRA: add a success case for pods that pass the three or more restarts

kannon92 commented 2 months ago

cc @stbenjam

openshift-ci-robot commented 2 months ago

@kannon92: This pull request explicitly references no jira issue.

In response to [this](https://github.com/openshift/origin/pull/29118):

> cc @stbenjam

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Forigin). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

stbenjam commented 2 months ago

Thanks

/lgtm

stbenjam commented 2 months ago

/lgtm

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kannon92, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/origin/blob/master/OWNERS)~~ [stbenjam]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

neisw commented 2 months ago

Is this a good time to reconsider the "%d is considered a flake for now" wording in the test name as well?

https://github.com/openshift/origin/pull/29108#issuecomment-2360961536

kannon92 commented 2 months ago

@neisw I don't agree with that one.

https://github.com/openshift/origin/pull/29108#issuecomment-2360987842

kannon92 commented 2 months ago

/hold

Waiting on comments from @neisw.

neisw commented 2 months ago

I think @stbenjam's main concern was the potential to rename later. But it seems atypical to reference 'flake' in the test name. Looking at Sippy for 4.18 tests whose names contain "flake", the only matches appear to be of the `[sig-architecture] platform pods in ns/%s that restart more than %d is considered a flake for now` variety.

neisw commented 2 months ago

We have done things like this in the past, where a test flakes at first and later on some cases fail while others continue to flake. Just something to consider if it is still early enough to tweak.
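As an illustration of the "flake first, fail later" approach described above, here is a minimal Go sketch. It is not code from openshift/origin: the `Outcome` type, the `classifyRestarts` helper, the `enforce` flag, and the threshold of 3 are all hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// Outcome is a hypothetical three-way result for a restart check.
type Outcome string

const (
	Pass  Outcome = "pass"
	Flake Outcome = "flake"
	Fail  Outcome = "fail"
)

// classifyRestarts is an illustrative helper, not part of openshift/origin.
// A container that restarts more than threshold times is reported as a flake
// while the check is still being observed (enforce == false) and as a failure
// once the check is enforced (enforce == true). Anything at or below the
// threshold is a plain success.
func classifyRestarts(restarts, threshold int, enforce bool) Outcome {
	if restarts <= threshold {
		return Pass
	}
	if enforce {
		return Fail
	}
	return Flake
}

func main() {
	// Assumed threshold of 3 restarts, used only for this example.
	const threshold = 3

	fmt.Println(classifyRestarts(2, threshold, false)) // at or below the threshold
	fmt.Println(classifyRestarts(4, threshold, false)) // over the threshold, observation phase
	fmt.Println(classifyRestarts(4, threshold, true))  // over the threshold, enforced
}
```

Keeping the enforcement decision as a separate input would let individual namespaces or containers move from flake to fail at different times without changing the test name.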

neisw commented 2 months ago

Question about failures we see in presubmits

Is the following an expected failure when the pod is deleted?

namespace/openshift-image-registry node/ip-10-0-20-204.us-west-1.compute.internal pod/node-ca-v94n4 uid/cc459ba2-bb39-482e-891e-fc3ae2a23056 container/node-ca restarted 4 times at:
non-zero exit at 2024-09-19 01:56:13.670067226 +0000 UTC m=+919.297863990: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
non-zero exit at 2024-09-19 02:10:18.40417271 +0000 UTC m=+1764.031969474: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
non-zero exit at 2024-09-19 02:41:34.783221583 +0000 UTC m=+3640.411018367: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
non-zero exit at 2024-09-19 03:07:16.927074536 +0000 UTC m=+5182.554871310: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running

Wondering if this might be a case we want to start with flakes and observe the failures first?

deads2k commented 2 months ago

> Wondering if this might be a case we want to start with flakes and observe the failures first?

Already did. This was merged a few weeks back, and exceptions were added based on ci.search.

kannon92 commented 2 months ago

/close

openshift-merge-robot commented 2 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

openshift-ci[bot] commented 2 months ago

@kannon92: Closed this PR.

In response to [this](https://github.com/openshift/origin/pull/29118#issuecomment-2361788848):

> /close

openshift-ci[bot] commented 2 months ago

@kannon92: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack-ovn | ee7b16e3255b2b864ffa6f294dc5b0d354301434 | link | false | `/test e2e-openstack-ovn` |
| ci/prow/e2e-aws-ovn-single-node-upgrade | ee7b16e3255b2b864ffa6f294dc5b0d354301434 | link | false | `/test e2e-aws-ovn-single-node-upgrade` |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | ee7b16e3255b2b864ffa6f294dc5b0d354301434 | link | false | `/test e2e-aws-ovn-kube-apiserver-rollout` |

Full PR test history. Your PR dashboard.

I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
