openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

OCPBUGS-32477: Also rely on oomkilled exit code 137 in build test #28725

Closed ardaguclu closed 3 weeks ago

ardaguclu commented 3 weeks ago

In 4.15, when a pod is killed due to insufficient memory, it reports an `Error` status rather than the expected `OOMKilled`. This appears to be an upstream bug, but to unblock the presubmit jobs we need a stop-gap solution. This PR also accepts the `Error` status when it comes with exit code 137, which corresponds to an OOM kill.

Additionally, the mariadb:10.3 version seems to be problematic in 4.15, so this PR drops mariadb:10.3.
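
A minimal sketch of the relaxed check described above, not the actual diff from this PR. The `terminatedState` type and the `isOOMKilled` helper are hypothetical names used for illustration; the struct only mirrors the `Reason` and `ExitCode` fields that Kubernetes exposes on a terminated container. Exit code 137 is 128 + 9, meaning the process was ended by SIGKILL, which is the signal the kernel OOM killer sends.

```go
package main

import "fmt"

// terminatedState is a hypothetical stand-in for the two fields of a
// terminated container's status that this check needs.
type terminatedState struct {
	Reason   string
	ExitCode int32
}

// oomKilledExitCode is 128 + 9: the process was terminated by SIGKILL.
const oomKilledExitCode = 137

// isOOMKilled reports whether a terminated container should be treated as
// OOM-killed. It accepts the expected "OOMKilled" reason and, as a stop-gap,
// also the "Error" reason when it is paired with exit code 137.
func isOOMKilled(s terminatedState) bool {
	if s.Reason == "OOMKilled" {
		return true
	}
	return s.Reason == "Error" && s.ExitCode == oomKilledExitCode
}

func main() {
	cases := []terminatedState{
		{Reason: "OOMKilled", ExitCode: 137}, // expected behavior
		{Reason: "Error", ExitCode: 137},     // behavior reported for 4.15
		{Reason: "Error", ExitCode: 1},       // ordinary failure
		{Reason: "Completed", ExitCode: 0},   // normal exit
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, isOOMKilled(c))
	}
}
```

Note that an `Error` status with exit code 137 only shows that the process received SIGKILL, so this fallback is looser than matching on `OOMKilled` alone. That trade-off fits a stop-gap but is worth revisiting once the runtime fix lands.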

ardaguclu commented 3 weeks ago

/retest

ardaguclu commented 3 weeks ago

/jira refresh

ardaguclu commented 3 weeks ago

/jira refresh

openshift-ci-robot commented 3 weeks ago

@ardaguclu: This pull request references Jira Issue OCPBUGS-32477, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:
* bug is open, matching expected state (open)
* bug target version (4.16.0) matches configured target version for branch (4.16.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

ardaguclu commented 3 weeks ago

/retest-required

ardaguclu commented 3 weeks ago

/retest

openshift-ci[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, ardaguclu, soltysh

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/origin/blob/master/OWNERS)~~ [soltysh]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

ardaguclu commented 3 weeks ago

/retest-required

adambkaplan commented 3 weeks ago

Bug to update cri-o in 4.15 (it appears to have indeed been fixed for 4.16): https://issues.redhat.com/browse/OCPBUGS-32498

ardaguclu commented 3 weeks ago

/retest-required

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD e06689a1ea032ed0dc0cf0b82ca209a3487e8271 and 2 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD c3ef2674398b2f1f979b574c8b66a7c3b31f8155 and 1 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 4f37255ce988379234004c5f1600315251245fd0 and 0 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-ci-robot commented 3 weeks ago

/hold

Revision 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 was retested 3 times: holding

ardaguclu commented 3 weeks ago

/retest

ardaguclu commented 3 weeks ago

/hold cancel

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD b2195bf2d21c648e86209ec9e38942e6e4169672 and 2 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

ardaguclu commented 3 weeks ago

/test e2e-metal-ipi-ovn-ipv6

ardaguclu commented 3 weeks ago

/retest-required

ardaguclu commented 3 weeks ago

/test e2e-metal-ipi-ovn-ipv6

ardaguclu commented 3 weeks ago

/test e2e-metal-ipi-ovn-ipv6

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD ce9d443c8598ca2e9c2cbb03e663b93ed830058f and 1 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD cab2b73f7550e518d6708b39fd4ed671f65e4126 and 0 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-ci[bot] commented 3 weeks ago

@ardaguclu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-single-node-serial | 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 | link | false | `/test e2e-aws-ovn-single-node-serial` |
| ci/prow/e2e-aws-ovn-single-node | 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 | link | false | `/test e2e-aws-ovn-single-node` |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 | link | false | `/test e2e-aws-ovn-single-node-upgrade` |

openshift-ci-robot commented 3 weeks ago

/hold

Revision 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 was retested 3 times: holding

ardaguclu commented 3 weeks ago

/retest-required

ardaguclu commented 3 weeks ago

/hold cancel

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 738fd8c787456cc79b31e4bd271e1f0509c986b1 and 2 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-trt-bot commented 3 weeks ago

Job Failure Risk Analysis for sha: 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8

| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial | Low |

[sig-arch] events should not repeat pathologically for ns/openshift-etcd-operator

This test has passed 53.33% of 60 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node-serial'] in the last 14 days.

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD d8b06ca133e1ca84fb8a3e75c08a48712be4ac3b and 1 for PR HEAD 9b850eb7ef43f08f3ee2a28b055f9f43c80015a8 in total

openshift-ci-robot commented 3 weeks ago

@ardaguclu: Jira Issue OCPBUGS-32477 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.
