openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

OCPBUGS-42492: Make ocp-tuned-one-shot.service restart on-failure #1177

Closed: jmencak closed this 1 month ago

jmencak commented 1 month ago

There are known podman [issues/races](https://github.com/containers/podman/pull/23524) that may cause "podman run" failures (e.g. OCPBUGS-42492). While the correct approach is to fix podman and its configuration, that does not protect against unknown or future podman issues, which can often be resolved simply by retrying.

Make ocp-tuned-one-shot.service restart on failure, with a 5s wait between retries.

Resolves: OCPBUGS-42492

openshift-ci-robot commented 1 month ago

@jmencak: This pull request references Jira Issue OCPBUGS-42492, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak


Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [jmencak]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
jmencak commented 1 month ago

/jira refresh

openshift-ci-robot commented 1 month ago

@jmencak: This pull request references Jira Issue OCPBUGS-42492, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:

- bug is open, matching expected state (open)
- bug target version (4.18.0) matches configured target version for branch (4.18.0)
- bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (liqcui@redhat.com), skipping review request.

openshift-ci-robot commented 1 month ago

@jmencak: This pull request references Jira Issue OCPBUGS-42492, which is valid.

3 validation(s) were run on this bug:

- bug is open, matching expected state (open)
- bug target version (4.18.0) matches configured target version for branch (4.18.0)
- bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (liqcui@redhat.com), skipping review request.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1177):

> There are known podman [issues/races](https://github.com/containers/podman/pull/23524), which may cause "podman run" failures (e.g. OCPBUGS-42492). While the correct approach is to fix podman and its configuration, this does not prevent against unknown/future podman issues which can easily be resolved by retries.
>
> Add restart on failure for ocp-tuned-one-shot.service and a 5s wait between retries.
>
> Resolves: OCPBUGS-42492
MarSik commented 1 month ago

/lgtm

openshift-ci[bot] commented 1 month ago

@jmencak: all tests passed!

openshift-ci-robot commented 1 month ago

@jmencak: Jira Issue OCPBUGS-42492 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator

This PR has been included in build `cluster-node-tuning-operator-container-v4.18.0-202410171441.p0.g3e141a9.assembly.stream.el9`. All builds following this will include this PR.

jmencak commented 1 month ago

/cherry-pick release-4.17

openshift-cherrypick-robot commented 1 month ago

@jmencak: new pull request created: #1187
