openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

OCPBUGS-37956: E2E: Add test to verify cpuset.cpus.exclusive is writeable #1127

Closed mrniranjan closed 2 months ago

mrniranjan commented 3 months ago

Automates OCPBUGS-34812: cgroupsv2: failed to write on cpuset.cpus.exclusive

To reproduce the bug, we create and delete a deployment (deploying guaranteed pods with the CPU load balancing annotation) in quick succession, without fully waiting for cleanup. The pod that is about to be deleted still has access to the exclusive CPUs, so the new pod fails to start: its pre-start hook cannot write to the cpuset.cpus.exclusive file in the pod's cgroup.

This automation PR creates and deletes the deployment in a loop to reproduce the issue, and checks whether the pods fail with a Runtime error carrying the message "failed to run pre-start hook for container".
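The race described above can be illustrated with a toy model. This is only a sketch, not the test code from this PR: `FakeNode`, `churn`, and the CPU IDs are hypothetical names and values, and the model assumes that a deleted pod keeps its exclusive CPUs claimed until a separate cleanup step runs.

```python
from dataclasses import dataclass, field

# Error message the PR description says the test looks for.
PRE_START_HOOK_ERROR = "failed to run pre-start hook for container"


@dataclass
class FakeNode:
    """Toy node (hypothetical): exclusive CPUs stay claimed until cleanup runs."""
    exclusive_cpus: set = field(default_factory=lambda: {2, 3})
    claimed: set = field(default_factory=set)
    pending_release: set = field(default_factory=set)

    def create_pod(self, cpus):
        # Stand-in for the pre-start hook writing cpuset.cpus.exclusive:
        # the write is modelled as failing while the CPUs are still claimed.
        if cpus & self.claimed:
            return f"Runtime error: {PRE_START_HOOK_ERROR}"
        self.claimed |= cpus
        return None

    def delete_pod(self, cpus):
        # Deletion only schedules the release; the CPUs remain claimed
        # until finish_cleanup() is called.
        self.pending_release |= cpus

    def finish_cleanup(self):
        self.claimed -= self.pending_release
        self.pending_release.clear()


def churn(node, iterations, wait_for_cleanup):
    """Create and delete a pod repeatedly; return the errors observed."""
    errors = []
    cpus = set(node.exclusive_cpus)
    for _ in range(iterations):
        err = node.create_pod(cpus)
        if err is not None:
            errors.append(err)
            node.finish_cleanup()  # let the old pod go before retrying
            continue
        node.delete_pod(cpus)
        if wait_for_cleanup:
            node.finish_cleanup()
    return errors


if __name__ == "__main__":
    print("waiting for cleanup:", churn(FakeNode(), 4, wait_for_cleanup=True))
    print("quick succession:   ", churn(FakeNode(), 4, wait_for_cleanup=False))
```

In this model the error only appears when the loop does not wait for cleanup, which is why the reproduction relies on creating and deleting the deployment in quick succession.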

openshift-ci-robot commented 3 months ago

@mrniranjan: This pull request references Jira Issue OCPBUGS-37956, which is valid.

3 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @mrniranjan

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1127):

>Automates OCPBUGS-34812: cgroupsv2: failed to write on cpuset.cpus.exclusive

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 3 months ago

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: mrniranjan.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mrniranjan commented 3 months ago

/test e2e-hypershift

ffromani commented 2 months ago

/approve

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, mrniranjan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [ffromani]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci-robot commented 2 months ago

@mrniranjan: This pull request references Jira Issue OCPBUGS-37956, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

mrniranjan commented 2 months ago

/jira refresh

openshift-ci-robot commented 2 months ago

@mrniranjan: This pull request references Jira Issue OCPBUGS-37956, which is valid.

3 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.18.0) matches configured target version for branch (4.18.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @mrniranjan

openshift-ci[bot] commented 2 months ago

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: mrniranjan.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


mrniranjan commented 2 months ago

/test e2e-gcp-pao-updating-profile

shajmakh commented 2 months ago

Thanks for all the updates!
/lgtm
/hold for other reviewers' lgtm. Please feel free to unhold when you get an lgtm from the other reviewers.

mrniranjan commented 2 months ago

/test e2e-upgrade

shajmakh commented 2 months ago

/lgtm
Thanks for your efforts on this! Feel free to unhold as you see fit.

mrniranjan commented 2 months ago

/test e2e-hypershift

openshift-ci[bot] commented 2 months ago

@mrniranjan: all tests passed!

mrniranjan commented 2 months ago

/hold cancel

openshift-ci-robot commented 2 months ago

@mrniranjan: Jira Issue OCPBUGS-37956: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-37956 has been moved to the MODIFIED state.

openshift-bot commented 2 months ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator

This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202408272341.p0.g151f6d2.assembly.stream.el9. All builds following this will include this PR.

mrniranjan commented 2 months ago

/cherry-pick release-4.17

openshift-cherrypick-robot commented 2 months ago

@mrniranjan: new pull request created: #1146
