openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

OCPBUGS-14026: release-4.14: check agetty process moved under cpuset cgroup #1072

Closed SargunNarula closed 5 months ago

SargunNarula commented 6 months ago

Check if agetty process is moved under the cpuset cgroup.

openshift-ci-robot commented 6 months ago

@SargunNarula: This pull request references Jira Issue OCPBUGS-14026, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

openshift-ci-robot commented 6 months ago

@SargunNarula: This pull request references Jira Issue OCPBUGS-14026, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

SargunNarula commented 6 months ago

/retest

SargunNarula commented 6 months ago

/retest

shajmakh commented 6 months ago

Thanks for this!
/lgtm
/hold
Leaving room for others to review.

openshift-ci[bot] commented 6 months ago

@SargunNarula: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift | 9f2decb0f643ba77fccefa1aaa207a2967b52d09 | link | true | `/test e2e-hypershift` |

ffromani commented 6 months ago

/approve

MarSik commented 6 months ago

Honestly, I am not sure the test design is sound. Yes, we move agetty under system.slice, but that is an implementation detail. What we should actually be testing is that no system process interferes with the isolated CPUs.

The bug was agetty not being moved properly, fine, but next time it might be a totally different process and this test will not catch it.
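A minimal sketch of the process-agnostic check suggested above, in Go. It assumes the caller has already collected each process's `Cpus_allowed_list` value (from `/proc/<pid>/status`) and knows the isolated CPU set from the performance profile. The function names `parseCPUList` and `processesOnIsolatedCPUs`, and the map-based input, are hypothetical and are not taken from the operator's test suite.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a kernel-style CPU list such as "0-3,8" into a set.
func parseCPUList(list string) (map[int]bool, error) {
	cpus := map[int]bool{}
	list = strings.TrimSpace(list)
	if list == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(list, ",") {
		lo, hi, isRange := strings.Cut(strings.TrimSpace(part), "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU %q: %w", lo, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid CPU %q: %w", hi, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("descending range %q", part)
		}
		for c := first; c <= last; c++ {
			cpus[c] = true
		}
	}
	return cpus, nil
}

// processesOnIsolatedCPUs returns, sorted by name, every process whose
// allowed CPU list intersects the isolated set. The allowed map
// (process name -> Cpus_allowed_list value) is a hypothetical input shape.
func processesOnIsolatedCPUs(allowed map[string]string, isolated string) ([]string, error) {
	isolatedSet, err := parseCPUList(isolated)
	if err != nil {
		return nil, err
	}
	var offenders []string
	for name, list := range allowed {
		cpus, err := parseCPUList(list)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		for c := range cpus {
			if isolatedSet[c] {
				offenders = append(offenders, name)
				break
			}
		}
	}
	sort.Strings(offenders)
	return offenders, nil
}
```

A check written this way would flag any process that can run on isolated CPUs, not only agetty, which is the gap pointed out in this comment.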

openshift-ci[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, SargunNarula

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/release-4.14/OWNERS)~~ [ffromani]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
MarSik commented 6 months ago

/hold

MarSik commented 6 months ago

One other thing. Why is this PR trying to merge into 4.14? We always start with the master branch. Is this a backport?

SargunNarula commented 5 months ago

After an offline discussion with @MarSik, we found an existing test case that identifies all systemd processes and checks that they are running in the system.slice cgroup, which is the functionality this bug requires: https://github.com/openshift/cluster-node-tuning-operator/blob/master/test/e2e/performanceprofile/functests/1_performance/performance.go#L305

That test was introduced, to cover all scenarios, in this PR: https://github.com/openshift/cluster-node-tuning-operator/pull/992. The PR was backported to 4.14 and 4.15, so no additional fixes are needed here.

The agetty bug was opened for versions 4.13 and 4.14, but we do not plan to cover this test in either version.
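For readers unfamiliar with the check described above, here is a rough Go sketch of deciding whether a process sits under system.slice, given the text of its `/proc/<pid>/cgroup` file. It relies only on the documented `hierarchy-ID:controller-list:cgroup-path` line format. The helper names `cgroupPath` and `inSystemSlice` are hypothetical, and this is not the code of the linked test.

```go
package main

import "strings"

// cgroupPath returns the cgroup path recorded for the given controller in
// the contents of /proc/<pid>/cgroup. If no cgroup v1 line names the
// controller, it falls back to the cgroup v2 unified entry ("0::<path>").
func cgroupPath(procCgroup, controller string) (string, bool) {
	unified, haveUnified := "", false
	for _, line := range strings.Split(procCgroup, "\n") {
		fields := strings.SplitN(strings.TrimSpace(line), ":", 3)
		if len(fields) != 3 {
			continue // blank or malformed line
		}
		if fields[1] == "" {
			// Empty controller list marks the cgroup v2 hierarchy.
			unified, haveUnified = fields[2], true
			continue
		}
		for _, c := range strings.Split(fields[1], ",") {
			if c == controller {
				return fields[2], true
			}
		}
	}
	return unified, haveUnified
}

// inSystemSlice reports whether a cgroup path is system.slice itself or
// one of its children.
func inSystemSlice(path string) bool {
	return path == "/system.slice" || strings.HasPrefix(path, "/system.slice/")
}
```

A test could apply `cgroupPath(contents, "cpuset")` followed by `inSystemSlice` to every systemd-managed process on the node rather than to agetty alone.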

openshift-ci-robot commented 5 months ago

@SargunNarula: This pull request references Jira Issue OCPBUGS-14026. The bug has been updated to no longer refer to the pull request using the external bug tracker.
