openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

OCPBUGS-37754: Remove tuned/rendered object #1133

Closed jmencak closed 2 months ago

jmencak commented 2 months ago

This is a backport of #1110, which resolved OCPBUGS-36870 in 4.15.

The operator controls the NTO operand through updates to two resources: the operand's corresponding k8s Tuned Profile resource and the tuned/rendered object, which contains a list of all TuneD (daemon) profiles.

While this setup works for most cases, it has a problem when a cluster administrator changes the content of the current TuneD profile and, at the same time, switches to a completely new TuneD profile. Then, depending on the timing of the k8s object updates, we could see two TuneD daemon reloads instead of just one.

Remove the tuned/rendered object and carry TuneD (daemon) profiles directly in the Tuned Profile k8s objects.
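
For illustration only, here is a toy Go model of the timing problem described above. It is a sketch under stated assumptions, not the operator's code: the `update` and `operand` types, the `apply` and `countReloads` functions, and the profile names `a` and `b` are all hypothetical, and a "reload" is assumed to happen whenever the active profile name or the content of the active profile changes.

```go
package main

import "fmt"

// update is a hypothetical change notification delivered to the operand.
// Zero values mean "this part did not change".
type update struct {
	active   string            // name of the TuneD profile to apply
	profiles map[string]string // TuneD profile name -> profile content
}

// operand is a toy stand-in for the NTO operand; it only counts reloads.
type operand struct {
	active   string
	profiles map[string]string
	reloads  int
}

// apply merges one update and counts a reload when the effective
// configuration (active profile name plus its content) has changed.
func (o *operand) apply(u update) {
	before := o.active + "\x00" + o.profiles[o.active]
	for name, content := range u.profiles {
		o.profiles[name] = content
	}
	if u.active != "" {
		o.active = u.active
	}
	if after := o.active + "\x00" + o.profiles[o.active]; after != before {
		o.reloads++
	}
}

// countReloads replays updates, in order, against a fresh operand that
// starts on profile "a".
func countReloads(updates ...update) int {
	o := &operand{
		active:   "a",
		profiles: map[string]string{"a": "v1", "b": "v1"},
	}
	for _, u := range updates {
		o.apply(u)
	}
	return o.reloads
}

var (
	// Two separate objects: the profile list and the profile selection.
	contentChange = update{profiles: map[string]string{"a": "v2"}}
	profileSwitch = update{active: "b"}
	// One object carrying both changes at once.
	combined = update{active: "b", profiles: map[string]string{"a": "v2"}}
)

func main() {
	fmt.Println(countReloads(contentChange, profileSwitch)) // 2
	fmt.Println(countReloads(profileSwitch, contentChange)) // 1
	fmt.Println(countReloads(combined))                     // 1
}
```

In this model the number of reloads with two objects depends on which update arrives first, while a single combined update always results in one reload.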

Resolves: OCPBUGS-37754

openshift-ci-robot commented 2 months ago

@jmencak: This pull request references Jira Issue OCPBUGS-37754, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.14.z) matches configured target version for branch (4.14.z)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
* release note text is set and does not match the template
* dependent bug [Jira Issue OCPBUGS-36870](https://issues.redhat.com//browse/OCPBUGS-36870) is in the state Closed (Done-Errata), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
* dependent [Jira Issue OCPBUGS-36870](https://issues.redhat.com//browse/OCPBUGS-36870) targets the "4.15.z" version, which is one of the valid target versions: 4.15.0, 4.15.z
* bug has dependents

No GitHub users were found matching the public email listed for the QA contact in Jira (liqcui@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

jmencak commented 2 months ago

Keeping WiP for manual testing.

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/release-4.14/OWNERS)~~ [jmencak]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

jmencak commented 2 months ago

/retest

jmencak commented 2 months ago

Manual testing went fine. I also successfully ran ~350 iterations of the e2e-aws-operator e2e test. The e2e upgrade tests seem flaky to me.

/retest

openshift-ci[bot] commented 2 months ago

@jmencak: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
Tal-or commented 2 months ago

/lgtm

Trivial backport, CI is happy

MarSik commented 2 months ago

/label backport-risk-assessed

jmencak commented 2 months ago

@liqcui, can we please have the cherry-pick-approved label? Thank you!

liqcui commented 2 months ago

/label cherry-pick-approved

openshift-ci-robot commented 2 months ago

@jmencak: Jira Issue OCPBUGS-37754: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-37754 has been moved to the MODIFIED state.

openshift-bot commented 2 months ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator

This PR has been included in build cluster-node-tuning-operator-container-v4.14.0-202408191041.p0.g10ac2c4.assembly.stream.el9. All builds following this will include this PR.

openshift-merge-robot commented 2 months ago

Fix included in accepted release 4.14.0-0.nightly-2024-08-20-170445