openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0
99 stars 104 forks source link

OCPBUGS-28647: tuned: distinguish deferred updates #1129

Closed ffromani closed 1 month ago

ffromani commented 1 month ago

To fully support the usecase described in OCPBUGS-28647 and fix the issue, we need to further distinguish between first-time profile change and (in-place) profile change. This is required to better support a GitOps flow.

The key distinction is if the recommended profile changes or not, and there's a desire to defer application of changes only when a profile is updated (e.g. sysctl modified), not the first time it is applied.

Thus:

We change the way the annotation is used. We now require a value, which can be either

openshift-ci-robot commented 1 month ago

@ffromani: This pull request references Jira Issue OCPBUGS-28647, which is valid.

3 validation(s) were run on this bug * bug is open, matching expected state (open) * bug target version (4.17.0) matches configured target version for branch (4.17.0) * bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @shajmakh

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1129): >To fully support the usecase described in OCPBUGS-28647 and fix the issue, we need to further distinguish between first-time profile change and (in-place) profile change. > >The key distinction is if the recommended profile changes or not, and there's a desire to defer application of changes only if a profile is update, not the first time it is applied. > >Thus: >- first-time profile change is a change which triggers a change of the recommended profile >- (in-place) profile update is a change which does NOT trigger the recommended profile, and updates the setting, usually but not exclusively the sysctls. > >We change the way the annotation is used. We now require a value, which can be either >- always: every Tuned object annotated this way will have its application deferred >- update: every Tuned object annotated this way will be processed as usual (and as it wasn't annotated) if it's a first-time profile change, but its (in-place) updates will be deferred. Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
yanirq commented 1 month ago

/cc @MarSik @bartwensley

ffromani commented 1 month ago

/cc @jmencak

ffromani commented 1 month ago

/retest

jmencak commented 1 month ago

Thank you for the PR! I'll try to understand the code and the motivation tomorrow. Some initial thoughts.

The key distinction is if the recommended profile changes or not, and there's a desire to defer application of changes only if a profile is update, not the first time it is applied.

Do you mean "only if profile is updated"? Also, which profile? TuneD profile or the k8s CR? I assume the former, so we should be specific to make things clear.

* (in-place) profile update is a change which does NOT trigger the recommended profile, and updates the setting, usually but not exclusively the sysctls.

IIUC, s/NOT trigger the recommended profile/NOT cause a switch to a different TuneD profile/ ?

We change the way the annotation is used. We now require a value, which can be either

* always: every Tuned object annotated this way will have its application deferred

Is this correct? Shouldn't we say something along the lines "if at least 1 Tuned object annotated this way exists, profile applications will be deferred"? Similarly for "update".

Edit: I take this back, I was thinking about the old implementation.

Also, the code uses "never" DeferMode, should we document this in the PR description and the commit?

ffromani commented 1 month ago

thanks @jmencak , I will clarify the commit message indeed.

jmencak commented 1 month ago

After some manual testing I see one potential issue with

tuned.openshift.io/deferred: "update"

1) Create the following manifest

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile
  namespace: openshift-cluster-node-tuning-operator
  annotations:
    tuned.openshift.io/deferred: "update"
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile
      include=openshift-node
      [sysctl]
      kernel.shmmni=8192
    name: openshift-profile
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile

2) Profile is applied. Now change the value of kernel.shmmni to something like kernel.shmmni=16384. Profile application is refused on the grounds of the deferred update.

3) Delete the TuneD pod responsible setting the profile => profile is applied by the new pod.

Questions:

ffromani commented 1 month ago

After some manual testing I see one potential issue with [...]

  1. Profile is applied. Now change the value of kernel.shmmni to something like kernel.shmmni=16384. Profile application is refused on the grounds of the deferred update.
  2. Delete the TuneD pod responsible setting the profile => profile is applied by the new pod. Questions:

    • Do we feel this is an issue? Should we make the information about the deferred state permanent so that TuneD pod restarts do not apply the changes?

yes, it's a bug.

* Should we get this pre-merge tested?

I guess yes

ffromani commented 1 month ago

/retest

jmencak commented 1 month ago

Infra issues /retest

jmencak commented 1 month ago

Thank you for the fix, Francesco. I can confirm the issue is no longer present. /lgtm but it would be nice to do some basic pre-merge testing to verify the functionality @liqcui . I'm sure Liquan will have questions on how to test this so this could also help to improve the docs/commit message. /hold to give other reviewers a chance to review this.

ffromani commented 1 month ago

/retest

ffromani commented 1 month ago

/retest

I still thik it's (mostly) infra issue

jmencak commented 1 month ago

/retest

I still thik it's (mostly) infra issue

Failures in basic/sysctl_d_override.go in e2e-aws-operator look suspicious. I haven't seen them in a long time. Prior to deferred updates, I tested cca 500 iterations of the entire e2e-aws-operator test suite without any issues. Let's take a break and investigate on Monday. :)

ffromani commented 1 month ago

some failures look legit (hugepages), but some other still look like infra issues

ffromani commented 1 month ago

/retest

ffromani commented 1 month ago

/retest I still thik it's (mostly) infra issue

Failures in basic/sysctl_d_override.go in e2e-aws-operator look suspicious. I haven't seen them in a long time. Prior to deferred updates, I tested cca 500 iterations of the entire e2e-aws-operator test suite without any issues. Let's take a break and investigate on Monday. :)

sorry, added my comment before reading this one. I'll abstain from further retests

jmencak commented 1 month ago

sorry, added my comment before reading this one. I'll abstain from further retests

No problem. At the moment there seem to be 4 test runs with the override test failing. This is most likely a legit issue.

/lgtm cancel based on this.

jmencak commented 1 month ago

/hold cancel

ffromani commented 1 month ago

ok, I can reproduce locally. Debugging.

ffromani commented 1 month ago

interestingly, I can reproduce even on current master, but sporadically (1 every 4 runs so far)

ffromani commented 1 month ago

ok, the problem lies in the fact that previously we didn't recover the recommended profile name and that caused by side effect a TuneD reload on NTO operand restart. Now, the question is: should actually a NTO operand restart always trigger a TuneD restart, or that was accidental?

jmencak commented 1 month ago

Now, the question is: should actually a NTO operand restart always trigger a TuneD restart, or that was accidental?

NTO operand "supervises" the TuneD daemon, so yes, that's the design. NTO operand restart will first stop TuneD and then start TuneD because the TuneD daemon will no longer be running. The NTO operand (golang code) shuts TuneD down as it shuts down.

ffromani commented 1 month ago

Now, the question is: should actually a NTO operand restart always trigger a TuneD restart, or that was accidental?

NTO operand "supervises" the TuneD daemon, so yes, that's the design. NTO operand restart will first stop TuneD and then start TuneD because the TuneD daemon will no longer be running. The NTO operand (golang code) shuts TuneD down as it shuts down.

good, This means my previous fix in this PR is wrong, but this also means I need to document that we should NOT recover the recommended profile because this prevents TuneD to be restarted. I feel it should be more explicit, but this is probably stuff for another separate PR.

jmencak commented 1 month ago

NTO operand "supervises" the TuneD daemon, so yes, that's the design. NTO operand restart will first stop TuneD and then start TuneD because the TuneD daemon will no longer be running. The NTO operand (golang code) shuts TuneD down as it shuts down.

good, This means my previous fix in this PR is wrong, but this also means I need to document that we should NOT recover the recommended profile because this prevents TuneD to be restarted. I feel it should be more explicit, but this is probably stuff for another separate PR.

I'm afraid you lost me completely, but I agree a thorough documentation is a good start. I believe we lack some usecases for the "always" and "update" annotations so that this can be pre-merge tested by QE. This could also be potentially covered by e2e tests.

ffromani commented 1 month ago

my latest push should fix the sysctl_d_override test but will break again https://github.com/openshift/cluster-node-tuning-operator/pull/1129#issuecomment-2277436661 . Working on the latter.

ffromani commented 1 month ago

the last upload should fix all the known issues.

ffromani commented 1 month ago

good: none of the current failures seems to be related to the deferred updates, whose tests all seems green! bad: a bunch of new failures to debug before this PR can move forward

ffromani commented 1 month ago

OTOH https://github.com/openshift/cluster-node-tuning-operator/pull/1131 is getting a different failure set :\

ffromani commented 1 month ago

/test e2e-gcp-pao

ffromani commented 1 month ago

/test e2e-aws-ovn-techpreview /test e2e-aws-ovn-techpreview

ffromani commented 1 month ago

/test e2e-gcp-pao-workloadhints

ffromani commented 1 month ago

/test e2e-gcp-pao-updating-profile

ffromani commented 1 month ago

/retest

jmencak commented 1 month ago

e2e-hypershift:

Looks like issues not caused by this PR

FAIL: TestCreateCluster (4091.13s)

/retest

ffromani commented 1 month ago

/hold for @MarSik and @yanirq review. (xref: https://github.com/openshift/cluster-node-tuning-operator/pull/1129#pullrequestreview-2234947331)

jmencak commented 1 month ago

Thank you for the changes. /lgtm Found one typo change afeter. but that can be fixed later on if you prefer.

ffromani commented 1 month ago

Thank you for the changes. /lgtm Found one typo change afeter. but that can be fixed later on if you prefer.

thanks, fixed

jmencak commented 1 month ago

/lgtm

openshift-ci[bot] commented 1 month ago

@ffromani: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [MarSik,ffromani] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
jmencak commented 1 month ago

/unhold

openshift-ci-robot commented 1 month ago

@ffromani: Jira Issue OCPBUGS-28647: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-28647 has been moved to the MODIFIED state.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1129): >To fully support the usecase described in OCPBUGS-28647 and fix >the issue, we need to further distinguish between first-time >profile change and (in-place) profile change. This is required to >better support a GitOps flow. > >The key distinction is if the recommended profile changes or not, >and there's a desire to defer application of changes only when a >profile is updated (e.g. sysctl modified), not the first time it >is applied. > >Thus: >- first-time profile change is a change which triggers a change of the recommended profile. >- (in-place) profile update is a change which does NOT cause a switch to a different TuneD profile. This usually, but not exclusively, involve changes of the sysctls. > >We change the way the annotation is used. We now require a value, >which can be either >- always: every Tuned object annotated this way will have its application deferred >- update: every Tuned object annotated this way will be processed as usual (and as it wasn't annotated) if it's a first-time profile change, but its (in-place) updates will be deferred. >- a new internal value "never" is also added to be used internally to mean the deferred feature is disabled entirely. User can use this value but it will explicitly disable the feature (which is disabled already by default), thus is redundant and not recommended. > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202408140944.p0.g3655f22.assembly.stream.el9. All builds following this will include this PR.

yanirq commented 4 weeks ago

/cherry-pick release-4.17

openshift-cherrypick-robot commented 4 weeks ago

@yanirq: new pull request created: #1138

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1129#issuecomment-2298893689): >/cherry-pick release-4.17 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.