openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0
102 stars 105 forks source link

CNF-14090: Add vendor and architecture specific tuning options #1083

Closed fontivan closed 1 month ago

fontivan commented 5 months ago
openshift-ci-robot commented 5 months ago

@fontivan: This pull request references CNF-13014 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1083): >- This makes use of a new helper function in tuned that must first be upstreamed >- These changes will likely be split and submitted in smaller chunks however for demonstration purposes they are all shown together here Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-ci[bot] commented 5 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

fontivan commented 5 months ago

I think this will work pretty well once the new lscpu helper is upstreamed (https://github.com/redhat-performance/tuned/pull/638) but I will need to figure out how to make the tests happy with this change.

It seems like the tests are not evaluating the include statement, so it is not seeing the results it is expecting. I'm seeing 6 test failures locally with the current version of the PR

MarSik commented 5 months ago

@fontivan You need to do couple of things. First, deploy the new files to the cluster. Like here https://github.com/openshift/cluster-node-tuning-operator/blob/master/pkg/performanceprofile/controller/performanceprofile/components/tuned/tuned.go#L228

And then run make render-sync to refresh the expected output files (and check them of course).

fontivan commented 5 months ago

@fontivan You need to do couple of things. First, deploy the new files to the cluster. Like here https://github.com/openshift/cluster-node-tuning-operator/blob/master/pkg/performanceprofile/controller/performanceprofile/components/tuned/tuned.go#L228

And then run make render-sync to refresh the expected output files (and check them of course).

I'm guessing I'm going to at least need the new helper merged upstreamed so that I can use the TUNED_COMMIT option described in HACKING.md to accomplish this ?

MarSik commented 5 months ago

That is for building the NTO container and the full cluster tests.

But the initial test you can do is via make test-e2e-local and that does not need any cluster or tuned.

openshift-ci-robot commented 3 months ago

@fontivan: This pull request references CNF-14090 which is a valid jira issue.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1083): >- This makes use of a new helper function in tuned that must first be upstreamed >- These changes will likely be split and submitted in smaller chunks however for demonstration purposes they are all shown together here Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
fontivan commented 1 month ago

/hold do not merge yet

fontivan commented 1 month ago

/retest

fontivan commented 1 month ago

/retest

MarSik commented 1 month ago

@fontivan Please squash at least my fix-up patches with something so this gets ready for merge. I will review the CI.

fontivan commented 1 month ago

@fontivan Please squash at least my fix-up patches with something so this gets ready for merge. I will review the CI.

Will squash into 2 commits (1 for the work, 1 for the render sync) once I have things working correctly

fontivan commented 1 month ago

/retest

MarSik commented 1 month ago

/retest-required

MarSik commented 1 month ago

/lgtm /approve

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fontivan, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [MarSik] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
MarSik commented 1 month ago

/retest-required

openshift-ci-robot commented 1 month ago

@fontivan: This pull request references CNF-14090 which is a valid jira issue.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1083): >- This makes use of a new helper function in tuned that was added in tuned 2.24 to get cpu information from lscpu >- Adds separate tuning files for AMD/X86, ARM/AArch64, and Intel/X86 >- Platform specific tuning can be added to the appropriate tuning file and will loaded automatically by the openshift-node-performance profile >- Tuning for AMD and ARM should be considered not final at this time Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-ci[bot] commented 1 month ago

@fontivan: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
MarSik commented 1 month ago

/lgtm

MarSik commented 1 month ago

/hold cancel

MarSik commented 1 month ago

/cherry-pick release-4.17 release-4.16

openshift-cherrypick-robot commented 1 month ago

@MarSik: #1083 failed to apply on top of branch "release-4.17":

Applying: CNF-14090: Add vendor and architecture specific tuning options - Performance tuning support for 3 platforms (amd/x86,arm/aarch64,intel/x86) is added in this change - When a valid platform is detected the additional platform specific tuning will be imported alongside the default tuning - This makes use of a new helper function added to tuned to detect the system name and architecture - Update unit tests to account for the various changes - Add new unit tests to cover the platform specific tuning
Applying: CNF-14090: Re-sync e2e test yaml for tuning changes
Applying: CNF-14090: Use variable composition for idle_poll - idle=poll is only supported on x86 - Update tests to account for changes - Add explaination comments to empty values in openshift-node-performance
Using index info to reconstruct a base tree...
M   assets/performanceprofile/tuned/openshift-node-performance
M   assets/tuned/tuned
Falling back to patching base and 3-way merge...
Failed to merge submodule assets/tuned/tuned (not checked out)
Auto-merging assets/tuned/tuned
CONFLICT (submodule): Merge conflict in assets/tuned/tuned
Auto-merging assets/performanceprofile/tuned/openshift-node-performance
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0003 CNF-14090: Use variable composition for idle_poll - idle=poll is only supported on x86 - Update tests to account for changes - Add explaination comments to empty values in openshift-node-performance
In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1083#issuecomment-2428616359): >/cherry-pick release-4.17 release-4.16 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
MarSik commented 1 month ago

Ouch.. I missed the submodule update that does not belong here.

MarSik commented 1 month ago

Ok, manual cherry pick it is :)

openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202410221110.p0.g8b9e5a0.assembly.stream.el9. All builds following this will include this PR.