openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

NO-JIRA: Add a script to test internal TuneD FDP releases #1167

Closed jmencak closed 2 months ago

jmencak commented 2 months ago

This change adds a script, `hack/test-tuned-fdp.sh`, to test internal TuneD FDP releases. The script builds on top of the `hack/deploy-custom-nto.sh` script, but it uses `Dockerfile.rhel9` instead of the default upstream `Dockerfile`.

For details on how to use it, refer to the "Example invocation" section within the script. At a minimum, you'll typically want to supply your quay.io username and the baseurl of the TuneD FDP repository.

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [jmencak]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
jmencak commented 2 months ago

WIP because I'm still testing the script, but it should work as is. @yanirq, the pao-functests-updating-profile tests are really time-consuming. Do we need them as a blocker for TuneD FDP releases too, or is running pao-functests sufficient? Thank you.

/cc @liqcui

Liquan, could you please take this script for a spin to see if it works for you? Thank you!

jmencak commented 2 months ago

/retest

jmencak commented 2 months ago

/hold

Some testers are using this on arm64 Macs. Let me see if I can support that environment by documenting or adjusting the script.

jmencak commented 2 months ago

/hold cancel

Builds from Apple Mac hardware should work now.

openshift-ci-robot commented 2 months ago

@jmencak: This pull request explicitly references no jira issue.

jmencak commented 2 months ago

/hold

There are still issues with testing from non-x86_64 platforms. I will likely need to adjust the test targets in the Makefile. Thank you, Liquan, for testing this.

Edit: However, it works for me on an arm64 box with:

```sh
export GOARCH=amd64
ORG=jmencak hack/test-tuned-fdp.sh
```

jmencak commented 2 months ago

Lifting the hold.

/hold cancel

Tests work for me from an arm64 box using the following:

```sh
export GOARCH=amd64
ORG=jmencak hack/test-tuned-fdp.sh
```

The pao-functests-updating-profile tests, however, did not completely succeed this time:

```
  [FAILED] in [It] - /root/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/7_performance_kubelet_node/cgroups.go:117 @ 09/20/24 09:49:51.213
• [FAILED] [31.720 seconds]
[performance] Cgroups and affinity [rfe_id: 64006][Dynamic OVS Pinning] [Performance Profile applied] [It] [test_id:73046] Verify ovn kube node pod have their cpuset.cpus set to all available cpus [ovs-pinning, tier-0]
/root/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/7_performance_kubelet_node/cgroups.go:110

  [FAILED] Unexpected error:
      <*json.SyntaxError | 0xc000b3e708>: 
      unexpected end of JSON input
      {
          msg: "unexpected end of JSON input",
          Offset: 9301,
      }
  occurred
  In [It] at: /root/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/7_performance_kubelet_node/cgroups.go:117 @ 09/20/24 09:49:51.213

  There were additional failures detected.  To view them in detail run ginkgo -vv
```

@yanirq do we really need the pao-functests-updating-profile tests to test new TuneD FDP releases or are pao-functests sufficient?

jmencak commented 2 months ago

I'm seeing the following error consistently when running the tests entirely from an aarch64 platform:

```
  [FAILED] Unexpected error:
      <*json.SyntaxError | 0xc000b3e708>: 
      unexpected end of JSON input
      {
          msg: "unexpected end of JSON input",
          Offset: 9301,
      }
  occurred
```

However, when using the cross-compiled NTO image and running the tests from my x86_64 Fedora 40 machine, they pass without any issue. I wonder whether the culprit could be the different Go/Ginkgo versions I have on my x86_64 vs. aarch64 platforms. Will retest.
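For context on the error itself: Go's `encoding/json` returns a `*json.SyntaxError` with the message "unexpected end of JSON input" from `json.Unmarshal` when the input ends before a complete JSON value, and in that case `Offset` equals the number of bytes it was given. `Offset: 9301` therefore hints that the test decoded about 9301 bytes of JSON that were cut off, assuming the failing check at `cgroups.go:117` uses `json.Unmarshal` (the test code is not shown here). The sketch below reproduces the error on a truncated, made-up document using only the standard library; `describeJSONError` is a hypothetical helper, not code from the repository.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// describeJSONError unmarshals data into a generic value and, if decoding
// fails with a *json.SyntaxError, returns its message and byte offset.
func describeJSONError(data []byte) (msg string, offset int64, ok bool) {
	var v any
	err := json.Unmarshal(data, &v)
	var synErr *json.SyntaxError
	if errors.As(err, &synErr) {
		return synErr.Error(), synErr.Offset, true
	}
	return "", 0, false
}

func main() {
	// Made-up sample document; not the real data the e2e test inspects.
	complete := []byte(`{"containers": [{"cpuset": "0-3"}]}`)
	truncated := complete[:20] // simulate output that was cut off mid-stream

	if _, _, ok := describeJSONError(complete); !ok {
		fmt.Println("complete input decodes fine")
	}
	if msg, off, ok := describeJSONError(truncated); ok {
		// For input that simply ends too early, Offset equals len(truncated).
		fmt.Printf("%s (Offset: %d, len: %d)\n", msg, off, len(truncated))
	}
}
```

A truncated payload like this points at how the JSON was collected (for example, command output read before it finished) rather than at the JSON parser, which is consistent with the failure depending on the platform the tests run from.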

yanirq commented 2 months ago

> @yanirq do we really need the pao-functests-updating-profile tests to test new TuneD FDP releases or are pao-functests sufficient?

pao-functests should suffice

yanirq commented 2 months ago

/cc @fontivan

jmencak commented 2 months ago

> @yanirq do we really need the pao-functests-updating-profile tests to test new TuneD FDP releases or are pao-functests sufficient?

> pao-functests should suffice

Thank you. I'm omitting them, but I'll try to get to the bottom of the failures nevertheless.

jmencak commented 2 months ago

/retest

yanirq commented 2 months ago

/retest

jmencak commented 2 months ago

@liqcui finally managed to test on his Mac system. However, he needed the following modification. Note that setting the GOOS variable explicitly did not help. I'm adding this change in a separate PR, https://github.com/openshift/cluster-node-tuning-operator/pull/1172, so that we can easily roll it back if there are issues, without removing this script. My recommendation would be to run the test script without cross-compilation on a Linux x86_64 system anyway.

jmencak commented 2 months ago

/retest

jmencak commented 2 months ago

> I'm seeing the `unexpected end of JSON input` error consistently when running the tests completely from aarch64 platform. However, when using the cross-compiled NTO image and running them from my x86_64 Fedora 40, the tests pass without any issue. [...]

https://github.com/openshift/cluster-node-tuning-operator/pull/1174 fixes this issue. Note that we do not use pao-functests-updating-profile for FDP testing anyway. This PR is ready for review.

liqcui commented 2 months ago

/lgtm

openshift-ci-robot commented 2 months ago

/retest-required

Remaining retests: 0 against base HEAD 49128af3ad47af141331a507e04de1bca42951ca and 2 for PR HEAD 8896add56904d5b90bd7454f7ea6ee960280032d in total

jmencak commented 2 months ago

The e2e-gcp-pao test is failing due to a [recent MCO change](https://github.com/openshift/machine-config-operator/pull/4557). The test script added in this PR has nothing to do with the failures.

/override ci/prow/e2e-gcp-pao

openshift-ci[bot] commented 2 months ago

@jmencak: Overrode contexts on behalf of jmencak: ci/prow/e2e-gcp-pao

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1167#issuecomment-2374154321):

> e2e-gcp-pao test is failing due to a [recent MCO change](https://github.com/openshift/machine-config-operator/pull/4557).
> The test added in this PR has nothing to do with the failures.
> /override ci/prow/e2e-gcp-pao
openshift-ci[bot] commented 2 months ago

@jmencak: all tests passed!

openshift-bot commented 2 months ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator

This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202409251710.p0.gdf5dd71.assembly.stream.el9. All builds following this will include this PR.