openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

OCPBUGS-41487: E2E: Add hypershift support to workloadhints testsuite #1154

Closed mrniranjan closed 1 day ago

mrniranjan commented 2 months ago

/test e2e-upgrade

openshift-ci-robot commented 2 months ago

@mrniranjan: This pull request references Jira Issue OCPBUGS-41487, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.18.0) matches configured target version for branch (4.18.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @mrniranjan

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1154):

>- Changes primarily done to check nodepools instead of mcp for hypershift.
>- replace dataplane test client instead of generic testclient.Client

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-node-tuning-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 2 months ago

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: mrniranjan.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Tal-or commented 2 months ago

@mrniranjan Please add the suite to the make target, because currently it's not running.

mrniranjan commented 1 month ago

/retest-required

Tal-or commented 1 month ago

/lgtm

Thanks!

shajmakh commented 4 weeks ago

/approve

MarSik commented 4 weeks ago

/approved

openshift-ci-robot commented 4 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 6d2e1edef5b898f63617b1cdba589b3083b87331 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 4 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 6d2e1edef5b898f63617b1cdba589b3083b87331 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 4 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 6d2e1edef5b898f63617b1cdba589b3083b87331 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 4 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 2e07e3ab309a29dbf4a147c76869291e8fb1e350 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 4 weeks ago

/retest-required

Remaining retests: 0 against base HEAD 2e07e3ab309a29dbf4a147c76869291e8fb1e350 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

Tal-or commented 3 weeks ago

Seems like an actual failure:

  > Enter [BeforeEach] [rfe_id:49062][workloadHints] Telco friendly workload specific PerformanceProfile API - /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/8_performance_workloadhints/workloadhints.go:61 @ 10/27/24 11:59:26.333
I1027 11:59:26.358328   20478 workloadhints.go:859] updated nodes from map[string]string{"node-role.kubernetes.io/worker-cnf":""}: []
I1027 11:59:26.358350   20478 workloadhints.go:861] updated nodes matching optional selector: []
  [FAILED] cannot find RT enabled worker nodes
  Expected
      <[]v1.Node | len:0, cap:0>: nil
  not to be empty
  In [BeforeEach] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/8_performance_workloadhints/workloadhints.go:863 @ 10/27/24 11:59:2
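
The empty list in the log above is what a label lookup returns when no node carries the `node-role.kubernetes.io/worker-cnf` label. A minimal sketch of that matching logic follows, using plain maps rather than the real Kubernetes client. The `node` type and the `nodesMatching` helper are hypothetical and are not taken from the test suite.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a Kubernetes Node: a name plus labels.
type node struct {
	Name   string
	Labels map[string]string
}

// nodesMatching returns the nodes whose labels contain every key/value
// pair in selector (equality-based matching).
func nodesMatching(nodes []node, selector map[string]string) []node {
	var matched []node
	for _, n := range nodes {
		ok := true
		for k, v := range selector {
			if got, found := n.Labels[k]; !found || got != v {
				ok = false
				break
			}
		}
		if ok {
			matched = append(matched, n)
		}
	}
	return matched
}

func main() {
	selector := map[string]string{"node-role.kubernetes.io/worker-cnf": ""}
	// Assumed example data: nodes that only carry the plain worker role.
	nodes := []node{
		{Name: "node-a", Labels: map[string]string{"node-role.kubernetes.io/worker": ""}},
		{Name: "node-b", Labels: map[string]string{"node-role.kubernetes.io/worker": ""}},
	}
	fmt.Println(len(nodesMatching(nodes, selector))) // prints 0
}
```

With no match, the assertion "not to be empty" fails in `BeforeEach` before any test body runs.
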
openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD a98b16a6c6d9e1f0fc575fba137d0ffd22178f0b and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD bc3ecaea5131f120ef8a282039c3ffd013cb8a76 and 1 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD bc3ecaea5131f120ef8a282039c3ffd013cb8a76 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD bc3ecaea5131f120ef8a282039c3ffd013cb8a76 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

openshift-ci-robot commented 3 weeks ago

/retest-required

Remaining retests: 0 against base HEAD bc3ecaea5131f120ef8a282039c3ffd013cb8a76 and 2 for PR HEAD f16d3d727d007174db9786e85bdad86e9c28de5f in total

mrniranjan commented 3 weeks ago

/hold

mrniranjan commented 3 weeks ago
  STEP: Waiting for TuneD to start on nodes - /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/8_performance_workloadhints/workloadhints.go:123 @ 10/30/24 10:52:22.36
  [FAILED] Unexpected error:
      <*errors.errorString | 0xc000223300>: 
      failed to find a TuneD Pod for node ip-10-0-142-236.ec2.internal
      {
          s: "failed to find a TuneD Pod for node ip-10-0-142-236.ec2.internal",
      }
  occurred
  In [It] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/8_performance_workloadhints/workloadhints.go:132 @ 10/30/24 10:57:22.476

@Tal-or why doesn't the TuneD pod start on the HCP nodes? When I execute the tests locally, I see the TuneD pod running and the tests proceed.

mrniranjan commented 2 weeks ago

/test e2e-hypershift-pao

Tal-or commented 2 weeks ago

We reached the Jira API quota, so it failed on the CPU test again:

  Nov  6 23:14:39.818: [WARNING]: failed to retrieve status of Jira issue OCPBUGS-43280: failed to get jira status of OCPBUGS-43280: 429 429 Too Many Requests
  STEP: fetch Default cpu set from cpu manager state file before restart - /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:312 @ 11/06/24 23:14:39.818
  Nov  6 23:14:39.821: [INFO]: daemonset "node-inspector-ns" "node-inspector" desired 3 scheduled 3 ready 3
  Nov  6 23:14:39.825: [INFO]: found daemon pod node-inspector-m22vt for node ip-10-0-130-199.ec2.internal
cpuset =  0,2
  Nov  6 23:14:39.882: [INFO]: pre kubelet restart default cpuset: 0,2
  Nov  6 23:14:39.885: [INFO]: daemonset "node-inspector-ns" "node-inspector" desired 3 scheduled 3 ready 3
  Nov  6 23:14:39.889: [INFO]: found daemon pod node-inspector-m22vt for node ip-10-0-130-199.ec2.internal
  Nov  6 23:15:39.979: [INFO]: post kubele restart: waiting for node "ip-10-0-130-199.ec2.internal": to be ready
  Nov  6 23:15:39.985: [INFO]: node "ip-10-0-130-199.ec2.internal" ready=true
  Nov  6 23:15:39.985: [INFO]: post kubele restart: node "ip-10-0-130-199.ec2.internal": reported ready
  Nov  6 23:15:39.985: [INFO]: post restart: entering cooldown time: 1m0s
  Nov  6 23:16:39.985: [INFO]: post restart: finished cooldown time: 1m0s
  STEP: fetch Default cpuset from cpu manager state after restart - /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:332 @ 11/06/24 23:16:39.985
  Nov  6 23:16:39.990: [INFO]: daemonset "node-inspector-ns" "node-inspector" desired 3 scheduled 3 ready 3
  Nov  6 23:16:39.993: [INFO]: found daemon pod node-inspector-m22vt for node ip-10-0-130-199.ec2.internal
cpuset =  0-3
  [FAILED] Expected
      <cpuset.CPUSet>: {
          elems: {0: {}, 2: {}},
      }
  to equal
      <cpuset.CPUSet>: {
          elems: {1: {}, 2: {}, 3: {}, 0: {}},
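
The log above shows the default cpuset as `0,2` before the kubelet restart and `0-3` after it, and the assertion compares the two as sets. A minimal sketch of that comparison with only the Go standard library is below. The real test uses a `cpuset.CPUSet` type (visible in the log); `parseCPUList` and `sameCPUs` are hypothetical helpers written for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux-style CPU list such as "0,2" or "0-3"
// into a sorted slice of unique CPU IDs.
func parseCPUList(s string) ([]int, error) {
	seen := map[int]bool{}
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		if part == "" {
			continue
		}
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpu %q: %w", lo, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid cpu %q: %w", hi, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for c := start; c <= end; c++ {
			seen[c] = true
		}
	}
	cpus := make([]int, 0, len(seen))
	for c := range seen {
		cpus = append(cpus, c)
	}
	sort.Ints(cpus)
	return cpus, nil
}

// sameCPUs reports whether two CPU lists describe the same set of CPUs.
func sameCPUs(a, b string) (bool, error) {
	x, err := parseCPUList(a)
	if err != nil {
		return false, err
	}
	y, err := parseCPUList(b)
	if err != nil {
		return false, err
	}
	if len(x) != len(y) {
		return false, nil
	}
	for i := range x {
		if x[i] != y[i] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	equal, err := sameCPUs("0,2", "0-3")
	fmt.Println(equal, err) // prints "false <nil>"
}
```
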
mrniranjan commented 1 week ago

/unhold

mrniranjan commented 1 week ago

/test e2e-hypershift-pao

Tal-or commented 1 week ago

Reached timeout again. Too many tests are running on the same lane. On OCP we have a separate lane for the workloadhints, but it seems to be too much here (on HCP).

@mrniranjan can we classify the workloadhints tests and run only the most critical ones in order to cut test execution time?

mrniranjan commented 1 week ago

> Reached timeout again. Too many tests are running on the same lane. On OCP we have a separate lane for the workloadhints, but it seems to be too much here (on HCP).
>
> @mrniranjan can we classify the workloadhints tests and run only the most critical ones in order to cut test execution time?

I have classified every workload hints test as tier-3, except for one test that is tier-0. I modified the Makefile to skip all tests labeled workload-hints and tier-3, so only the single tier-0 test is executed.
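
The skip rule described above can be sketched as a small predicate, assuming the classification is expressed as test labels. The label strings `workload-hints`, `tier-3`, and `tier-0` are taken from the comment; the `shouldRun` helper is hypothetical, and the actual Makefile change is not shown in this thread.

```go
package main

import "fmt"

// hasLabel reports whether want appears in labels.
func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// shouldRun mirrors the described rule: a test is skipped only when it
// carries both the "workload-hints" and the "tier-3" labels.
func shouldRun(labels []string) bool {
	return !(hasLabel(labels, "workload-hints") && hasLabel(labels, "tier-3"))
}

func main() {
	fmt.Println(shouldRun([]string{"workload-hints", "tier-3"})) // prints false
	fmt.Println(shouldRun([]string{"workload-hints", "tier-0"})) // prints true
}
```

If the suite relies on Ginkgo v2 labels, the same boolean rule can be passed to the runner through its `--label-filter` option, which accepts `&&`, `||`, `!`, and parentheses.
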

mrniranjan commented 1 week ago

/test okd-scos-e2e-aws-ovn

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak, MarSik, mrniranjan, shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [MarSik,jmencak]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

mrniranjan commented 6 days ago

/retest-required

mrniranjan commented 6 days ago

/retest-required

mrniranjan commented 5 days ago

/retest-required

mrniranjan commented 5 days ago

/unhold

mrniranjan commented 4 days ago

/retest-required

Tal-or commented 4 days ago

/retest

mrniranjan commented 4 days ago

/retest-required

mrniranjan commented 2 days ago

/test okd-scos-e2e-aws-ovn

mrniranjan commented 2 days ago

/retest-required

Tal-or commented 2 days ago
[FAILED] Unexpected error:
      <*fmt.wrapError | 0xc0002a2b20>: 
      failed to run command [/bin/sh -c tuned-adm profile_info openshift-node-performance-performance 2>/dev/null | grep ^openshift-]: output ""; error ""; command terminated with exit code 1
      {
          msg: "failed to run command [/bin/sh -c tuned-adm profile_info openshift-node-performance-performance 2>/dev/null | grep ^openshift-]: output \"\"; error \"\"; command terminated with exit code 1",
          err: <exec.CodeExitError>{
              Err: <*errors.errorString | 0xc000437220>{
                  s: "command terminated with exit code 1",
              },
              Code: 1,
          },
      }
  occurred

Might be a flake, let's keep following it.
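
For context on the error shape: `grep` exits with status 1 when no line matches, and `sh -c` returns the status of the last command in a pipeline, so empty output with exit code 1 is consistent with `tuned-adm profile_info` printing nothing that starts with `openshift-` at that moment. A minimal sketch reproducing that shape is below. `printf` stands in for `tuned-adm`, and the `runShell` helper is hypothetical and is not the suite's own exec wrapper.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runShell runs script via /bin/sh -c and returns its stdout and exit code.
// A non-nil error means the shell itself could not be run.
func runShell(script string) (string, int, error) {
	out, err := exec.Command("/bin/sh", "-c", script).Output()
	if err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			// The command ran but exited non-zero; stdout is still captured.
			return string(out), exitErr.ExitCode(), nil
		}
		return "", -1, err
	}
	return string(out), 0, nil
}

func main() {
	// printf stands in for a tuned-adm call whose output has no matching line.
	out, code, err := runShell("printf 'no profile\\n' 2>/dev/null | grep ^openshift-")
	fmt.Printf("%q %d %v\n", out, code, err) // prints "" 1 <nil>
}
```
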

openshift-ci[bot] commented 2 days ago

@mrniranjan: all tests passed!

Full PR test history. Your PR dashboard.

mrniranjan commented 1 day ago

@Tal-or can you have a look?

Tal-or commented 1 day ago

/lgtm
/label acknowledge-critical-fixes-only
test code, not going to be part of OCP core payload

openshift-ci-robot commented 1 day ago

@mrniranjan: Jira Issue OCPBUGS-41487: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-41487 has been moved to the MODIFIED state.