openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

CNF-11815: e2e: Added node inspector for inspecting nodes configuration #1008

Closed rbaturov closed 5 months ago

rbaturov commented 7 months ago

On hypershift there is no MCO, hence there are no machine-config-daemon pods, so a different approach is needed for accessing the underlying node to inspect its configuration. This commit introduces a node inspector implemented as a daemonset: when the test suites run, a pod with elevated privileges and the host filesystem mounted is deployed on every node. I have also added a Z-deconfig suite that is used for cleanup (the 'Z' prefix guarantees that it is the last suite to run). This API will be used for both hypershift and non-hypershift systems.

openshift-ci-robot commented 7 months ago

@rbaturov: This pull request references CNF-11815 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

rbaturov commented 7 months ago

/hold

Depends on https://github.com/openshift/cluster-node-tuning-operator/pull/1004

rbaturov commented 7 months ago

/retest-required

rbaturov commented 7 months ago

@Tal-or ready for another review iteration

rbaturov commented 6 months ago

/unhold

rbaturov commented 5 months ago

/retest-required

openshift-ci[bot] commented 5 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, rbaturov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)~~ [ffromani]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
rbaturov commented 5 months ago

@Tal-or @ffromani I made some modifications:

  1. Z_deconfig is now called from a trap command in the run-test.sh script, which ensures that cleanup occurs even if a failure happens mid-way.
  2. Changed the node-inspector delete function to delete the namespace (which implicitly deletes the service account and cluster-role-binding), thereby reducing API calls. The clusterrole requires an additional delete request, as it is not namespaced.
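
The trap-based cleanup described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the actual contents of run-test.sh: the function names `run_deconfig`, `run_suites`, and `run_with_cleanup`, the suite name `1_performance`, and the echoed messages are all hypothetical stand-ins.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for running the Z_deconfig cleanup suite.
# In the real script this would invoke the suite that removes the
# node inspector resources; here it only prints a marker line.
run_deconfig() {
  echo "cleanup: running Z_deconfig"
}

# Hypothetical stand-in for the test suites; it fails mid-way on purpose.
run_suites() {
  echo "suite: 1_performance started"
  return 1
}

# The function body is a subshell, so the EXIT trap fires when the
# subshell ends, whether the suites passed or failed.
run_with_cleanup() (
  trap run_deconfig EXIT
  run_suites || exit 1
  echo "suites: all passed"
)
```

Calling `run_with_cleanup` prints the cleanup marker even though `run_suites` fails, and the failing exit status is still propagated to the caller, so CI would still report the failure.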
rbaturov commented 5 months ago

/retest-required

openshift-ci[bot] commented 5 months ago

@rbaturov: all tests passed!

Tal-or commented 5 months ago

/lgtm

Thank you for all your work on this.

ffromani commented 5 months ago

/label acknowledge-critical-fixes-only

This is test-only code, and it is an improvement anyway. Plus, we depend on this work for a critical feature.