openshift / cluster-node-tuning-operator

Manage node-level tuning by orchestrating the tuned daemon.
Apache License 2.0

DNM: debug 1_performance suite #1102

Closed: Tal-or closed this 4 months ago

Tal-or commented 4 months ago

/hold DNM!

openshift-ci[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Tal-or. Once this PR has been reviewed and has the lgtm label, please assign ffromani for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

Tal-or commented 4 months ago

/test ci/prow/e2e-hypershift-pao

openshift-ci[bot] commented 4 months ago

@Tal-or: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1102#issuecomment-2208809332):

> /test ci/prow/e2e-hypershift-pao

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

Note to self: for some unclear reason, the testpod has two nodeSelectors:

NodeSelector: {
            kubernetes.io/hostname: ip-10-0-129-17.ec2.internal,
            node-role.kubernetes.io/worker-cnf: ""
        },

Now, the node (for another reason I don't understand) does not contain the worker-cnf label, although it gets labeled at the beginning of the test. Looking carefully, I see that the node labeled at the beginning disappeared during the test run, so the node the pod is scheduled onto no longer carries the label. The combination of: A. a node being removed from the cluster, and B. an undesired nodeSelector being added to the testpod, is the cause of this bug. All that's "left" is to figure out what causes A and B.
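To make the failure mode concrete: a pod's nodeSelector is a logical AND, so every key/value pair must be present on the node. The plain-Go sketch below (not the scheduler's code; `matchesNodeSelector` is a hypothetical helper) uses the two selectors from the note above and a hypothetical label set for a node that no longer carries worker-cnf. The hostname matches, but the missing worker-cnf label alone makes the node ineligible.

```go
package main

import "fmt"

// matchesNodeSelector reports whether a node's labels satisfy every
// key/value pair in a pod's nodeSelector (all pairs must match), and
// returns the selector keys that are missing or have a different value.
func matchesNodeSelector(nodeLabels, nodeSelector map[string]string) (bool, []string) {
	var missing []string
	for k, v := range nodeSelector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			missing = append(missing, k)
		}
	}
	return len(missing) == 0, missing
}

func main() {
	// The two selectors observed on the testpod.
	selector := map[string]string{
		"kubernetes.io/hostname":             "ip-10-0-129-17.ec2.internal",
		"node-role.kubernetes.io/worker-cnf": "",
	}
	// Hypothetical labels of a node that lost worker-cnf.
	nodeLabels := map[string]string{
		"kubernetes.io/hostname":         "ip-10-0-129-17.ec2.internal",
		"node-role.kubernetes.io/worker": "",
	}
	ok, missing := matchesNodeSelector(nodeLabels, selector)
	fmt.Println(ok, missing) // false [node-role.kubernetes.io/worker-cnf]
}
```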

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-gcp-pao

openshift-ci[bot] commented 4 months ago

@Tal-or: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-gcp-pao | d0850b3df25afec98a117105f9bcbe2a21bf0122 | link | true | `/test e2e-gcp-pao` |

Full PR test history. Your PR dashboard.

rbaturov commented 4 months ago


It seems the worker-cnf selector was appended by the API server. We definitely didn't add this label to the pod at any phase of the pod's construction.
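The thread does not establish how the extra selector got there. One known way a selector is added server side is admission-time merging of a namespace-level default node selector: the upstream PodNodeSelector admission plugin merges a namespace's default selector into each pod's nodeSelector and rejects the pod on a conflicting value, and OpenShift supports a comparable project-wide default. Whether the test namespace carries such a default is an assumption. The sketch below only illustrates those merge semantics in plain Go; `mergeNodeSelectors` is hypothetical and is not the plugin's code.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeNodeSelectors sketches admission-style merging: the namespace's
// default selector is added to the pod's own nodeSelector, and a key with
// conflicting values causes the pod to be rejected.
func mergeNodeSelectors(podSel, nsSel map[string]string) (map[string]string, error) {
	merged := make(map[string]string, len(podSel)+len(nsSel))
	for k, v := range podSel {
		merged[k] = v
	}
	for k, v := range nsSel {
		if existing, ok := merged[k]; ok && existing != v {
			return nil, fmt.Errorf("node selector conflict on %q: pod=%q namespace=%q", k, existing, v)
		}
		merged[k] = v
	}
	return merged, nil
}

func main() {
	// The selector the test itself sets.
	pod := map[string]string{"kubernetes.io/hostname": "ip-10-0-129-17.ec2.internal"}
	// Hypothetical namespace default selector.
	ns := map[string]string{"node-role.kubernetes.io/worker-cnf": ""}

	merged, err := mergeNodeSelectors(pod, ns)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	// Print in a stable order; map iteration order is random in Go.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%q\n", k, merged[k])
	}
}
```

With these inputs the pod ends up with both selectors, which matches the shape observed on the testpod.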

Tal-or commented 4 months ago


Indeed, but for now we should first make sure that every time the performance profile is applied, we restore the node labels, i.e., set the worker-cnf label. This will fix the bug, and we'll investigate the node selector appended to the pod later.
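A minimal sketch of the proposed fix, under stated assumptions: after each performance profile application, re-apply the worker-cnf label to any node that lost it. The `Node` type, the `ensureWorkerCNFLabel` helper, and the idea of calling it after every profile update are hypothetical. A real functest would patch Node objects through the API (for example with client-go), which this plain-Go sketch does not do. Which nodes should carry the label (here: every node passed in) is also an assumption.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for a Kubernetes Node object.
type Node struct {
	Name   string
	Labels map[string]string
}

const workerCNFLabel = "node-role.kubernetes.io/worker-cnf"

// ensureWorkerCNFLabel re-applies the worker-cnf role label to every node
// that is missing it and returns the names of the nodes it relabeled.
// It is idempotent: a second call after a successful run changes nothing.
func ensureWorkerCNFLabel(nodes []*Node) []string {
	var relabeled []string
	for _, n := range nodes {
		if n.Labels == nil {
			n.Labels = map[string]string{}
		}
		if _, ok := n.Labels[workerCNFLabel]; !ok {
			n.Labels[workerCNFLabel] = ""
			relabeled = append(relabeled, n.Name)
		}
	}
	return relabeled
}

func main() {
	nodes := []*Node{
		{Name: "node-a", Labels: map[string]string{workerCNFLabel: ""}},
		{Name: "node-b", Labels: map[string]string{"node-role.kubernetes.io/worker": ""}},
	}
	// Hypothetical hook: run this after each performance profile update.
	fmt.Println(ensureWorkerCNFLabel(nodes)) // [node-b]
	fmt.Println(ensureWorkerCNFLabel(nodes)) // []
}
```

Making the helper idempotent means it is safe to call unconditionally after every profile change, whether or not a node was actually replaced.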

Tal-or commented 4 months ago

/test e2e-hypershift-pao

openshift-merge-robot commented 4 months ago

PR needs rebase.

Tal-or commented 4 months ago

https://github.com/openshift/cluster-node-tuning-operator/pull/1084 got merged