DNM:debug 1_performance suite

Tal-or commented 4 months ago

/hold DNM!

openshift-ci[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Tal-or

Once this PR has been reviewed and has the lgtm label, please assign ffromani for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/openshift/cluster-node-tuning-operator/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

Tal-or commented 4 months ago

/test ci/prow/e2e-hypershift-pao

openshift-ci[bot] commented 4 months ago

@Tal-or: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

/test e2e-aws-operator
/test e2e-aws-ovn
/test e2e-aws-ovn-techpreview
/test e2e-gcp-pao
/test e2e-gcp-pao-updating-profile
/test e2e-gcp-pao-workloadhints
/test e2e-hypershift
/test e2e-hypershift-pao
/test e2e-no-cluster
/test e2e-upgrade
/test images
/test unit
/test verify
/test vet

The following commands are available to trigger optional jobs:

/test e2e-telco5g-cnftests
/test lint
/test okd-scos-images

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-node-tuning-operator-master-e2e-aws-operator
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-aws-ovn
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-aws-ovn-techpreview
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-gcp-pao
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-gcp-pao-updating-profile
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-gcp-pao-workloadhints
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-hypershift
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-hypershift-pao
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-no-cluster
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-upgrade
pull-ci-openshift-cluster-node-tuning-operator-master-images
pull-ci-openshift-cluster-node-tuning-operator-master-lint
pull-ci-openshift-cluster-node-tuning-operator-master-unit
pull-ci-openshift-cluster-node-tuning-operator-master-verify
pull-ci-openshift-cluster-node-tuning-operator-master-vet

In response to [this](https://github.com/openshift/cluster-node-tuning-operator/pull/1102#issuecomment-2208809332):

>/test ci/prow/e2e-hypershift-pao

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

Note to self: for some unclear reason, the test pod has two nodeSelectors:

NodeSelector: {
            kubernetes.io/hostname: ip-10-0-129-17.ec2.internal,
            node-role.kubernetes.io/worker-cnf: ""
        },

Now, the node (for another reason I don't understand) does not contain the worker-cnf label, although it gets labeled at the beginning of the test. Checking carefully, I see that the node labeled at the beginning was removed during the test run, so the node the pod is scheduled onto no longer has the label. This bug is caused by the coincidence of:
A. A node being removed from the cluster.
B. An undesired extra nodeSelector being added to the test pod.
All that's "left" is to figure out what causes A and B.
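For reference, a pod's `nodeSelector` only matches a node that carries every listed key with an equal value, so the `worker-cnf` entry alone is enough to keep the pod off a node that has lost that label. The Go sketch below illustrates that all-keys-must-match rule with plain maps. It is a simplified model, not the scheduler's implementation. The second label set, a node with the same hostname but no `worker-cnf` label, is hypothetical.

```go
package main

import "fmt"

// matchesNodeSelector reports whether a node's labels satisfy a pod's
// nodeSelector: every selector key must be present on the node with an
// equal value. Simplified sketch of the nodeSelector semantics only.
func matchesNodeSelector(selector, nodeLabels map[string]string) bool {
	for key, want := range selector {
		got, ok := nodeLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// The selector observed on the test pod in the note above.
	selector := map[string]string{
		"kubernetes.io/hostname":             "ip-10-0-129-17.ec2.internal",
		"node-role.kubernetes.io/worker-cnf": "",
	}
	// Hypothetical node labels: as labeled at test start ...
	labeled := map[string]string{
		"kubernetes.io/hostname":             "ip-10-0-129-17.ec2.internal",
		"node-role.kubernetes.io/worker":     "",
		"node-role.kubernetes.io/worker-cnf": "",
	}
	// ... and a node that no longer carries the worker-cnf label.
	unlabeled := map[string]string{
		"kubernetes.io/hostname":         "ip-10-0-129-17.ec2.internal",
		"node-role.kubernetes.io/worker": "",
	}
	fmt.Println(matchesNodeSelector(selector, labeled))
	fmt.Println(matchesNodeSelector(selector, unlabeled))
}
```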

Tal-or commented 4 months ago

/test e2e-hypershift-pao

Tal-or commented 4 months ago

/test e2e-gcp-pao

openshift-ci[bot] commented 4 months ago

@Tal-or: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-gcp-pao | d0850b3df25afec98a117105f9bcbe2a21bf0122 | link | true | `/test e2e-gcp-pao` |

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

rbaturov commented 4 months ago

Seems like the worker-cnf selector was appended by the API server. We definitely didn't add this label to the pod at any phase of the pod construction.
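One mechanism that can produce this is admission-time merging of a namespace-level default node selector, for example Kubernetes' `PodNodeSelector` admission plugin or OpenShift's `openshift.io/node-selector` namespace annotation. This is an assumption: the thread does not establish that this is the cause. The Go sketch below shows how such a merge would turn a hostname-only selector into the two-key selector seen on the test pod. `mergeNodeSelector` is a hypothetical helper, and the conflict rule is a simplified illustration, not the plugin's code.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeNodeSelector (hypothetical) adds keys from a namespace default
// selector to the pod's own selector. A key present in both with
// different values is reported as a conflict instead of being overwritten.
func mergeNodeSelector(podSel, nsDefault map[string]string) (map[string]string, error) {
	merged := make(map[string]string, len(podSel)+len(nsDefault))
	for k, v := range podSel {
		merged[k] = v
	}
	for k, v := range nsDefault {
		if existing, ok := merged[k]; ok && existing != v {
			return nil, fmt.Errorf("conflicting nodeSelector for key %q: %q vs %q", k, existing, v)
		}
		merged[k] = v
	}
	return merged, nil
}

func main() {
	// Assumed: the test code itself sets only the hostname selector.
	podSel := map[string]string{"kubernetes.io/hostname": "ip-10-0-129-17.ec2.internal"}
	// Hypothetical namespace-level default selector.
	nsDefault := map[string]string{"node-role.kubernetes.io/worker-cnf": ""}

	merged, err := mergeNodeSelector(podSel, nsDefault)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	// Print in sorted key order so the output is deterministic.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%q\n", k, merged[k])
	}
}
```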

Tal-or commented 4 months ago

Indeed, but atm we should first make sure that every time the performance profile is applied, we restore the node labels, IOW set the worker-cnf label again. This will fix the bug, and we'll investigate the node selector appended to the pod later.
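Below is a minimal sketch of that fix: re-apply the label after each profile application and touch only the nodes that lack it. The `Node` type and the `ensureNodeLabel` helper are hypothetical and do not come from the repo. Real code would patch `corev1.Node` objects through a client.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for a cluster node.
type Node struct {
	Name   string
	Labels map[string]string
}

const workerCNFLabel = "node-role.kubernetes.io/worker-cnf"

// ensureNodeLabel sets key=value on every node that lacks it and returns the
// names of the nodes it changed. It is idempotent, so calling it after every
// performance-profile application is safe even when no label was lost.
func ensureNodeLabel(nodes []*Node, key, value string) []string {
	var changed []string
	for _, n := range nodes {
		if n.Labels == nil {
			n.Labels = map[string]string{}
		}
		if got, ok := n.Labels[key]; ok && got == value {
			continue // already labeled correctly
		}
		n.Labels[key] = value
		changed = append(changed, n.Name)
	}
	return changed
}

func main() {
	nodes := []*Node{
		{Name: "worker-a", Labels: map[string]string{workerCNFLabel: ""}},
		{Name: "worker-b", Labels: map[string]string{}}, // e.g. a replacement node
	}
	// Hypothetical hook: run after a performance profile has been applied.
	fmt.Println(ensureNodeLabel(nodes, workerCNFLabel, ""))
	// A second call finds nothing to change.
	fmt.Println(ensureNodeLabel(nodes, workerCNFLabel, ""))
}
```

Making the helper idempotent means it can run unconditionally after every profile update without first detecting whether a node was replaced.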

Tal-or commented 4 months ago

/test e2e-hypershift-pao

openshift-merge-robot commented 4 months ago

PR needs rebase.

Tal-or commented 4 months ago

https://github.com/openshift/cluster-node-tuning-operator/pull/1084 got merged

DNM:debug 1_performance suite #1102