kubevirt / kubevirtci

Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing
Apache License 2.0
79 stars 119 forks source link

Remove NoSchedule taints for control-planes as well as masters #1173

Closed iholder101 closed 4 months ago

iholder101 commented 4 months ago

What this PR does / why we need it: Removes the NoSchedule taints for control-planes as well as masters.

Special notes for your reviewer: This PR https://github.com/kubevirt/kubevirt/pull/11659 is requiring Kubevirt's infra components to run on CP nodes by default. However, the pull-kubevirt-e2e-kind-sriov lane is failing since its single CP node has a NoShedule taint:

"taints": [
    {
        "key": "node-role.kubernetes.io/control-plane",
        "effect": "NoSchedule"
    }
]

After this PR this taint should not exist.

A PR to test these changes: https://github.com/kubevirt/kubevirt/pull/11730.

Release note:

Remove NoSchedule taints for control-planes as well as masters
iholder101 commented 4 months ago

/cc @brianmcarey @dhiller

oshoval commented 4 months ago

/cc @EdDev @ormergi

can you use the kind non sriov lane instead ? (there is one) - however according PR desc, sriov needs it as well did you check it with sriov lane on kubevirt with all the e2e tests ? Please note this lane is maintained by sig-network so sig-network should be in the loop

There was a try in the past to do something similiar that broke sriov https://github.com/kubevirt/kubevirtci/commit/2e998c2fa81de4615aa1ca0b2c47fe6387be023e but it is not the same

iholder101 commented 4 months ago

/cc @EdDev @ormergi

can you use the kind non sriov lane instead ? (there is one) - however according PR desc, sriov needs it as well

Not sure I understand the question. pull-kubevirt-e2e-kind-sriov is a gating presubmit lane that blocks https://github.com/kubevirt/kubevirt/pull/11659.

Is there a reason to keep these taints for this lane?

did you check it with sriov lane on kubevirt with all the e2e tests ?

Do you mean locally? Is there a way to test it via CI?

Please note this lane is maintained by sig-network so sig-network should be in the loop

Thanks for pinging them!

oshoval commented 4 months ago

The change looks fine kubevirtci runs all the sriov tests beside 2 (the ones that need cpu management) I don't think this change will affect them, we can take the chance and revert if needed unless you want to spin a cluster locally on a machine that has SRIOV and test it at kubevirt repo

You can also create a PR that bump it into kubevirt even before the offical bump and test it on kubevirt just copy the file into a kubevirt and thats it running this on kubevirt and creating a dummy PR will do curl -L https://github.com/kubevirt/kubevirtci/pull/1173.patch | git am

It reminded me a change that is breaking sriov https://github.com/kubevirt/kubevirtci/commit/2e998c2fa81de4615aa1ca0b2c47fe6387be023e but this is not the same, so it should be fine

Thanks

oshoval commented 4 months ago

vgpu lane failure not related right ? (also uses same kind file)

@zhlhahaha FYI please about this PR, as it affects ARM lane Thanks

iholder101 commented 4 months ago

You can also create a PR that bump it into kubevirt even before the offical bump and test it on kubevirt just copy the file into a kubevirt and thats it running this on kubevirt and creating a dummy PR will do curl -L https://github.com/kubevirt/kubevirtci/pull/1173.patch | git am

Wow man, that's really cool! Thanks! Here's the test PR: https://github.com/kubevirt/kubevirt/pull/11730. I'll also add it to the PR description

iholder101 commented 4 months ago

/test check-up-kind-1.27-vgpu

zhlhahaha commented 4 months ago

vgpu lane failure not related right ? (also uses same kind file)

@zhlhahaha FYI please about this PR, as it affects ARM lane Thanks

Hi, @oshoval thank for reminding! /lgtm

iholder101 commented 4 months ago

kind tests have passed here: https://github.com/kubevirt/kubevirt/pull/11730. Anything else missing? @dhiller @brianmcarey

kubevirt-bot commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brianmcarey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubevirt/kubevirtci/blob/main/OWNERS)~~ [brianmcarey] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
oshoval commented 4 months ago

it is already lgtmed please just make sure the vgpu flake happens even without this PR and then it is fine (i believe vgpu flake isn't related, but best double check if possible please)

brianmcarey commented 4 months ago

st make sure the vgpu flake happens even without this PR and then it is fine (i believe vgpu flake isn't related, but best double check if possible please)

Yeah the vgpu lane has issues here that are unrelated to this PR. Will try to take a look at it in the next few days.

kubevirt-bot commented 4 months ago

@iholder101: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
check-up-kind-1.27-vgpu 8b6b62342b42d2034427dcc4da3ec7b2e4717454 link false /test check-up-kind-1.27-vgpu
Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
kubevirt-commenter-bot commented 4 months ago

/retest-required This bot automatically retries required jobs that failed/flaked on approved PRs. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

kubevirt-commenter-bot commented 4 months ago

/retest-required This bot automatically retries required jobs that failed/flaked on approved PRs. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

oshoval commented 4 months ago

seems kubevirtci publish flakes lately https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/logs/publish-kubevirtci/1780863280890253312

this is why i think provision manager thinks vm based providers should run, because it compares to latest release (which is what it should indeed, but if there were releases the diff was smaller)