medik8s / self-node-remediation

Automatic repair for unhealthy Kubernetes nodes
https://www.medik8s.io/
Apache License 2.0

Configurable minimum worker nodecount #238

Open novasbc opened 3 weeks ago

novasbc commented 3 weeks ago

Why we need this PR

The existing code requires at least one other peer worker node before remediation can occur. This prevents SNR from remediating in a configuration with 3 control plane nodes and 1 worker node, a scenario that we support for bare-minimum deployments.
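For illustration only, the gating idea can be sketched as a threshold check on the number of peer worker nodes. This is a minimal sketch assuming a hypothetical `MinPeerWorkers` setting and a hypothetical `canRemediate` helper; the actual field names and check introduced in #238 (and SNR's real peer logic) may differ.

```go
package main

import "fmt"

// remediationConfig is a hypothetical stand-in for the operator configuration.
// MinPeerWorkers is an illustrative name, not necessarily the field used in the PR.
type remediationConfig struct {
	MinPeerWorkers int
}

// canRemediate reports whether enough peer worker nodes exist for remediation
// to proceed. The pre-PR behavior corresponds to MinPeerWorkers == 1
// (at least one other worker must exist).
func canRemediate(cfg remediationConfig, peerWorkers int) bool {
	return peerWorkers >= cfg.MinPeerWorkers
}

func main() {
	// 3 control plane nodes + 1 worker: the single worker has no peer workers.
	peerWorkers := 0

	legacy := remediationConfig{MinPeerWorkers: 1}  // hard-coded minimum
	minimal := remediationConfig{MinPeerWorkers: 0} // configured for minimal clusters

	fmt.Println(canRemediate(legacy, peerWorkers))  // false
	fmt.Println(canRemediate(minimal, peerWorkers)) // true
}
```

Making the threshold configurable keeps the existing default for larger clusters while letting minimal deployments opt in to remediation with zero peer workers.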

Changes made

Which issue(s) this PR fixes

Fixes #213

Test plan

novasbc commented 3 weeks ago

/test 4.15-openshift-e2e

openshift-ci[bot] commented 3 weeks ago

Hi @novasbc. Thanks for your PR.

I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

openshift-ci[bot] commented 3 weeks ago

@novasbc: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to [this](https://github.com/medik8s/self-node-remediation/pull/238#issuecomment-2389243299):

> /test 4.15-openshift-e2e

slintes commented 3 weeks ago

Hi, do you mind extending the description please? What's the issue, how do you fix it, how do you test the changes... Also, please check the failed test. Thanks

novasbc commented 1 week ago

> Hi, do you mind extending the description please? What's the issue, how do you fix it, how do you test the changes... Also, please check the failed test. Thanks

@slintes I updated the description and included the issue number as well.

I also fixed the build, which was failing on `make verify-bundle` because the bundle hadn't been updated.

slintes commented 1 week ago

Thanks!

/test 4.16-openshift-e2e

novasbc commented 1 week ago

Fixed an issue that caused `make test` to fail because the rebooter was nil.

novasbc commented 1 week ago

/test 4.15-openshift-e2e

novasbc commented 1 week ago

/test 4.16-openshift-e2e

novasbc commented 1 week ago

/test 4.15-openshift-e2e

novasbc commented 1 week ago

/test 4.13-openshift-e2e

novasbc commented 6 days ago

/test 4.13-openshift-e2e

novasbc commented 6 days ago

@razo7 @mshitrit

I looked into the e2e failures reported over the past few days and found that they were caused by temporary environmental issues; when I re-ran the tests, they passed more reliably. We can't run the tests in an OpenShift environment, so we weren't seeing the same failures locally.

Anyhow, I believe this is ready for review.

Thanks!

openshift-ci[bot] commented 5 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: novasbc

Once this PR has been reviewed and has the lgtm label, please ask for approval from clobrano. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/medik8s/self-node-remediation/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.