medik8s / node-healthcheck-operator

K8s Node Health Check Operator
Apache License 2.0
91 stars 18 forks source link

Fix regressions related to "multiple same kind remediation support" #338

Closed slintes closed 2 months ago

slintes commented 2 months ago

Why we need this PR

In https://github.com/medik8s/node-healthcheck-operator/pull/301 we added support for multiple escalating remediations of the same remediation kind.

While investigating a report about false events about skipping control plane remediation I noticed that we have several issues related to that new feature, because we just checked the CR name when comparing with node names, instead of the new node name annotation.

Changes made

The first commit updates the unit tests in order reveal 3 of the (at least) 4 issues:

The next 2 commits fix the issues in the code.

Which issue(s) this PR fixes

The false event was mentioned in: ECOPROJECT-2057

Test plan

As mentioned already, first commit updates tests to reveal the issues

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/medik8s/node-healthcheck-operator/blob/main/OWNERS)~~ [slintes] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
mshitrit commented 2 months ago

/lgtm

slintes commented 2 months ago

known console issue on 4.17 /override ci/prow/4.17-openshift-e2e

4.16 looks like a race between cleanup of NHC test, and running MHC test... 2024-08-06T20:32:23.051827263Z INFO controllers.MachineHealthCheck.resource manager external remediation CR already exists, but it's not owned by us /test 4.16-openshift-e2e

slintes commented 2 months ago

/override ci/prow/4.17-openshift-e2e

openshift-ci[bot] commented 2 months ago

@slintes: Overrode contexts on behalf of slintes: ci/prow/4.17-openshift-e2e

In response to [this](https://github.com/medik8s/node-healthcheck-operator/pull/338#issuecomment-2272807261): >/override ci/prow/4.17-openshift-e2e Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
slintes commented 2 months ago

don't cherry pick before https://github.com/medik8s/node-healthcheck-operator/pull/343 is merged, in order to prevent merge conflicts

slintes commented 2 months ago

/cherry-pick release-0.8

openshift-cherrypick-robot commented 2 months ago

@slintes: new pull request created: #344

In response to [this](https://github.com/medik8s/node-healthcheck-operator/pull/338#issuecomment-2273161864): >/cherry-pick release-0.8 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.