medik8s / self-node-remediation

Automatic repair for unhealthy Kubernetes nodes
https://www.medik8s.io/
Apache License 2.0

[TEST ONLY] log last Peer check time #222

Closed clobrano closed 3 months ago

clobrano commented 4 months ago

!!TEST ONLY!!

do not merge

openshift-ci[bot] commented 4 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/medik8s/self-node-remediation/blob/main/OWNERS)~~ [clobrano]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

clobrano commented 4 months ago

/test 4.15-openshift-e2e

clobrano commented 4 months ago

/test 4.15-openshift-e2e

clobrano commented 4 months ago

@clobrano: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.15-openshift-e2e a5ad462 link true /test 4.15-openshift-e2e

Full PR test history. Your PR dashboard.

I ran only the `Without API connectivity/Healthy node (no SNR)` context, selected with a Ginkgo focused Context.

DS pods do not seem to have checked their peers.

In this test version I added a log with the timestamp of the last performed peer check for each DS.

SNR-PR222 CI logs

The test itself passed; the reported failure seems to come from a cleanup step.

SNR-PR222 SNR DS logs

API server connectivity was cut from 17:45.16 to 17:50.38 (the interval between the two uptime measurements of the node). Even so, every DS pod logged the API check as succeeded during that same interval, and the last peer check timestamp was never updated.

clobrano commented 4 months ago

/test 4.14-openshift-e2e

clobrano commented 4 months ago

/tes 4.15-openshift-e2e

clobrano commented 4 months ago

/test 4.15-openshift-e2e

openshift-ci[bot] commented 4 months ago

@clobrano: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.14-openshift-e2e a5ad46260302716c3e37a5fb2b28a6570d832ef7 link true /test 4.14-openshift-e2e
ci/prow/4.15-openshift-e2e 2d823cbe279b6eb09b7082c15bd9e13f1bc6d84d link true /test 4.15-openshift-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

slintes commented 3 months ago

superseded by and partly included in #226

/close

openshift-ci[bot] commented 3 months ago

@slintes: Closed this PR.

In response to [this](https://github.com/medik8s/self-node-remediation/pull/222#issuecomment-2217638142):

>superseded by and partly included in #226
>
>/close