medik8s / node-maintenance-operator

Kubernetes Operator to manage node maintenance through NodeMaintenance custom resources
https://www.medik8s.io/maintenance-node/
Apache License 2.0

Fix Maintenance Creation Check for Control Plane Nodes #110

Status: Closed. razo7 closed this pull request 6 months ago.

razo7 commented 7 months ago

Fix the etcd quorum check so that it looks not only at DisruptionsAllowed but also at the control plane node's etcd guard pod. If no disruptions are allowed and the NodeMaintenance (NM) CR targets a node that is not already disrupted, we must reject the CR creation, as it would violate etcd quorum. Otherwise, when the node's guard pod has failed (its Ready status is False) or the node has no guard pod, we allow the CR creation, as it won't further violate etcd quorum.

Furthermore, this etcd quorum check only applies to OCP/OKD, since those platforms provide an etcd quorum PodDisruptionBudget (PDB). We therefore skip this validation on other platforms.

Originally, the PR intended to block CR creation for any node, including worker nodes, which we currently support. Instead, it was decided to allow CR creation for any node, as long as doing so for a control plane node won't violate etcd quorum.

ECOPROJECT-1811

openshift-ci[bot] commented 7 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] commented 7 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/medik8s/node-maintenance-operator/blob/main/OWNERS)~~ [razo7]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

razo7 commented 7 months ago

/test 4.14-openshift-e2e
/test 4.15-openshift-e2e

razo7 commented 7 months ago

/test 4.14-openshift-e2e
/test 4.15-openshift-e2e

razo7 commented 6 months ago

/test 4.14-openshift-e2e
/test 4.15-openshift-e2e

clobrano commented 6 months ago

/lgtm
Giving others a chance to review as well; feel free to unhold.
/hold

razo7 commented 6 months ago

/retest

razo7 commented 6 months ago

/test 4.14-openshift-e2e
/test 4.15-openshift-e2e

razo7 commented 6 months ago

/retest

razo7 commented 6 months ago

/retest

razo7 commented 6 months ago

/test 4.14-openshift-e2e

razo7 commented 6 months ago

Moving from blocking CR creation on unhealthy nodes to a more thorough check of unhealthy nodes and control plane (CP) guard pods before CR creation, to avoid any etcd quorum violation: https://github.com/medik8s/common/pull/17
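A stdlib-only sketch of the guard pod lookup this check relies on. It assumes, hypothetically, that guard pods live in the `openshift-etcd` namespace with names prefixed `etcd-guard-`, and it models pods with a trimmed-down `pod` struct instead of the real `k8s.io/api` types; `findGuardPod` and `guardPodHealthy` are illustrative names, not functions from medik8s/common.

```go
package main

import (
	"fmt"
	"strings"
)

// pod is a hypothetical, minimal stand-in for a Kubernetes Pod.
type pod struct {
	Namespace string
	Name      string
	NodeName  string // spec.nodeName
	Ready     bool   // status of the PodReady condition
}

// Assumed guard pod location and naming (hypothetical for this sketch).
const (
	guardNamespace = "openshift-etcd"
	guardPrefix    = "etcd-guard-"
)

// findGuardPod returns the guard pod scheduled on nodeName, if there is one.
func findGuardPod(pods []pod, nodeName string) (pod, bool) {
	for _, p := range pods {
		if p.Namespace == guardNamespace && strings.HasPrefix(p.Name, guardPrefix) && p.NodeName == nodeName {
			return p, true
		}
	}
	return pod{}, false
}

// guardPodHealthy reports whether the node has a Ready guard pod. false means
// either no guard pod (e.g. a worker node) or a failed one.
func guardPodHealthy(pods []pod, nodeName string) bool {
	g, found := findGuardPod(pods, nodeName)
	return found && g.Ready
}

func main() {
	pods := []pod{
		{Namespace: "openshift-etcd", Name: "etcd-guard-master-0", NodeName: "master-0", Ready: true},
		{Namespace: "openshift-etcd", Name: "etcd-guard-master-1", NodeName: "master-1", Ready: false},
	}
	fmt.Println(guardPodHealthy(pods, "master-0")) // true
	fmt.Println(guardPodHealthy(pods, "master-1")) // false
	fmt.Println(guardPodHealthy(pods, "worker-0")) // false
}
```

Treating a missing guard pod the same as a non-Ready one means worker nodes, which have no guard pod, never fail this part of the check.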

razo7 commented 6 months ago

/retest

razo7 commented 6 months ago

/retest

razo7 commented 6 months ago

/test 4.13-openshift-e2e

slintes commented 6 months ago

/lgtm

razo7 commented 6 months ago

/unhold

razo7 commented 6 months ago

/retest