medik8s / node-maintenance-operator

Kubernetes Operator to manage node maintenance through NodeMaintenance custom resources
https://www.medik8s.io/maintenance-node/
Apache License 2.0
27 stars 13 forks source link

Tolerate control-plane node taint #52

Closed malt3 closed 2 years ago

malt3 commented 2 years ago

Follow KEP 2067: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint

Closes #49

openshift-ci[bot] commented 2 years ago

Hi @malt3. Thanks for your PR.

I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
razo7 commented 2 years ago

/ok-to-test

mshitrit commented 2 years ago

Since we've decoupled from OCP versions there is a chance that our next release will still be at OCP 4.11, which if IIUC is based on k8s which still contains the old master annotation. We might want to either support both annotations or hold off with this change for a bit.

malt3 commented 2 years ago

We might want to either support both annotations or hold off with this change for a bit.

Do you know any way to configure nodeAffinity to accept either node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane? I looked around but everything I can come up with would require both labels to be present.

I can remove the nodeAffinity parts of this PR for now. Then the operator can support anything up to and including kubernetes 1.24. EDIT: the node-role.kubernetes.io/master label is already dropped in 1.24 so this will not really help.

mshitrit commented 2 years ago

Do you know any way to configure nodeAffinity to accept either node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane?

No, I had a similar problem in another operator. Might not be the best solution but what I've done is applicably detect the correct annotation on startup and use it :man_shrugging:

malt3 commented 2 years ago

Is there a specific requirement preventing the NMO from just running on any node? From my limited understanding, the nodeAffinity may not be needed.

razo7 commented 2 years ago

In addition to @mshitrit comment, it would be better to add the new taint and label rather than rename/switch to the name for better compatibility between versions and upgrade versions.

razo7 commented 2 years ago

Do you know any way to configure nodeAffinity to accept either node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane? I looked around but everything I can come up with would require both labels to be present.

According to node-affinity in Kuberenets - If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified nodeSelectorTerms can be satisfied.. Therefore, it is possible.

How about as follows? @malt3

    spec:
      affinity:
        nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: node-role.kubernetes.io/master
                  operator: Exists
              - matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
malt3 commented 2 years ago

How about as follows? @malt3

That is what I was looking for. Thanks! Will update this PR.

razo7 commented 2 years ago

/retest

razo7 commented 2 years ago

/retest

slintes commented 2 years ago

e2e test is currently broken, I'm trying to reach out to folks who can fix this in the CI cluster

razo7 commented 2 years ago

/retest

slintes commented 2 years ago

the actual test succeeeded, just something in the cleanup failed

Ran 16 of 17 Specs in 118.417 seconds
SUCCESS! -- 16 Passed | 0 Failed | 0 Pending | 1 Skipped
PASS

/override ci/prow/openshift-e2e

thanks a lot for this important update @malt3!

/lgtm /approve

openshift-ci[bot] commented 2 years ago

@slintes: Overrode contexts on behalf of slintes: ci/prow/openshift-e2e

In response to [this](https://github.com/medik8s/node-maintenance-operator/pull/52#issuecomment-1240323898): >the actual test succeeeded, just something in the cleanup failed > >``` >Ran 16 of 17 Specs in 118.417 seconds >SUCCESS! -- 16 Passed | 0 Failed | 0 Pending | 1 Skipped >PASS >``` > >/override ci/prow/openshift-e2e > >thanks a lot for this important update @malt3! > >/lgtm >/approve Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci[bot] commented 2 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: malt3, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/medik8s/node-maintenance-operator/blob/main/OWNERS)~~ [slintes] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
razo7 commented 2 years ago

/cherrypick release-0.13

razo7 commented 2 years ago

/cherry-pick release-0.13

openshift-cherrypick-robot commented 2 years ago

@razo7: new pull request created: #55

In response to [this](https://github.com/medik8s/node-maintenance-operator/pull/52#issuecomment-1240579957): >/cherry-pick release-0.13 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
slintes commented 2 years ago

now I'm curious... just for test:

/cherrypick release-0.12

edit: interesting, doesn't work anymore without a dash 🤷🏼‍♂️