kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Add condition that indicates when a node has been drained #3365

Closed ncdc closed 3 years ago

ncdc commented 4 years ago

User Story

As a user/operator I would like to know when the Node for a Machine has finished draining, so I have more insight into the Machine's lifecycle.

Detailed Description

#3132 proposes the addition of pre-drain and pre-terminate Machine lifecycle hooks. If you're writing a hook controller, and you want to run your pre-terminate hook after the node has been drained, there is no easy way to know that the draining has completed. You could query the node yourself, but it would be easier if the Machine had a condition such as NodeDrained that the controller could evaluate.

Anything else you would like to add:

Nope 😄

/kind feature

vincepri commented 4 years ago

/milestone v0.3.x

vincepri commented 4 years ago

/help
/priority important-longterm

k8s-ci-robot commented 4 years ago

@vincepri: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/3365):

> /help
> /priority important-longterm

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

michaelgugino commented 4 years ago

My thinking on this was to have this be a part of the kubectl library: add some annotation to the node after draining is complete. The annotation should be opt-out (so always applied), and NLM can remove the annotation whenever the node transitions to uncordoned. I haven't bothered to write this up as a KEP yet, but this seems like another place where we can standardize with the rest of the community.

detiber commented 4 years ago

@michaelgugino I believe the thought behind this issue was to have a Condition on the Machine that indicates that the node drain has been completed, which could then be used by external lifecycle hooks to trigger action after the node is drained, but before the infrastructure is deleted.

That said, if there is a common annotation that will exist on the Node, then the condition could be populated by the presence/absence of the annotation on the related Node.

michaelgugino commented 4 years ago

@detiber yeah, I was just thinking about it in the context of what the machine-controller is doing. The problem with drain is that it's somewhat transient: an admin could uncordon the node and cause funky things to happen. Not saying that's a blocker here, just something to think about (this kinda begets another question: will we attempt to drain again after a successful drain?)

I guess we can just say we guarantee that we'll drain it completely at least once and that's probably fine.

Another concern is, if we want to add power management to the machines (I think we probably should), we need to be careful about what 'drain' is taking place when we indicate drain has succeeded.

One positive to the machine keeping state is that we're signaling we've run our drain, and not someone else's interpretation of drain. The ExcludeNodeDraining annotation kinda muddies things up. Are we instantly declaring the node to be drained?

Consider:

- I use ExcludeNodeDraining to prevent the machine-controller from draining.
- I have another controller that will drain the node after the machine is marked deleted. This controller should indicate (on the node? the machine?) that it has finished its custom drain process.
- Another controller with a pre-terminate hook wants to begin some action after drain is complete, and not before.

I think in the above case, indicating on the machine is easiest. If the ExcludeNodeDraining annotation is in use, the machine-controller should not set the status/indication of draining complete, but some other actor may set it if they choose. This is material only if another actor is using the pre-terminate hook, as otherwise the machine-controller will proceed to immediately terminate the instance.

vincepri commented 4 years ago

/milestone v0.4.0

fabriziopandini commented 3 years ago

@vincepri @ncdc Should https://github.com/kubernetes-sigs/cluster-api/blob/7523f71dfd58a8c4e358b057da611546dc72fdec/api/v1alpha3/condition_consts.go#L84-L101 be enough to close this issue?

ncdc commented 3 years ago

I think so. If we need to make further adjustments re Michael's comments, we can do a new issue for that.

fabriziopandini commented 3 years ago

/close

as per comment above (those conditions were implemented by https://github.com/kubernetes-sigs/cluster-api/pull/3527)

k8s-ci-robot commented 3 years ago

@fabriziopandini: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/3365#issuecomment-702045718):

> /close
> as per comment above

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.