kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.51k stars 1.3k forks source link

uninitialized taint should always be dropped #8258

Open ykakarap opened 1 year ago

ykakarap commented 1 year ago

Detailed Description

https://github.com/kubernetes-sigs/cluster-api/pull/7993 introduced the node.cluster.x-k8s.io/uninitialized:NoSchedule taint. This taint is applied by default to the nodes when CAPBK is used as the bootstrap provider. CAPI drops this taint from the nodes after the nodes are initialized (labels are synced).

This issue is to audit and ensure that the node is dropped by CAPI when using any of the Machine/MachinePool solutions.

[A clear and concise description of what you want to happen.]

Anything else you would like to add:

More context on the taint: The taint was introduced to solve the delay problem when syncing label to nodes to avoid unnecessarily scheduling workloads on wrong nodes. Part of proposal: Label Sync Between Machines and underlying Kubernetes Nodes

/kind feature

sbueringer commented 1 year ago

@ykakarap Sorry I think my comment on the PR was a bit misleading. I think we're probably fine with the taint (as you already implemented that in #7993), although it doesn't hurt to double check.

I was referring to in-place propagation in general. I wonder if we want to have in-place propagation for MachinePool as well, especially once the MachinePool Machines are introduced, but maybe even for the objects "below" MachinePool we have today.

Or maybe the concept doesn't make sense for MachinePools not sure.

sbueringer commented 1 year ago

^^ @CecileRobertMichon

fabriziopandini commented 1 year ago

/triage accepted @sbueringer please check we should rename this issue to make it more clear what we want to achieve

fabriziopandini commented 1 year ago

/help

k8s-ci-robot commented 1 year ago

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/8258): >/help Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
sbueringer commented 1 year ago

I would like to get feedback to my comment from @ykakarap before retitling/rewriting the issue

ykakarap commented 1 year ago

I am +1 to exploring in-place propagation support in MachinePools and MachinePool Machines (when it is introduced).

I am not aware of the full details of how MachinePool Machines is going to work but I would like to capture that once MachinePool Machines are introduces we ensure that they too drop the uninitialized taint. As long as we capture that after rewriting I am +1 to renaming/rewriting.

CecileRobertMichon commented 1 year ago

@ykakarap PR for MachinePool Machines is ready for review: #7938 and proposal with more detail can be found here: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220209-machinepool-machines.md

sbueringer commented 1 year ago

I am +1 to exploring in-place propagation support in MachinePools and MachinePool Machines (when it is introduced).

I am not aware of the full details of how MachinePool Machines is going to work but I would like to capture that once MachinePool Machines are introduces we ensure that they too drop the uninitialized taint. As long as we capture that after rewriting I am +1 to renaming/rewriting.

Ah got your point that we should keep dropping the taint even with MachinePool Machines, I just assumed it's good enough that we already do it. But you have a good point there.

@ykakarap Then I think it's better to keep this issue restricted to the taint and create a separate issue to think about in-place propagation for MachinePools in general? (independent of the taint topic) I think for CAPI users it would be great if in-place propagation is done in a similar way in MachineDeployments and MachinePools (if possible). Or maybe that is already how MachinePools work today? (I just don't know)

ykakarap commented 1 year ago

@sbueringer A separate issue sounds good. I agree that it would be great if we can (if possible, depending on feasibility) provide consistent in-place propagation behavior for the user.

Jont828 commented 1 year ago

@ykakarap So MachinePool Machines are really just using the Machine resource with an empty bootstrap ref. As a result, we skip the bootstrapping stage since it's handled by the MachinePool. Do you think this issue would apply to MachinePool Machines or would it already be taken care of?

sbueringer commented 1 year ago

@Jont828 What do you mean with skip the bootstrapping stage?

When e.g. the AzureMachinePool creates new instances/servers in Azure doesn't it use the KubeadmConfig (and the bootstrap data we generate based on that) to bootstrap the new instances/servers?

(I'm specifically talking about how the bootstrapping happens in reality not what we mirro back in the Machine objects)

As long as it uses the KubeadmConfig and the kubeadm bootstrap provider (CABPKG) to bootstrap nodes the nodes should end up with the "node.cluster.x-k8s.io/uninitialized" taint. In https://github.com/kubernetes-sigs/cluster-api/pull/7993 Yuvaraj already implemented it that the CAPI core MachinePool controller is removing the taint.

But this is not a complete solution. For MachineDeployments the CAPI core Machine controller is syncing labels from Machines to nodes (and after that it drops the unitialized taint). So I think as soon as we have Machines for MachinePools the Machine controller will also start syncing labels to nodes (and afterwards drop the taint from the Node).

So if I get it correctly in the PR where we introduce MachinePool Machines we should get the label sync from Machines to Nodes for free (except if we disable it). But I think we should change the MachinePool controller to stop dropping the unitialized taint. Not sure what is happening to the annotations the MachinePool controller is adding as the Machine controller is adding them as well.

This might all be already implemented / resolved in the MachinePool Machines PR, but I'll take a closer look when I review the PR and bring it up there again.

fabriziopandini commented 5 months ago

/priority important-longterm