Closed smira closed 1 year ago
This looks like an old issue. Has this been implemented in "feat: support TalosControlPlane rolling upgrade"?
Given a cluster with 3 control plane nodes, when I update the TalosControlPlane to reference a new infrastructureTemplate (for example, one with a new image template name), a new control plane node is created as expected, but the node does not join the existing cluster.
The result is 4 running control plane nodes. The rolling update does not seem to work.
This is my configuration using Talos and CAPV (VMware) as infrastructure: cluster yaml. To render the standard.yaml into cluster.yaml, run the commands:
. ./standard.env
envsubst < ../standard.yaml >cluster.yaml
Yes, this was implemented a long time ago. We have a test for the rollout of new CP nodes.
The rollout might stop if the control plane is not healthy; that is the expected behavior.
If the new node doesn't join, that should be investigated first. The control plane resource status shows detailed information about failed checks, while the Talos logs might show why the node doesn't join.
One issue that still exists is that a rollout is not triggered if you edit the talosconfig (.spec.controlPlaneConfig) in your TalosControlPlane resource.
It is only triggered if you edit the infrastructureRef (.spec.infrastructureTemplate) or the Kubernetes version (.spec.version).
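For illustration, a minimal sketch of a TalosControlPlane showing which fields are described above as triggering a rollout. The resource names, the Kubernetes version, and the apiVersion values are assumptions chosen for the example and are not taken from the reporter's cluster:

```yaml
# Sketch only: names, versions, and apiVersions below are illustrative assumptions.
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: example-cp                      # hypothetical name
spec:
  replicas: 3
  version: v1.27.3                      # editing .spec.version triggers a rollout
  infrastructureTemplate:               # editing this reference triggers a rollout
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    name: example-cp-template-v2        # hypothetical; e.g. a template with a new image
  controlPlaneConfig:                   # per the comment above, edits here do NOT trigger a rollout
    controlplane:
      generateType: controlplane
```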
Hm... That deserves a separate issue 😉
The issue was caused by the VMware cloud provider (CPI). It requires the taint node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule to be set on new nodes. If it is not set, the providerID on the node is not set, and therefore the nodeRef in the Machine is not set either.
When I manually apply the taint to the new nodes then the rolling of control planes works. However, I do not see any configuration options to add custom taints to new nodes in Talos (at least through the Cluster API).
kubelet supports taints via its config: https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
and Talos provides a way to add extra config for the kubelet.
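As a rough sketch of that suggestion, the taint could be registered through the kubelet's registerWithTaints setting, passed via the Talos machine config under machine.kubelet.extraConfig. This fragment is an untested assumption about how the pieces fit together for this cluster, not a confirmed fix:

```yaml
# Sketch only: Talos machine config fragment, assuming extraConfig is merged
# into the KubeletConfiguration as described in the Talos documentation.
machine:
  kubelet:
    extraConfig:
      registerWithTaints:
        - key: node.cloudprovider.kubernetes.io/uninitialized
          value: "true"
          effect: NoSchedule
```

With Cluster API, a fragment like this would typically be supplied as a config patch on the control plane config rather than edited on the node directly.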
I'm going to close this issue, as it is unrelated and actually fixed in CACPPT, so let's move this to the new issue/Slack/discussion if it needs further investigation, thank you.
Control Plane provider should support rolling out a new set of control plane machines on spec changes (similar to the MachineDeployment controller and the kubeadm control plane provider):
- Version (Kubernetes version)
- InfrastructureTemplate
E.g. see this code from kubeadm provider:
https://github.com/kubernetes-sigs/cluster-api/blob/cefc044b286676bf5f04a9b3e9009eb93c2a5329/controlplane/kubeadm/internal/controllers/upgrade.go#L33-L34