siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Worker node role can't be set #6750

Open vhurtevent opened 1 year ago

vhurtevent commented 1 year ago

Bug Report

When creating a cluster, I want the worker nodes to have an explicit role, as displayed by the kubectl describe node command.

I tried to set the worker role by setting node labels in the machine config spec:

machine:
  nodeLabels:
    node-role.kubernetes.io/worker: "true"

When querying NodeLabel with talosctl, the label exists:

But the labels aren't set on the nodes, and their role is still <none>.

Logs

In the logs, we can see this error:

[ 83.519643] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\tnodes \"dbaas1-worker-0\" is forbidden: is not allowed to modify labels: node-role.kubernetes.io/worker"}

It looks like a protected domain label, but how can we set the role through Talos node provisioning?

Environment

andrewrynhard commented 1 year ago

This label is not allowed to be set by the kubelet, and it would be equally unsafe for Talos to set it. Allowing this would let a worker node promote itself and potentially gain privileges it shouldn't have.

vhurtevent commented 1 year ago

Hello @andrewrynhard,

Thank you for your answer, I understand the security problem.

In my use case, I would like to distinguish worker nodes, which only execute workloads, from edge nodes, which I dedicate to running Ingress controllers and which are the only backend members of my L4 load balancers.

Do you suggest that I drop node-role.kubernetes.io/<any role> and instead use a fully custom domain label and value, which Talos could set through the machine.nodeLabels spec?

Thank you

smira commented 1 year ago

You can set this label outside of Talos, as the last provisioning step, or set the node label to something like "my.dev/role" and have something with appropriate permissions add a matching node-role label. By Kubernetes design, a worker node can't put a role label on itself, so something else running inside or outside the cluster has to do that.
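The "custom label plus a privileged helper" idea above can be sketched as follows. This is only an illustration under stated assumptions: `my.dev/role` is the example key from the comment, `missing_role_labels` and `kubectl_commands` are hypothetical helper names, and the node data is assumed to come from something like `kubectl get nodes -o json`, run with admin credentials rather than from the node itself. The node name `dbaas1-edge-0` is made up for the example.

```python
# Custom label that Talos can set via machine.nodeLabels (example key from the thread).
CUSTOM_ROLE_LABEL = "my.dev/role"
# Prefix of the role label that only a privileged client may add.
ROLE_PREFIX = "node-role.kubernetes.io/"


def missing_role_labels(nodes):
    """Return {node name: node-role label key} for nodes that still lack it.

    `nodes` maps a node name to its label dict.
    """
    todo = {}
    for name, labels in nodes.items():
        role = labels.get(CUSTOM_ROLE_LABEL)
        if not role:
            continue  # node carries no custom role label, nothing to mirror
        wanted = ROLE_PREFIX + role
        if wanted not in labels:
            todo[name] = wanted
    return todo


def kubectl_commands(todo):
    """Render the kubectl commands an admin (not the node) would run."""
    return [f"kubectl label node {name} {key}=" for name, key in sorted(todo.items())]


if __name__ == "__main__":
    nodes = {
        "dbaas1-worker-0": {"my.dev/role": "worker"},
        # Hypothetical node that already has its role label.
        "dbaas1-edge-0": {"my.dev/role": "edge", "node-role.kubernetes.io/edge": ""},
    }
    for cmd in kubectl_commands(missing_role_labels(nodes)):
        print(cmd)
```

The helper only computes what is missing. Applying the result is left to whatever holds the appropriate permissions, for example a final provisioning step or a controller running in the cluster.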

sergelogvinov commented 1 year ago

Can we add node label validation for this?

As far as I know, these labels cannot be set by the kubelet:

node-role.kubernetes.io
kubernetes.io/role

nogweii commented 4 months ago

Adding validation to catch this configuration error would be very much appreciated, as I didn't realize this.

Special handling would also be very nice, but I think it would have to live in talosctl, when it parses a machine's configuration, rather than in Talos itself.

sergelogvinov commented 4 months ago

Hi @nogweii, try using Talos CCM (edge version): https://github.com/siderolabs/talos-cloud-controller-manager/blob/main/docs/config.md

nogweii commented 4 months ago

Interesting! @sergelogvinov, not to go too off-topic, but does talos-ccm work in a bare-metal cluster running in a homelab? (I'm running a Talos cluster on a Turing Pi 2 with RK1 compute modules.)

sergelogvinov commented 4 months ago

Talos CCM runs inside the Talos cluster. It does not matter whether Talos is in a cloud or on bare metal.

mydoomfr commented 3 months ago

I'm unable to set any nodeLabels when bootstrapping worker nodes.

I'm using Talm to set up the worker node, but I don't think the issue is on Talm's side, because I can see the nodeLabels values in the machineConfiguration through the talosctl command.

Reproduce the issue

1. Reset the worker node, then apply the configuration

machine:
  nodeLabels:
    node.cloudprovider.kubernetes.io/platform: proxmox
    topology.kubernetes.io/region: Region-1
    topology.kubernetes.io/zone: pve03
    # truncated

talm apply -f nodes/worker-01.yaml -i

2. Wait for the worker node to join the cluster and describe the node labels

kubectl describe node worker-01
Name:               worker-01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker-01
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 20 Jun 2024 22:52:21 +0200
Taints:             <none>
Unschedulable:      false

3. Ensure nodeLabels is correctly set up in machineConfiguration

talosctl get mc --nodes 192.168.100.21 -e 192.168.100.21 --talosconfig=./talosconfig -oyaml | yq -r '.spec.machine.nodeLabels'
node.cloudprovider.kubernetes.io/platform: proxmox
topology.kubernetes.io/region: Region-1
topology.kubernetes.io/zone: pve03

Workaround: Set the labels via kubectl after the nodes join the cluster

kubectl label node worker-01 node.cloudprovider.kubernetes.io/platform=proxmox
kubectl label node worker-01 topology.kubernetes.io/region=Region-1
kubectl label node worker-01 topology.kubernetes.io/zone=pve03

I can open a new issue if needed.

smira commented 3 months ago

Please see the NodeRestriction documentation: this is enabled by default on the Kubernetes side, and there's nothing we can do on the Talos side to work around it.

If you use labels that are not restricted, the Kubernetes API server will allow them to be set. But in this case Talos Linux has the same level of access as the kubelet running on the node.

There might be a better way to do config validation or documentation, but there is no "fix" whatsoever, short of changing the admission controller rules.
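As a rough illustration of the config validation mentioned in this thread, the sketch below checks label keys against the NodeRestriction rules before they reach a node. The allow-list reflects one reading of the Kubernetes NodeRestriction documentation and may differ between Kubernetes versions, so treat it as an assumption. `is_restricted` is a hypothetical helper, not part of Talos or talosctl.

```python
# Keys a kubelet may set on its own Node object (assumed list, verify per version).
ALLOWED_KEYS = {
    "kubernetes.io/hostname",
    "kubernetes.io/arch",
    "kubernetes.io/os",
    "beta.kubernetes.io/arch",
    "beta.kubernetes.io/os",
    "beta.kubernetes.io/instance-type",
    "node.kubernetes.io/instance-type",
    "topology.kubernetes.io/region",
    "topology.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/region",
    "failure-domain.beta.kubernetes.io/zone",
}
# Label namespaces (and their subdomains) that are open to the kubelet.
ALLOWED_NAMESPACES = ("kubelet.kubernetes.io", "node.kubernetes.io")
# Domains reserved by Kubernetes; everything else is unrestricted.
RESERVED_DOMAINS = ("kubernetes.io", "k8s.io")


def _in_domain(namespace, domain):
    """True if namespace equals the domain or is a subdomain of it."""
    return namespace == domain or namespace.endswith("." + domain)


def is_restricted(key):
    """True if a node is presumably not allowed to set this label on itself."""
    if key in ALLOWED_KEYS:
        return False
    # The namespace is the part before the first "/", empty if there is none.
    namespace = key.split("/", 1)[0] if "/" in key else ""
    if not any(_in_domain(namespace, d) for d in RESERVED_DOMAINS):
        return False  # custom domains such as my.dev are not restricted
    return not any(_in_domain(namespace, ns) for ns in ALLOWED_NAMESPACES)


if __name__ == "__main__":
    # Label keys that appear in the machine configs quoted in this thread.
    for key in (
        "node-role.kubernetes.io/worker",
        "node.cloudprovider.kubernetes.io/platform",
        "topology.kubernetes.io/region",
        "topology.kubernetes.io/zone",
    ):
        print(key, "restricted" if is_restricted(key) else "ok")
```

Under these assumed rules, the check flags node-role.kubernetes.io/worker and node.cloudprovider.kubernetes.io/platform, and accepts the two topology.kubernetes.io keys as well as custom-domain keys such as my.dev/role.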