siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Kubelet is not running #8706

Open Minivolk02 opened 2 months ago

Minivolk02 commented 2 months ago

Bug Report

    10.224.239.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
    user: warning: [2024-05-04T14:41:38.604157697Z]: [talos] adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 3, "offset": "-1.718479ms", "freq_offset": 1807500, "freq_offset_ppm": 27}
    user: warning: [2024-05-04T14:41:50.832675697Z]: [talos] task startAllServices (1/1): service "etcd" to be "up", service "kubelet" to be "up"
    user: warning: [2024-05-04T14:42:03.727447697Z]: [talos] kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.224.224.140:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.224.224.140:6443: connect: connection refused"}
    user: warning: [2024-05-04T14:42:05.833027697Z]: [talos] task startAllServices (1/1): service "etcd" to be "up", service "kubelet" to be "up"
    user: warning: [2024-05-04T14:42:20.144088697Z]: [talos] servicekubelet: Starting service
    user: warning: [2024-05-04T14:42:20.144687697Z]: [talos] servicekubelet: Waiting for service "cri" to be "up", time sync, network
    user: warning: [2024-05-04T14:42:20.145561697Z]: [talos] servicekubelet: Running pre state
    user: warning: [2024-05-04T14:42:20.146133697Z]: [talos] servicekubelet: Failed to run pre stage: resource KubeletSpecs.kubernetes.talos.dev(k8s/kubelet@undefined) doesn't exist
    user: warning: [2024-05-04T14:42:20.833002697Z]: [talos] task startAllServices (1/1): service "etcd" to be "up", service "kubelet" to be "up"

Description

Etcd was running, but it stopped after I added a new node.

Logs

Environment

rothgar commented 2 months ago

Can you let us know how you got to this point? Are you running Talos on a machine, in a VM, or in Docker? Which talosctl commands did you run?

kjaleshire commented 6 days ago

I ran into this recently. The issue was a (malformed?) machine patch like so:

    - |-
      machine:
        kubelet:
          extraArgs:
            rotate-server-certificates: "true"
          extraConfig:
            maxPods: 150
          nodeIP:
            validSubnets:
                - 10.44.1.8/28

Removing it and restarting kubelet fixed the issue.

Interestingly, I was able to apply this patch after the kubelet service had started successfully.
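
One thing that can be checked about the patch above is the subnet itself: 10.44.1.8/28 has host bits set, so it is not the canonical form of the network it falls in. Nothing in this thread confirms that this is what Talos objected to, and how Talos treats such a value is not something established here. The sketch below uses only Python's standard `ipaddress` module to show which network the value belongs to; the helper name `describe_subnet` is made up for illustration.

```python
import ipaddress


def describe_subnet(cidr: str) -> dict:
    """Summarize a CIDR string without rejecting non-canonical input."""
    # ip_interface() accepts an address with host bits set, unlike
    # ip_network() in its default strict mode, which raises ValueError.
    iface = ipaddress.ip_interface(cidr)
    network = iface.network
    return {
        "given": cidr,
        "network": str(network),  # canonical network the value falls in
        "has_host_bits": iface.ip != network.network_address,
        "first": str(network[0]),  # first address in the range
        "last": str(network[-1]),  # last address in the range
    }


info = describe_subnet("10.44.1.8/28")
print(info)
```

If the node's address is expected to match `validSubnets`, it would need to fall between the `first` and `last` values reported here, assuming the subnet is interpreted as the enclosing network.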