siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Changing kubelet address has no effect on which ip is being listened on for kubelet #9710

Open aretecarpe opened 1 day ago

aretecarpe commented 1 day ago

Bug Report

Changing the kubelet address from the default 0.0.0.0 to anything else causes the kubelet to still be checked/served on 127.0.0.1. This could be confusion on my end, but I'm unsure why it's still being checked on that IP address even after the change. I've also checked the CSR, and it is approved and signed.

Description

I would expect that adding address: 192.168.0.2 under extraConfig in the kubelet section of the machine config changes the IP the kubelet is reached on.

The support logs (kubeletspecs.kubernetes.talos.dev.yaml) show that the address has been changed to 192.168.0.2. It just doesn't seem to be reflected in the system: the READY checks for APISERVER, CONTROLLER-MANAGER, and SCHEDULER show N/A in the dashboard, and the command talosctl get staticpods shows the same. Yet the pods show as ready and are working in the cluster according to the command: kubectl get pods -A

Logs

"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": dial tcp 127.0.0.1:10250: connect: connection refused"

Below is the output of: talosctl containers --kubernetes

192.168.0.2   k8s.io      kube-system/cilium-c8dcr                                                                              registry.k8s.io/pause:3.10                                                5054   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/cilium-c8dcr:apply-sysctl-overwrites:a9f99dca34bc                                      sha256:d781bfd0e519b886e895b2253f23aaa958fd0fddb2e6013cabbec79ee3cf775d   0      CONTAINER_EXITED
192.168.0.2   k8s.io      └─ kube-system/cilium-c8dcr:cilium-agent:cd16c6038860                                                 sha256:d781bfd0e519b886e895b2253f23aaa958fd0fddb2e6013cabbec79ee3cf775d   5808   CONTAINER_RUNNING
192.168.0.2   k8s.io      └─ kube-system/cilium-c8dcr:clean-cilium-state:7766b67c56e9                                           sha256:d781bfd0e519b886e895b2253f23aaa958fd0fddb2e6013cabbec79ee3cf775d   0      CONTAINER_EXITED
192.168.0.2   k8s.io      └─ kube-system/cilium-c8dcr:config:8a8ca8f2eb64                                                       sha256:d781bfd0e519b886e895b2253f23aaa958fd0fddb2e6013cabbec79ee3cf775d   0      CONTAINER_EXITED
192.168.0.2   k8s.io      └─ kube-system/cilium-c8dcr:install-cni-binaries:5ab28d7f3d98                                         sha256:d781bfd0e519b886e895b2253f23aaa958fd0fddb2e6013cabbec79ee3cf775d   0      CONTAINER_EXITED
192.168.0.2   k8s.io      └─ kube-system/cilium-c8dcr:mount-bpf-fs:f1c91418e957                                                 sha256:d781bfd0e519b886e895b2253f23aaa958fd0fddb2e6013cabbec79ee3cf775d   0      CONTAINER_EXITED
192.168.0.2   k8s.io      kube-system/cilium-envoy-kjqqg                                                                        registry.k8s.io/pause:3.10                                                5059   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/cilium-envoy-kjqqg:cilium-envoy:fd45eef82fe8                                           sha256:b38a7071cbb74b7dac0cc0d2538c3e57493271b35cd77ff3cc80e301a34ce51a   5917   CONTAINER_RUNNING
192.168.0.2   k8s.io      kube-system/cilium-operator-54ff457fd7-pm8fs                                                          registry.k8s.io/pause:3.10                                                5055   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/cilium-operator-54ff457fd7-pm8fs:cilium-operator:95ee08abec45                          sha256:4fc44954047e80fa5c20901cdce30e504d088a00ec6663af5911d1974692ca18   5915   CONTAINER_RUNNING
192.168.0.2   k8s.io      kube-system/coredns-68d75fd545-lq64r                                                                  registry.k8s.io/pause:3.10                                                7004   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/coredns-68d75fd545-lq64r:coredns:554305dbfb39                                          registry.k8s.io/coredns/coredns:v1.11.3                                   7053   CONTAINER_RUNNING
192.168.0.2   k8s.io      kube-system/coredns-68d75fd545-wtpxq                                                                  registry.k8s.io/pause:3.10                                                7003   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/coredns-68d75fd545-wtpxq:coredns:3870e44bbab0                                          registry.k8s.io/coredns/coredns:v1.11.3                                   7056   CONTAINER_RUNNING
192.168.0.2   k8s.io      kube-system/kube-apiserver-k8s-controlplane-01                                                    registry.k8s.io/pause:3.10                                                4439   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/kube-apiserver-k8s-controlplane-01:kube-apiserver:1e874dd597d4                     registry.k8s.io/kube-apiserver:v1.31.1                                    4545   CONTAINER_RUNNING
192.168.0.2   k8s.io      kube-system/kube-controller-manager-k8s-controlplane-01                                           registry.k8s.io/pause:3.10                                                4447   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/kube-controller-manager-k8s-controlplane-01:kube-controller-manager:6183252946c4   registry.k8s.io/kube-controller-manager:v1.31.1                           0      CONTAINER_EXITED
192.168.0.2   k8s.io      └─ kube-system/kube-controller-manager-k8s-controlplane-01:kube-controller-manager:6d31a816e15a   registry.k8s.io/kube-controller-manager:v1.31.1                           4874   CONTAINER_RUNNING
192.168.0.2   k8s.io      kube-system/kube-scheduler-k8s-controlplane-01                                                    registry.k8s.io/pause:3.10                                                4444   SANDBOX_READY
192.168.0.2   k8s.io      └─ kube-system/kube-scheduler-k8s-controlplane-01:kube-scheduler:8835f4416636                     registry.k8s.io/kube-scheduler:v1.31.1                                    4875   CONTAINER_RUNNING
192.168.0.2   k8s.io      └─ kube-system/kube-scheduler-k8s-controlplane-01:kube-scheduler:cd1831b59581                     registry.k8s.io/kube-scheduler:v1.31.1                                    0      CONTAINER_EXITED

Below is kubeletspecs.kubernetes.talos.dev.yaml from the support logs:

metadata:
    namespace: k8s
    type: KubeletSpecs.kubernetes.talos.dev
    id: kubelet
    version: 1
    owner: k8s.KubeletSpecController
    phase: running
    created: 2024-11-13T04:05:35Z
    updated: 2024-11-13T04:05:35Z
spec:
    image: ghcr.io/siderolabs/kubelet:v1.31.1
    args:
        - --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubeconfig
        - --cert-dir=/var/lib/kubelet/pki
        - --config=/etc/kubernetes/kubelet.yaml
        - --hostname-override=k8s-controlplane-01
        - --kubeconfig=/etc/kubernetes/kubeconfig-kubelet
        - --node-ip=192.168.0.2
    extraMounts:
        - destination: /var/mnt/storage
          type: bind
          source: /var/mnt/storage
          options:
            - bind
            - rshared
            - rw
          uidmappings: []
          gidmappings: []
    expectedNodename: k8s-controlplane-01
    config:
        address: 192.168.0.2
        apiVersion: kubelet.config.k8s.io/v1beta1
        authentication:
            anonymous:
                enabled: false
            webhook:
                cacheTTL: 0s
                enabled: true
            x509:
                clientCAFile: /etc/kubernetes/pki/ca.crt
        authorization:
            mode: Webhook
            webhook:
                cacheAuthorizedTTL: 0s
                cacheUnauthorizedTTL: 0s
        cgroupRoot: /
        clusterDNS:
            - 10.96.0.10
        clusterDomain: cluster.local
        containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
        cpuManagerReconcilePeriod: 0s
        evictionPressureTransitionPeriod: 0s
        failSwapOn: false
        featureGates:
            LocalStorageCapacityIsolationFSQuotaMonitoring: true
        fileCheckFrequency: 0s
        healthzBindAddress: 192.168.0.2
        httpCheckFrequency: 0s
        imageMaximumGCAge: 0s
        imageMinimumGCAge: 0s
        kind: KubeletConfiguration
        kubeletCgroups: /podruntime/kubelet
        logging:
            flushFrequency: 0
            format: json
            options:
                json:
                    infoBufferSize: "0"
                text:
                    infoBufferSize: "0"
            verbosity: 0
        memorySwap: {}
        nodeStatusReportFrequency: 0s
        nodeStatusUpdateFrequency: 0s
        oomScoreAdj: -450
        port: 10250
        protectKernelDefaults: true
        resolvConf: /system/resolved/resolv.conf
        rotateCertificates: true
        runtimeRequestTimeout: 0s
        seccompDefault: true
        serializeImagePulls: false
        shutdownGracePeriod: 30s
        shutdownGracePeriodCriticalPods: 10s
        staticPodURL: http://127.0.0.1:42563
        streamingConnectionIdleTimeout: 5m0s
        syncFrequency: 0s
        systemCgroups: /system
        systemReserved:
            cpu: 50m
            ephemeral-storage: 256Mi
            memory: 512Mi
            pid: "100"
        tlsMinVersion: VersionTLS13
        volumeStatsAggPeriod: 0s

Below is the output of: kubectl get pods -A

NAMESPACE     NAME                                              READY   STATUS    RESTARTS      AGE
kube-system   cilium-c8dcr                                      1/1     Running   0             10m
kube-system   cilium-envoy-kjqqg                                1/1     Running   0             10m
kube-system   cilium-operator-54ff457fd7-mdpml                  0/1     Pending   0             10m
kube-system   cilium-operator-54ff457fd7-pm8fs                  1/1     Running   0             10m
kube-system   coredns-68d75fd545-lq64r                          1/1     Running   0             10m
kube-system   coredns-68d75fd545-wtpxq                          1/1     Running   0             10m
kube-system   kube-apiserver-k8s-controlplane-01            1/1     Running   0             9m22s
kube-system   kube-controller-manager-k8s-controlplane-01   1/1     Running   2 (11m ago)   9m28s
kube-system   kube-scheduler-k8s-controlplane-01            1/1     Running   2 (11m ago)   9m34s

Below is the output (empty) of: talosctl get staticpodstatus

Environment

smira commented 1 day ago

I don't quite understand the issue here. Please describe in more details, so that we can understand.

If you're trying to change the kubelet serving address, Talos still expects to be able to talk to the kubelet over localhost.
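
For anyone reproducing this, a minimal Python sketch of a TCP probe can show the difference between the two addresses. It is only an illustration, not how Talos performs its check: the helper name `probe` and the 2-second timeout are assumptions, while port 10250 and the two addresses come from the report above.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on this address/port pair.
        return "refused"
    except OSError:
        # Timeouts, missing routes, and similar failures end up here.
        return "unreachable"


if __name__ == "__main__":
    # Addresses and port taken from the issue; adjust for your own node.
    for host in ("127.0.0.1", "192.168.0.2"):
        print(f"{host}:10250 -> {probe(host, 10250)}")
```

Run on the node's host network, a "refused" result for 127.0.0.1 alongside "open" for 192.168.0.2 would be consistent with the connection refused error in the controller log.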

aretecarpe commented 1 day ago

Sure, the issue is that Talos isn't able to run its health checks against the system. Everything is working from a cluster POV, i.e. the pods are all running from the k8s perspective, but not from the Talos perspective.

As seen here: Output of kubectl get pods -A:

NAMESPACE     NAME                                              READY   STATUS    RESTARTS      AGE
kube-system   cilium-c8dcr                                      1/1     Running   0             10m
kube-system   cilium-envoy-kjqqg                                1/1     Running   0             10m
kube-system   cilium-operator-54ff457fd7-mdpml                  0/1     Pending   0             10m
kube-system   cilium-operator-54ff457fd7-pm8fs                  1/1     Running   0             10m
kube-system   coredns-68d75fd545-lq64r                          1/1     Running   0             10m
kube-system   coredns-68d75fd545-wtpxq                          1/1     Running   0             10m
kube-system   kube-apiserver-k8s-controlplane-01            1/1     Running   0             9m22s
kube-system   kube-controller-manager-k8s-controlplane-01   1/1     Running   2 (11m ago)   9m28s
kube-system   kube-scheduler-k8s-controlplane-01            1/1     Running   2 (11m ago)   9m34s

Output of talosctl get staticpodstatus:

NODE   NAMESPACE   TYPE   ID   VERSION   READY

dashboard logs:

user: warning: [2024-11-13T15:30:36.450588315Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": dial tcp 127.0.0.1:10250: connect: connection refused"}

Machine config for the controlplane node:

version: v1alpha1
debug: false
persist: true
machine:
    type: controlplane
    token:
    ca:
        crt:
        key:
    certSANs: []
    kubelet:
        image: ghcr.io/siderolabs/kubelet:v1.31.1
        extraConfig:
            address: 192.168.0.2
        extraMounts:
            - destination: /var/mnt/storage
              type: bind
              source: /var/mnt/storage
              options:
                - bind
                - rshared
                - rw
        defaultRuntimeSeccompProfileEnabled: true
        nodeIP:
            validSubnets:
                - 192.168.0.0/24
        disableManifestsDirectory: true
    network: {}
    disks:
        - device: /dev/nvme1n1
          partitions:
            - mountpoint: /var/mnt/storage
    install:
        disk: /dev/nvme0n1
        image: [REDACTED]
        wipe: false
    sysctls:
        kernel.kexec_load_disabled: "1"
    features:
        rbac: true
        stableHostname: true
        apidCheckExtKeyUsage: true
        diskQuotaSupport: true
        kubePrism:
            enabled: true
            port: 7445
        hostDNS:
            enabled: true
            forwardKubeDNSToHost: true
    nodeLabels:
        node.kubernetes.io/exclude-from-external-load-balancers: ""
cluster:
    id:
    secret:
    controlPlane:
        endpoint: https://192.168.0.2:6443
    clusterName: k8s-cluster
    network:
        cni:
            name: custom
            urls:
                - [REDACTED]
        dnsDomain: cluster.local
        podSubnets:
            - 10.240.0.0/14
        serviceSubnets:
            - 10.96.0.0/12
    token:
    secretboxEncryptionSecret:
    ca:
        crt:
        key:
    aggregatorCA:
        crt:
        key:
    serviceAccount:
        key:
    apiServer:
        image: registry.k8s.io/kube-apiserver:v1.31.1
        extraArgs:
            bind-address: 192.168.0.2
        certSANs:
            - 192.168.0.2
        disablePodSecurityPolicy: true
        admissionControl:
            - name: PodSecurity
              configuration:
                apiVersion: pod-security.admission.config.k8s.io/v1alpha1
                defaults:
                    audit: restricted
                    audit-version: latest
                    enforce: baseline
                    enforce-version: latest
                    warn: restricted
                    warn-version: latest
                exemptions:
                    namespaces:
                        - kube-system
                    runtimeClasses: []
                    usernames: []
                kind: PodSecurityConfiguration
        auditPolicy:
            apiVersion: audit.k8s.io/v1
            kind: Policy
            rules:
                - level: Metadata
    controllerManager:
        image: registry.k8s.io/kube-controller-manager:v1.31.1
    proxy:
        disabled: true
        image: registry.k8s.io/kube-proxy:v1.31.1
    scheduler:
        image: registry.k8s.io/kube-scheduler:v1.31.1
    discovery:
        enabled: true
        registries:
            kubernetes:
                disabled: false
            service:
                disabled: true
    etcd:
        ca:
            crt:
            key:
        advertisedSubnets:
            - 192.168.0.2/24

And I know the static pods are running successfully because I have a custom static pod deployed (a simple nginx server). It's only from the Talos POV that they aren't. The dashboard is also showing N/A for controller-manager, scheduler, and apiserver.

The problem I am having is: why is Talos not showing the pod status?

smira commented 1 day ago

> The problem I am having is why is Talos not showing the pod status ?

Because Talos expects kubelet to listen on localhost, and you changed it not to listen on localhost.

aretecarpe commented 1 day ago

Is there no way to change this in the machine config? As far as I know, the kubelet currently accepts only 0.0.0.0 (all routable interfaces) or a single address.
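
The difference between the wildcard and a single listen address can be reproduced with plain sockets. The sketch below is an illustration under stated assumptions, not kubelet code: the function names are hypothetical, and 127.0.0.2 stands in for a specific non-default address such as 192.168.0.2, which assumes Linux, where the whole 127.0.0.0/8 range is local.

```python
import socket


def listen_on(address: str) -> socket.socket:
    """Open a TCP listener on `address` using an OS-assigned port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((address, 0))
    srv.listen(1)
    return srv


def reachable(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False


def reachable_via_loopback(bind_address: str) -> bool:
    """Bind a listener to `bind_address`, then try to reach it on 127.0.0.1."""
    srv = listen_on(bind_address)
    try:
        port = srv.getsockname()[1]
        return reachable("127.0.0.1", port)
    finally:
        srv.close()


if __name__ == "__main__":
    # Wildcard bind: the listener also answers on 127.0.0.1.
    print("0.0.0.0   ->", reachable_via_loopback("0.0.0.0"))
    # Single-address bind (Linux-only stand-in): 127.0.0.1 is refused.
    print("127.0.0.2 ->", reachable_via_loopback("127.0.0.2"))
```

A listener bound to one address accepts connections only on that address, which is why a client that always dials 127.0.0.1 stops working once the wildcard bind is replaced.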

smira commented 1 day ago

I don't think so. I think it goes back to the same question you had previously about kube-apiserver.

If you want to secure access to cluster components at the network level, it's much better to restrict access to the kubelet not by its listen address, but by allowing connections only from the controlplane nodes.

The Talos documentation has detailed recommended rules.

aretecarpe commented 1 day ago

Understood. It was just odd coming from RKE2, given how it behaves there. I'll make a note of that, and maybe down the line, if time permits, I'll take a gander at the source and see if it can be changed at all. Appreciate the help.