aretecarpe opened 1 day ago
I don't quite understand the issue here. Please describe it in more detail, so that we can understand.
If you're trying to change the kubelet serving address, Talos still expects to be able to talk to the kubelet over localhost.
Sure, the issue is that Talos isn't able to run health checks against the system. Everything is working from a cluster POV, i.e. the pods are all running from the Kubernetes perspective, but not from the Talos perspective.
As seen here, the output of kubectl get pods -A:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-c8dcr 1/1 Running 0 10m
kube-system cilium-envoy-kjqqg 1/1 Running 0 10m
kube-system cilium-operator-54ff457fd7-mdpml 0/1 Pending 0 10m
kube-system cilium-operator-54ff457fd7-pm8fs 1/1 Running 0 10m
kube-system coredns-68d75fd545-lq64r 1/1 Running 0 10m
kube-system coredns-68d75fd545-wtpxq 1/1 Running 0 10m
kube-system kube-apiserver-k8s-controlplane-01 1/1 Running 0 9m22s
kube-system kube-controller-manager-k8s-controlplane-01 1/1 Running 2 (11m ago) 9m28s
kube-system kube-scheduler-k8s-controlplane-01 1/1 Running 2 (11m ago) 9m34s
Output of talosctl get staticpodstatus (only the header row, no entries):
NODE NAMESPACE TYPE ID VERSION READY
Dashboard logs:
user: warning: [2024-11-13T15:30:36.450588315Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": dial tcp 127.0.0.1:10250: connect: connection refused"}
Machine config for the control plane:
version: v1alpha1
debug: false
persist: true
machine:
    type: controlplane
    token:
    ca:
        crt:
        key:
    certSANs: []
    kubelet:
        image: ghcr.io/siderolabs/kubelet:v1.31.1
        extraConfig:
            address: 192.168.0.2
        extraMounts:
            - destination: /var/mnt/storage
              type: bind
              source: /var/mnt/storage
              options:
                  - bind
                  - rshared
                  - rw
        defaultRuntimeSeccompProfileEnabled: true
        nodeIP:
            validSubnets:
                - 192.168.0.0/24
        disableManifestsDirectory: true
    network: {}
    disks:
        - device: /dev/nvme1n1
          partitions:
              - mountpoint: /var/mnt/storage
    install:
        disk: /dev/nvme0n1
        image: [REDACTED]
        wipe: false
    sysctls:
        kernel.kexec_load_disabled: "1"
    features:
        rbac: true
        stableHostname: true
        apidCheckExtKeyUsage: true
        diskQuotaSupport: true
        kubePrism:
            enabled: true
            port: 7445
        hostDNS:
            enabled: true
            forwardKubeDNSToHost: true
    nodeLabels:
        node.kubernetes.io/exclude-from-external-load-balancers: ""
cluster:
    id:
    secret:
    controlPlane:
        endpoint: https://192.168.0.2:6443
    clusterName: k8s-cluster
    network:
        cni:
            name: custom
            urls:
                - [REDACTED]
        dnsDomain: cluster.local
        podSubnets:
            - 10.240.0.0/14
        serviceSubnets:
            - 10.96.0.0/12
    token:
    secretboxEncryptionSecret:
    ca:
        crt:
        key:
    aggregatorCA:
        crt:
        key:
    serviceAccount:
        key:
    apiServer:
        image: registry.k8s.io/kube-apiserver:v1.31.1
        extraArgs:
            bind-address: 192.168.0.2
        certSANs:
            - 192.168.0.2
        disablePodSecurityPolicy: true
        admissionControl:
            - name: PodSecurity
              configuration:
                  apiVersion: pod-security.admission.config.k8s.io/v1alpha1
                  defaults:
                      audit: restricted
                      audit-version: latest
                      enforce: baseline
                      enforce-version: latest
                      warn: restricted
                      warn-version: latest
                  exemptions:
                      namespaces:
                          - kube-system
                      runtimeClasses: []
                      usernames: []
                  kind: PodSecurityConfiguration
        auditPolicy:
            apiVersion: audit.k8s.io/v1
            kind: Policy
            rules:
                - level: Metadata
    controllerManager:
        image: registry.k8s.io/kube-controller-manager:v1.31.1
    proxy:
        disabled: true
        image: registry.k8s.io/kube-proxy:v1.31.1
    scheduler:
        image: registry.k8s.io/kube-scheduler:v1.31.1
    discovery:
        enabled: true
        registries:
            kubernetes:
                disabled: false
            service:
                disabled: true
    etcd:
        ca:
            crt:
            key:
        advertisedSubnets:
            - 192.168.0.2/24
And I know the static pods are running successfully, because I have a custom static pod deployed (a simple nginx server); it's just that from the Talos POV they're not. The dashboard is also showing N/A for controllermanager, scheduler, and apiserver.
The problem I am having is: why is Talos not showing the pod status?
Because Talos expects the kubelet to listen on localhost, and you changed it not to listen on localhost.
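Concretely, the override in question is this fragment of the machine config above (a minimal sketch; dropping the address override restores the kubelet default of 0.0.0.0, which includes the loopback address Talos dials):

machine:
    kubelet:
        extraConfig:
            # Talos dials https://127.0.0.1:10250 for pod status (see the log
            # above), so binding the kubelet to a single non-loopback address
            # breaks those checks.
            address: 192.168.0.2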
Is there no way to change this in the machine config? Currently, I believe the kubelet only allows 0.0.0.0 (all routable interfaces) or a single address.
I don't think so; I think it goes back to the same question you had previously about kube-apiserver. If you want to secure access to cluster components at the network level, it's much better to limit access to the kubelet not via the listen address, but by allowing access only from the control plane nodes. The Talos documentation has detailed recommended rules.
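As a rough sketch of that approach (assuming the Talos v1.6+ Ingress Firewall documents; the rule name and allowed subnet here are illustrative, so check the firewall guide for your Talos version):

apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kubelet-ingress            # illustrative name
portSelector:
    ports:
        - 10250                  # kubelet serving port
    protocol: tcp
ingress:
    - subnet: 192.168.0.0/24     # allow only the node subnet from the config above

This keeps the kubelet listening on its default address (so Talos can still reach it over localhost) while restricting which hosts can reach port 10250 over the network.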
Understood, it was just odd coming from RKE2 and it behaving that way. I'll make a note of that, and maybe down the line, if time permits, I'll take a gander at the source and see if it can be changed at all. Appreciate the help.
Bug Report
Changing the kubelet IP address from the default 0.0.0.0 to anything else causes the kubelet to still be checked/served on 127.0.0.1. This could be confusion on my end, but I'm unsure why, even with the change, it's still being checked/served on that IP address. I've also checked the CSR, and it is approved and signed.
Description
I would expect that if I add address: 192.168.0.2 inside extraConfig under kubelet, it should change the kubelet IP.
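For reference, Talos merges extraConfig into the kubelet's own configuration file, so the override above should end up as the address field of the upstream KubeletConfiguration (a sketch of the expected merged result, not taken from the support logs):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Defaults to 0.0.0.0; setting a single address stops the kubelet from
# serving on 127.0.0.1, which is where Talos performs its checks.
address: 192.168.0.2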
The support logs (kubeletspecs.kubernetes.talos.dev.yaml) show that the address has been changed to 192.168.0.2; I guess it's just not reflected in the system, as the READY checks for APISERVER, CONTROLLER-MANAGER, and SCHEDULER say N/A in the dashboard as well as via the command talosctl get staticpodstatus. But everything is showing ready and working in the cluster via the command kubectl get pods -A.
Logs
"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": dial tcp 127.0.0.1:10250: connect: connection refused"
Below is the output of talosctl containers --kubernetes.
Below is the output of kubectl get pods -A.
Below is the output (empty) of talosctl get staticpodstatus.
Environment