siderolabs / sidero

Sidero Metal is a bare metal provisioning system with support for Kubernetes Cluster API.
https://www.sidero.dev
Mozilla Public License 2.0

waiting for coredns to report ready: no ready pods found for namespace "kube-system" and label selector "k8s-app=kube-dns" #1171

Closed by fdawg4l 1 year ago

fdawg4l commented 1 year ago

Hi,

I'm following the "Prerequisite: Kubernetes" section of the Getting Started guide and trying to bring up the local Docker cluster, but cluster creation times out while waiting for CoreDNS. It looks like it's because the container is out of disk space (the node reports disk pressure), yet there seems to be plenty of space. Am I missing something?

$ talosctl cluster create \
  --name sidero-demo \
  -p 69:69/udp,8081:8081/tcp,51821:51821/udp \
  --workers 0 \
  --config-patch '[{"op": "add", "path": "/cluster/allowSchedulingOnMasters", "value": true}]' \
  --endpoint 172.16.100.179

validating CIDR and reserving IPs
generating PKI and tokens
creating network sidero-demo
creating controlplane nodes
creating worker nodes
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all k8s nodes to report ready: OK
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: OK
waiting for kube-proxy to report ready: OK
◲ waiting for coredns to report ready: no ready pods found for namespace "kube-system" and label selector "k8s-app=kube-dns"
context deadline exceeded
2023/08/11 03:46:57 limited GOMAXPROCS to 4
2023/08/11 03:46:57 waiting 1 second(s) for USB storage
2023/08/11 03:46:58 initialize sequence: 5 phase(s)
2023/08/11 03:46:58 phase logger (1/5): 1 tasks(s)
2023/08/11 03:46:58 task setupLogger (1/1): starting
[talos] 2023/08/11 03:46:58 task setupLogger (1/1): done, 103.004µs
[talos] 2023/08/11 03:46:58 phase logger (1/5): done, 187.158µs
[talos] 2023/08/11 03:46:58 phase systemRequirements (2/5): 1 tasks(s)
[talos] 2023/08/11 03:46:58 task setupSystemDirectory (1/1): starting
[talos] 2023/08/11 03:46:58 task setupSystemDirectory (1/1): done, 123.678µs
[talos] 2023/08/11 03:46:58 phase systemRequirements (2/5): done, 187.108µs
[talos] 2023/08/11 03:46:58 phase etc (3/5): 2 tasks(s)
[talos] 2023/08/11 03:46:58 task createOSReleaseFile (2/2): starting
[talos] 2023/08/11 03:46:58 task CreateSystemCgroups (1/2): starting
[talos] 2023/08/11 03:46:58 task createOSReleaseFile (2/2): done, 374.37µs
[talos] 2023/08/11 03:46:58 task CreateSystemCgroups (1/2): done, 921.567µs
[talos] 2023/08/11 03:46:58 phase etc (3/5): done, 991.535µs
[talos] 2023/08/11 03:46:58 phase machined (4/5): 1 tasks(s)
[talos] 2023/08/11 03:46:58 task startMachined (1/1): starting
[talos] 2023/08/11 03:46:58 service[machined](Preparing): Running pre state
[talos] 2023/08/11 03:46:58 service[machined](Preparing): Creating service runner
[talos] 2023/08/11 03:46:58 service[machined](Running): Service started as goroutine
[talos] 2023/08/11 03:46:58 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["pool.ntp.org"]}
[talos] 2023/08/11 03:46:58 node identity established {"component": "controller-runtime", "controller": "cluster.NodeIdentityController", "node_id": "a2hKUgAjGEEu75IefgOu36dH2KhGhriY9hsAc0KemKPB"}
[talos] 2023/08/11 03:46:58 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"]}
[talos] 2023/08/11 03:46:58 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["pool.ntp.org"]}
[talos] 2023/08/11 03:46:58 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"]}
[talos] 2023/08/11 03:46:59 service[machined](Running): Health check successful
[talos] 2023/08/11 03:46:59 task startMachined (1/1): done, 1.000546719s
[talos] 2023/08/11 03:46:59 phase machined (4/5): done, 1.000577199s
[talos] 2023/08/11 03:46:59 phase config (5/5): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task loadConfig (1/1): starting
[talos] task loadConfig (1/1): 2023/08/11 03:46:59 downloading config
[talos] 2023/08/11 03:46:59 fetching machine config from: USERDATA environment variable
[talos] task loadConfig (1/1): 2023/08/11 03:46:59 storing config in memory
[talos] 2023/08/11 03:46:59 task loadConfig (1/1): done, 1.68278ms
[talos] 2023/08/11 03:46:59 phase config (5/5): done, 1.726044ms
[talos] 2023/08/11 03:46:59 initialize sequence: done: 1.003722984s
[talos] 2023/08/11 03:46:59 install sequence: 0 phase(s)
[talos] 2023/08/11 03:46:59 install sequence: done: 3.522µs
[talos] 2023/08/11 03:46:59 boot sequence: 13 phase(s)
[talos] 2023/08/11 03:46:59 phase validateConfig (1/13): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task validateConfig (1/1): starting
[talos] 2023/08/11 03:46:59 task validateConfig (1/1): done, 102.814µs
[talos] 2023/08/11 03:46:59 service[apid](Waiting): Waiting for service "containerd" to be "up", api certificates
[talos] 2023/08/11 03:46:59 phase validateConfig (1/13): done, 129.874µs
[talos] 2023/08/11 03:46:59 phase saveConfig (2/13): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task saveConfig (1/1): starting
[talos] 2023/08/11 03:46:59 task saveConfig (1/1): done, 166.585µs
[talos] 2023/08/11 03:46:59 phase saveConfig (2/13): done, 184.699µs
[talos] 2023/08/11 03:46:59 phase memorySizeCheck (3/13): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task memorySizeCheck (1/1): starting
[talos] 2023/08/11 03:46:59 skipping memory size check in the container
[talos] 2023/08/11 03:46:59 task memorySizeCheck (1/1): done, 13.132µs
[talos] 2023/08/11 03:46:59 phase memorySizeCheck (3/13): done, 22.712µs
[talos] 2023/08/11 03:46:59 phase diskSizeCheck (4/13): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task diskSizeCheck (1/1): starting
[talos] 2023/08/11 03:46:59 skipping disk size check in the container
[talos] 2023/08/11 03:46:59 task diskSizeCheck (1/1): done, 9.542µs
[talos] 2023/08/11 03:46:59 phase diskSizeCheck (4/13): done, 17.494µs
[talos] 2023/08/11 03:46:59 phase env (5/13): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task setUserEnvVars (1/1): starting
[talos] 2023/08/11 03:46:59 task setUserEnvVars (1/1): done, 14.591µs
[talos] 2023/08/11 03:46:59 phase env (5/13): done, 24.775µs
[talos] 2023/08/11 03:46:59 phase containerd (6/13): 1 tasks(s)
[talos] 2023/08/11 03:46:59 task startContainerd (1/1): starting
[talos] 2023/08/11 03:46:59 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["8.8.8.8", "1.1.1.1"]}
[talos] 2023/08/11 03:46:59 service[containerd](Preparing): Running pre state
[talos] 2023/08/11 03:46:59 service[containerd](Preparing): Creating service runner
[talos] 2023/08/11 03:46:59 service[containerd](Running): Process Process(["/bin/containerd" "--address" "/system/run/containerd/containerd.sock" "--state" "/system/run/containerd" "--root" "/system/var/lib/containerd"]) started with PID 22
[talos] 2023/08/11 03:46:59 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:46:59 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:46:59 service[kubelet](Waiting): Waiting for service "cri" to be "up", time sync, network
[talos] 2023/08/11 03:47:00 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:00 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:00 service[apid](Waiting): Waiting for service "containerd" to be "up"
[talos] 2023/08/11 03:47:00 service[containerd](Running): Health check successful
[talos] 2023/08/11 03:47:00 task startContainerd (1/1): done, 1.005007972s
[talos] 2023/08/11 03:47:00 phase containerd (6/13): done, 1.00503639s
[talos] 2023/08/11 03:47:00 phase dbus (7/13): 1 tasks(s)
[talos] 2023/08/11 03:47:00 service[apid](Preparing): Running pre state
[talos] 2023/08/11 03:47:00 task startDBus (1/1): starting
[talos] 2023/08/11 03:47:00 service[apid](Preparing): Creating service runner
[talos] 2023/08/11 03:47:00 task startDBus (1/1): done, 2.604218ms
[talos] 2023/08/11 03:47:00 phase dbus (7/13): done, 2.720293ms
[talos] 2023/08/11 03:47:00 phase sharedFilesystems (8/13): 1 tasks(s)
[talos] 2023/08/11 03:47:00 task setupSharedFilesystems (1/1): starting
[talos] 2023/08/11 03:47:00 task setupSharedFilesystems (1/1): done, 132.384µs
[talos] 2023/08/11 03:47:00 phase sharedFilesystems (8/13): done, 214.324µs
[talos] 2023/08/11 03:47:00 phase var (9/13): 1 tasks(s)
[talos] 2023/08/11 03:47:00 task setupVarDirectory (1/1): starting
[talos] 2023/08/11 03:47:00 task setupVarDirectory (1/1): done, 1.338986ms
[talos] 2023/08/11 03:47:00 phase var (9/13): done, 1.434465ms
[talos] 2023/08/11 03:47:00 phase legacyCleanup (10/13): 1 tasks(s)
[talos] 2023/08/11 03:47:00 task cleanupLegacyStaticPodFiles (1/1): starting
[talos] 2023/08/11 03:47:00 task cleanupLegacyStaticPodFiles (1/1): done, 145.67µs
[talos] 2023/08/11 03:47:00 phase legacyCleanup (10/13): done, 217.268µs
[talos] 2023/08/11 03:47:00 phase userSetup (11/13): 1 tasks(s)
[talos] 2023/08/11 03:47:00 task writeUserFiles (1/1): starting
[talos] 2023/08/11 03:47:00 task writeUserFiles (1/1): done, 21.871µs
[talos] 2023/08/11 03:47:00 phase userSetup (11/13): done, 77.931µs
[talos] 2023/08/11 03:47:00 phase startEverything (12/13): 1 tasks(s)
[talos] 2023/08/11 03:47:00 task startAllServices (1/1): starting
[talos] task startAllServices (1/1): 2023/08/11 03:47:00 waiting for 7 services
[talos] 2023/08/11 03:47:00 service[cri](Waiting): Waiting for network
[talos] 2023/08/11 03:47:00 service[cri](Preparing): Running pre state
[talos] 2023/08/11 03:47:00 service[trustd](Waiting): Waiting for service "containerd" to be "up", time sync, network
[talos] 2023/08/11 03:47:00 service[cri](Preparing): Creating service runner
[talos] 2023/08/11 03:47:00 service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network, etcd spec
[talos] task startAllServices (1/1): 2023/08/11 03:47:00 service "apid" to be "up", service "containerd" to be "up", service "cri" to be "up", service "etcd" to be "up", service "kubelet" to be "up", service "machined" to be "up", service "trustd" to be "up"
[talos] 2023/08/11 03:47:00 service[trustd](Preparing): Running pre state
[talos] 2023/08/11 03:47:00 service[trustd](Preparing): Creating service runner
[talos] 2023/08/11 03:47:00 service[kubelet](Waiting): Waiting for service "cri" to be registered
[talos] 2023/08/11 03:47:00 service[cri](Running): Process Process(["/bin/containerd" "--address" "/run/containerd/containerd.sock" "--config" "/etc/cri/containerd.toml"]) started with PID 51
[talos] 2023/08/11 03:47:00 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:01 service[apid](Running): Started task apid (PID 119) for container apid
[talos] 2023/08/11 03:47:01 service[trustd](Running): Started task trustd (PID 118) for container trustd
[talos] 2023/08/11 03:47:01 service[etcd](Waiting): Waiting for service "cri" to be "up"
[talos] 2023/08/11 03:47:01 service[kubelet](Waiting): Waiting for service "cri" to be "up"
[talos] 2023/08/11 03:47:01 service[cri](Running): Health check successful
[talos] 2023/08/11 03:47:01 service[etcd](Preparing): Running pre state
[talos] 2023/08/11 03:47:01 service[kubelet](Preparing): Running pre state
[talos] 2023/08/11 03:47:01 service[trustd](Running): Health check successful
[talos] 2023/08/11 03:47:01 service[apid](Running): Health check successful
[talos] 2023/08/11 03:47:02 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:02 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:03 bootstrap request received
[talos] 2023/08/11 03:47:03 service[etcd](Failed): Failed to run pre stage: failed to pull image "gcr.io/etcd-development/etcd:v3.5.9": 1 error(s) occurred:
    failed to pull image "gcr.io/etcd-development/etcd:v3.5.9": failed to copy: httpReadSeeker: failed open: failed to do request: context canceled
[talos] 2023/08/11 03:47:03 service[etcd](Finished): Bootstrap requested
[talos] 2023/08/11 03:47:03 service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network, etcd spec
[talos] 2023/08/11 03:47:03 service[etcd](Preparing): Running pre state
[talos] 2023/08/11 03:47:04 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:06 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:08 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:08 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:10 service[etcd](Preparing): Creating service runner
[talos] 2023/08/11 03:47:10 service[etcd](Running): Started task etcd (PID 214) for container etcd
[talos] 2023/08/11 03:47:14 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:15 service[etcd](Running): Health check successful
[talos] 2023/08/11 03:47:15 rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-apiserver"}
[talos] 2023/08/11 03:47:15 rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-controller-manager"}
[talos] 2023/08/11 03:47:15 rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-scheduler"}
[talos] 2023/08/11 03:47:15 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:15 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] task startAllServices (1/1): 2023/08/11 03:47:15 service "kubelet" to be "up"
[talos] 2023/08/11 03:47:16 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:17 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:18 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:19 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:21 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:25 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:25 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] task startAllServices (1/1): 2023/08/11 03:47:30 service "kubelet" to be "up"
[talos] 2023/08/11 03:47:32 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:33 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:44 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] task startAllServices (1/1): 2023/08/11 03:47:45 service "kubelet" to be "up"
[talos] 2023/08/11 03:47:46 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:47:47 service[kubelet](Preparing): Creating service runner
[talos] 2023/08/11 03:47:47 service[kubelet](Running): Started task kubelet (PID 270) for container kubelet
[talos] 2023/08/11 03:47:49 service[kubelet](Running): Health check successful
[talos] 2023/08/11 03:47:49 task startAllServices (1/1): done, 48.910272876s
[talos] 2023/08/11 03:47:49 phase startEverything (12/13): done, 48.910312461s
[talos] 2023/08/11 03:47:49 phase labelControlPlane (13/13): 1 tasks(s)
[talos] 2023/08/11 03:47:49 task labelNodeAsControlPlane (1/1): starting
[talos] 2023/08/11 03:47:49 retrying error: Get "https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s": dial tcp [::1]:6443: connect: connection refused
[talos] 2023/08/11 03:47:58 controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[talos] 2023/08/11 03:48:00 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/sidero-demo-controlplane-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:48:03 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-p90dg4: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
[talos] 2023/08/11 03:48:09 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2023/08/11 03:48:14 controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[talos] 2023/08/11 03:48:14 retrying error: nodes "sidero-demo-controlplane-1" not found
[talos] 2023/08/11 03:48:27 created /v1/Secret/bootstrap-token-p90dg4 {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:27 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system-bootstrap-approve-node-client-csr {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:27 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system-bootstrap-node-bootstrapper {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:27 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system-bootstrap-node-renewal {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:27 created rbac.authorization.k8s.io/v1/ClusterRole/flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:28 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:28 created /v1/ServiceAccount/flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:28 created /v1/ConfigMap/kube-flannel-cfg {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:29 created apps/v1/DaemonSet/kube-flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:29 created apps/v1/DaemonSet/kube-proxy {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:29 controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[talos] 2023/08/11 03:48:30 created /v1/ServiceAccount/kube-proxy {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:30 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/kube-proxy {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:30 created /v1/ServiceAccount/coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:31 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system:coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:31 created rbac.authorization.k8s.io/v1/ClusterRole/system:coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:32 created /v1/ConfigMap/coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:32 created apps/v1/Deployment/coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:32 created /v1/Service/kube-dns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:33 created /v1/ConfigMap/kubeconfig-in-cluster {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2023/08/11 03:48:34 controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: nodes \"sidero-demo-controlplane-1\" not found"}
[talos] 2023/08/11 03:48:45 controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[talos] 2023/08/11 03:48:59 task labelNodeAsControlPlane (1/1): done, 1m9.737278005s
[talos] 2023/08/11 03:48:59 phase labelControlPlane (13/13): done, 1m9.737318336s
[talos] 2023/08/11 03:48:59 boot sequence: done: 1m59.65793272s

$ kubectl --kubeconfig /tmp/f get no
NAME                         STATUS   ROLES           AGE   VERSION
sidero-demo-controlplane-1   Ready    control-plane   11m   v1.27.4

$ kubectl --kubeconfig /tmp/f describe no
Name:               sidero-demo-controlplane-1
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=sidero-demo-controlplane-1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"3a:7c:49:ff:d5:90"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.5.0.2
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 10 Aug 2023 20:48:57 -0700
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 10 Aug 2023 20:49:42 -0700   Thu, 10 Aug 2023 20:49:42 -0700   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 10 Aug 2023 20:56:28 -0700   Thu, 10 Aug 2023 20:48:57 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         True    Thu, 10 Aug 2023 20:56:28 -0700   Thu, 10 Aug 2023 20:49:18 -0700   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure          False   Thu, 10 Aug 2023 20:56:28 -0700   Thu, 10 Aug 2023 20:48:57 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 10 Aug 2023 20:56:28 -0700   Thu, 10 Aug 2023 20:49:18 -0700   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.5.0.2
  Hostname:    sidero-demo-controlplane-1
Capacity:
 cpu:                12
 ephemeral-storage:  243218828Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             32472092Ki
 pods:               110
Allocatable:
 cpu:                11950m
 ephemeral-storage:  223882036058
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             32173084Ki
 pods:               110
System Info:
 Machine ID:                 9e54de2b4700b21f588ab84684ad8777
 System UUID:                4c4c4544-0056-4e10-8052-c3c04f385332
 Boot ID:                    7cb665d8-363b-4bdb-8e19-62c374491f02
 Kernel Version:             5.15.0-78-generic
 OS Image:                   Talos (v1.4.7)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  containerd://1.6.21
 Kubelet Version:            v1.27.4
 Kube-Proxy Version:         v1.27.4
PodCIDR:                     10.244.0.0/24
PodCIDRs:                    10.244.0.0/24
Non-terminated Pods:         (5 in total)
  Namespace                  Name                                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                  ------------  ----------  ---------------  -------------  ---
  kube-system                kube-apiserver-sidero-demo-controlplane-1             200m (1%)     0 (0%)      512Mi (1%)       0 (0%)         11m
  kube-system                kube-controller-manager-sidero-demo-controlplane-1    50m (0%)      0 (0%)      256Mi (0%)       0 (0%)         11m
  kube-system                kube-flannel-xk9jz                                    100m (0%)     0 (0%)      50Mi (0%)        0 (0%)         12m
  kube-system                kube-proxy-6rv9s                                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  kube-system                kube-scheduler-sidero-demo-controlplane-1             10m (0%)      0 (0%)      64Mi (0%)        0 (0%)         11m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                360m (3%)   0 (0%)
  memory             882Mi (2%)  0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:
  Type     Reason               Age        From                                 Message
  ----     ------               ----       ----                                 -------
  Normal   Starting             <unknown>                                       
  Warning  FreeDiskSpaceFailed  8m9s       kubelet, sidero-demo-controlplane-1  Failed to garbage collect required amount of images. Attempted to free 36286645862 bytes, but only found 0 bytes eligible to free.
  Warning  FreeDiskSpaceFailed  3m9s       kubelet, sidero-demo-controlplane-1  Failed to garbage collect required amount of images. Attempted to free 36293264998 bytes, but only found 0 bytes eligible to free.
NAMESPACE     NAME                                                 READY   STATUS    RESTARTS      AGE
kube-system   coredns-d779cc7ff-vsv27                              0/1     Pending   0             13m
kube-system   coredns-d779cc7ff-znmsk                              0/1     Pending   0             13m
kube-system   kube-apiserver-sidero-demo-controlplane-1            1/1     Running   0             12m
kube-system   kube-controller-manager-sidero-demo-controlplane-1   1/1     Running   2 (13m ago)   12m
kube-system   kube-flannel-xk9jz                                   1/1     Running   0             13m
kube-system   kube-proxy-6rv9s                                     1/1     Running   0             13m
kube-system   kube-scheduler-sidero-demo-controlplane-1            1/1     Running   2 (14m ago)   12m

$ kubectl --kubeconfig /tmp/f describe po -n kube-system coredns-d779cc7ff-vsv27
Name:                 coredns-d779cc7ff-vsv27
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=d779cc7ff
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-d779cc7ff
Containers:
  coredns:
    Image:       docker.io/coredns/coredns:1.10.1
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nnjqg (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-nnjqg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.cloudprovider.kubernetes.io/uninitialized:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute for 300s
                             node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From  Message
  ----     ------            ----       ----  -------
  Warning  FailedScheduling  <unknown>        no nodes available to schedule pods
  Warning  FailedScheduling  <unknown>        0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  <unknown>        0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  <unknown>        0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
NODE             FILESYSTEM       SIZE(GB)   USED(GB)   AVAILABLE(GB)   PERCENT USED   MOUNTED ON
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /
172.16.100.179   tmpfs            0.07       0.00       0.07            0.00%          /dev
172.16.100.179   shm              0.07       0.00       0.07            0.00%          /dev/shm
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /opt
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /var
172.16.100.179   tmpfs            16.63      0.00       16.63           0.00%          /tmp
172.16.100.179   tmpfs            16.63      0.00       16.62           0.00%          /run
172.16.100.179   tmpfs            16.63      0.00       16.63           0.00%          /system
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /etc/kubernetes
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /system/state
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /etc/cni
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /usr/etc/udev
172.16.100.179   /dev/nvme0n1p5   249.06     235.54     13.52           94.57%         /usr/libexec/kubernetes
172.16.100.179   tmpfs            16.63      0.00       16.63           0.00%          /etc/cri/conf.d/hosts
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/system/etcd/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/system/kubelet/rootfs
172.16.100.179   shm              0.07       0.00       0.07            0.00%          /run/containerd/io.containerd.grpc.v1.cri/sandboxes/bbf42bc30a8d827b5295a516c4567ef7781ee80456ba86c3f8fa9071f12a73aa/shm
172.16.100.179   shm              0.07       0.00       0.07            0.00%          /run/containerd/io.containerd.grpc.v1.cri/sandboxes/29fdcb72003a20c910d14eed96b14ce326b6a180cf095c3dd4141758ca76e3e2/shm
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/bbf42bc30a8d827b5295a516c4567ef7781ee80456ba86c3f8fa9071f12a73aa/rootfs
172.16.100.179   shm              0.07       0.00       0.07            0.00%          /run/containerd/io.containerd.grpc.v1.cri/sandboxes/2c78d3658d44d2c2990895cd3d5654a6fa27950c5a4604e04cc250bd1caa55ca/shm
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/29fdcb72003a20c910d14eed96b14ce326b6a180cf095c3dd4141758ca76e3e2/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/2c78d3658d44d2c2990895cd3d5654a6fa27950c5a4604e04cc250bd1caa55ca/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/e5c542978b6eaa3ef5a8818e82a5c57c9ef0318122ae3a4f07e95884f70ffacc/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/281712b14c9ad3e4262fee4fbb6dd2551e66e7deda7454f588e6fa75294932b3/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/1f8542dd29c5c4c99b1555ffb693e5f9d0290b7b51d4ef8c185676ca1784ee39/rootfs
172.16.100.179   tmpfs            32.95      0.00       32.95           0.00%          /var/lib/kubelet/pods/5aa23d6d-917d-46d0-ab22-65313586b7ed/volumes/kubernetes.io~projected/kube-api-access-9mdsx
172.16.100.179   tmpfs            32.95      0.00       32.95           0.00%          /var/lib/kubelet/pods/6be0f2e1-7e52-42b5-8c8e-692fce9ea81e/volumes/kubernetes.io~projected/kube-api-access-rn6th
172.16.100.179   shm              0.07       0.00       0.07            0.00%          /run/containerd/io.containerd.grpc.v1.cri/sandboxes/792176b822d21f872e1287053d439fbc7170727902d14cf72e38acef0f1291a2/shm
172.16.100.179   shm              0.07       0.00       0.07            0.00%          /run/containerd/io.containerd.grpc.v1.cri/sandboxes/3b3d71cc82c93cee21ae8638f37a6daba74e2550b5dbc0f1c9d9742b3171ab0b/shm
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/792176b822d21f872e1287053d439fbc7170727902d14cf72e38acef0f1291a2/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/3b3d71cc82c93cee21ae8638f37a6daba74e2550b5dbc0f1c9d9742b3171ab0b/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/b6e30c12262eab37ecc42ff68837da7f178525662713e86e8ae165b73ebcd3fd/rootfs
172.16.100.179   overlay          249.06     235.54     13.52           94.57%         /run/containerd/io.containerd.runtime.v2.task/k8s.io/fcc2f3e6a594becf14386441c78337237c270f715cd5ee604a6417e5524f01b5/rootfs
smira commented 1 year ago

As you can see from the output you posted, your host filesystem doesn't have enough free disk space, so kubelet reports disk pressure and the pods can't be scheduled. When running in container mode, your host filesystem is what kubelet sees as the disk. When running in QEMU/VM mode, Talos will use whatever space is allocated to it.

fdawg4l commented 1 year ago

@smira thanks for taking a look. Right, kubelet claims there’s not enough space. I see 13.5 GB free. How much free space is required to use Sidero to bootstrap a cluster?

smira commented 1 year ago

It's not Sidero, but rather the kubelet's default settings:

evictionHard is a map of signal names to quantities that defines hard eviction thresholds. For example: {"memory.available": "300Mi"}. To explicitly disable, pass a 0% or 100% threshold on an arbitrary resource. Default:

memory.available: "100Mi"
nodefs.available: "10%"
nodefs.inodesFree: "5%"
imagefs.available: "15%"
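
To put numbers on this, here is a rough sketch that applies the default disk thresholds quoted above to the figures from the posted `df` output (249.06 GB total, 13.52 GB available on the filesystem backing /var). It assumes that, in container mode, nodefs and imagefs both resolve to that same host filesystem. The helper name `check_disk_signals` is made up for illustration and is not part of kubelet or talosctl.

```python
# Figures taken from the `df` output in this thread (filesystem backing /var), in GB.
SIZE_GB = 249.06
AVAILABLE_GB = 13.52

# Default kubelet hard-eviction thresholds for disk space, as quoted above (percent free).
DEFAULT_EVICTION_HARD_PCT = {
    "nodefs.available": 10.0,
    "imagefs.available": 15.0,
}


def check_disk_signals(size_gb, available_gb, thresholds_pct):
    """Hypothetical helper: for each signal, compute the free space the
    threshold requires and whether the current free space falls below it."""
    available_pct = 100.0 * available_gb / size_gb
    report = {}
    for signal, min_pct in thresholds_pct.items():
        required_gb = size_gb * min_pct / 100.0
        report[signal] = {
            "available_pct": round(available_pct, 2),
            "required_gb": round(required_gb, 2),
            "breached": available_gb < required_gb,
        }
    return report


report = check_disk_signals(SIZE_GB, AVAILABLE_GB, DEFAULT_EVICTION_HARD_PCT)
for signal, result in report.items():
    print(signal, result)
```

Under these assumptions, roughly 24.9 GB free would be needed to clear the 10% nodefs threshold and roughly 37.4 GB to clear the 15% imagefs threshold, while only about 5.4% (13.52 GB) is free. That would be consistent with the disk-pressure taint in the scheduling events.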