jlgeering closed this 8 months ago
Could you try k8s 1.27.2? I think 1.27.1 had a bug with static pods not starting.
The error you posted is not critical for Talos, and it is not actually blocking anything. What it means is that the kubelet has no serving certificate. It's not clear from the patch, but it looks like you have server certificate rotation enabled and nothing approves the CSR. This is not something on the Talos side; the problem is in the kubelet. Also, what Noel said makes sense.
A bit more context: I am following the "Deploying Metrics Server" guide (https://www.talos.dev/v1.4/kubernetes-guides/configuration/deploy-metrics-server/) and installed the Kubelet Serving Certificate Approver as a deployment (not a static manifest).
This does work for worker nodes, but it seems like the CSR of the new or upgraded controlplane node is not "sent to the cluster" (i.e. kubectl get csr returns nothing when upgrading a controlplane node).
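To make the "is there an unapproved CSR at all?" check concrete, here is a minimal Python sketch that filters the JSON form of a CSR list (as produced by `kubectl get csr -o json`) for kubelet serving requests that have been neither approved nor denied. The sample data and the names `pending_kubelet_serving_csrs` and `SAMPLE` are hypothetical and only for illustration; the signer name `kubernetes.io/kubelet-serving` is the standard Kubernetes signer for kubelet serving certificates.

```python
import json

# Built-in Kubernetes signer used for kubelet serving certificates.
KUBELET_SERVING_SIGNER = "kubernetes.io/kubelet-serving"


def pending_kubelet_serving_csrs(csr_list):
    """Return names of kubelet-serving CSRs that are neither approved nor denied."""
    pending = []
    for item in csr_list.get("items", []):
        if item.get("spec", {}).get("signerName") != KUBELET_SERVING_SIGNER:
            continue
        # A CSR with no Approved/Denied condition is still waiting for an approver.
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            pending.append(item["metadata"]["name"])
    return pending


# Hypothetical sample shaped like `kubectl get csr -o json` output (not real cluster data).
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "csr-aaaaa"},
     "spec": {"signerName": "kubernetes.io/kubelet-serving"},
     "status": {}},
    {"metadata": {"name": "csr-bbbbb"},
     "spec": {"signerName": "kubernetes.io/kubelet-serving"},
     "status": {"conditions": [{"type": "Approved"}]}},
    {"metadata": {"name": "csr-ccccc"},
     "spec": {"signerName": "kubernetes.io/kube-apiserver-client-kubelet"},
     "status": {}}
  ]
}
""")

print(pending_kubelet_serving_csrs(SAMPLE))
```

If the list is empty because there are no CSR objects at all, as described here for the controlplane node, that would suggest the kubelet never submitted a request, rather than the approver failing to act on one. That reading is an assumption, not something the sketch can prove.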
I upgraded to k8s 1.27.2 and downgraded a worker node and the controlplane node to Talos 1.4.4. The worker node is ready again (at 1.4.4), but the controlplane node is stuck with:
10.10.99.22: user: warning: [2023-06-05T07:51:34.373958662Z]: [talos] [initramfs] booting Talos v1.4.4
10.10.99.22: user: warning: [2023-06-05T07:51:34.433342662Z]: [talos] [initramfs] mounting the rootfs
10.10.99.22: kern: info: [2023-06-05T07:51:34.491812662Z]: loop0: detected capacity change from 0 to 100664
10.10.99.22: user: warning: [2023-06-05T07:51:34.575569662Z]: [talos] [initramfs] bind mounting /lib/firmware
10.10.99.22: user: warning: [2023-06-05T07:51:34.643357662Z]: [talos] [initramfs] entering the rootfs
10.10.99.22: user: warning: [2023-06-05T07:51:34.701788662Z]: [talos] [initramfs] moving mounts to the new rootfs
10.10.99.22: user: warning: [2023-06-05T07:51:34.773117662Z]: [talos] [initramfs] changing working directory into /root
10.10.99.22: user: warning: [2023-06-05T07:51:34.850207662Z]: [talos] [initramfs] moving /root to /
10.10.99.22: user: warning: [2023-06-05T07:51:34.906563662Z]: [talos] [initramfs] changing root directory
10.10.99.22: user: warning: [2023-06-05T07:51:34.969158662Z]: [talos] [initramfs] cleaning up initramfs
10.10.99.22: user: warning: [2023-06-05T07:51:35.029866662Z]: [talos] [initramfs] executing /sbin/init
10.10.99.22: user: warning: [2023-06-05T07:51:37.952310662Z]: [talos] task setupLogger (1/1): done, 144.909µs
10.10.99.22: user: warning: [2023-06-05T07:51:38.020135662Z]: [talos] phase logger (1/11): done, 68.01208ms
10.10.99.22: user: warning: [2023-06-05T07:51:38.084792662Z]: [talos] phase systemRequirements (2/11): 6 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:38.155642662Z]: [talos] task setRLimit (6/6): starting
10.10.99.22: user: warning: [2023-06-05T07:51:38.213027662Z]: [talos] task setRLimit (6/6): done, 57.396587ms
10.10.99.22: user: warning: [2023-06-05T07:51:38.279680662Z]: [talos] task mountBPFFS (3/6): starting
10.10.99.22: user: warning: [2023-06-05T07:51:38.338024662Z]: [talos] task setupSystemDirectory (2/6): starting
10.10.99.22: user: warning: [2023-06-05T07:51:38.406768662Z]: [talos] task mountCgroups (4/6): starting
10.10.99.22: user: warning: [2023-06-05T07:51:38.467192662Z]: [talos] task mountPseudoFilesystems (5/6): starting
10.10.99.22: user: warning: [2023-06-05T07:51:38.538014662Z]: [talos] task enforceKSPPRequirements (1/6): starting
10.10.99.22: user: warning: [2023-06-05T07:51:38.609963662Z]: [talos] task setupSystemDirectory (2/6): done, 454.307937ms
10.10.99.22: user: warning: [2023-06-05T07:51:38.689146662Z]: [talos] task mountCgroups (4/6): done, 454.343064ms
10.10.99.22: user: warning: [2023-06-05T07:51:38.760082662Z]: [talos] task mountPseudoFilesystems (5/6): done, 454.392354ms
10.10.99.22: user: warning: [2023-06-05T07:51:38.841411662Z]: [talos] task mountBPFFS (3/6): done, 657.107017ms
10.10.99.22: user: warning: [2023-06-05T07:51:38.932070662Z]: [talos] static pod list url is not available yet; not creating kubelet config {"component": "controller-runtime", "controller": "k8s.KubeletConfigController", "error": "resource StaticPodServerStatuses.kubernetes.talos.dev(k8s/static-pod-server-status@undefined) doesn't exist"}
10.10.99.22: user: warning: [2023-06-05T07:51:39.242207662Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"]}
10.10.99.22: user: warning: [2023-06-05T07:51:39.412166662Z]: [talos] setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["pool.ntp.org"]}
10.10.99.22: user: warning: [2023-06-05T07:51:39.581733662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on [::1]:53: dial udp [::1]:53: connect: cannot assign requested address"}
10.10.99.22: user: warning: [2023-06-05T07:51:39.836469662Z]: [talos] setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["pool.ntp.org"]}
10.10.99.22: user: warning: [2023-06-05T07:51:40.006284662Z]: [talos] setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["pool.ntp.org"]}
10.10.99.22: user: warning: [2023-06-05T07:51:40.175356662Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"]}
10.10.99.22: user: warning: [2023-06-05T07:51:40.345166662Z]: [talos] task enforceKSPPRequirements (1/6): done, 2.019282589s
10.10.99.22: user: warning: [2023-06-05T07:51:40.427561662Z]: [talos] phase systemRequirements (2/11): done, 2.342770417s
10.10.99.22: user: warning: [2023-06-05T07:51:40.506756662Z]: [talos] phase integrity (3/11): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:40.568314662Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"]}
10.10.99.22: user: warning: [2023-06-05T07:51:40.737957662Z]: [talos] task writeIMAPolicy (1/1): starting
10.10.99.22: kern: notice: [2023-06-05T07:51:40.800589662Z]: audit: type=1807 audit(1685951456.768:2): action=dont_measure fsmagic=0x9fa0 res=1
10.10.99.22: kern: info: [2023-06-05T07:51:40.800706662Z]: ima: policy update completed
10.10.99.22: user: warning: [2023-06-05T07:51:40.837149662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on [::1]:53: read udp [::1]:34995->[::1]:53: read: connection refused"}
10.10.99.22: kern: notice: [2023-06-05T07:51:40.903715662Z]: audit: type=1807 audit(1685951456.768:3): action=dont_measure fsmagic=0x62656572 res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:41.308573662Z]: audit: type=1807 audit(1685951456.768:4): action=dont_measure fsmagic=0x64626720 res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:41.415795662Z]: audit: type=1807 audit(1685951456.768:5): action=dont_measure fsmagic=0x1021994 res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:41.521980662Z]: audit: type=1807 audit(1685951456.768:6): action=dont_measure fsmagic=0x1cd1 res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:41.625045662Z]: audit: type=1807 audit(1685951456.768:7): action=dont_measure fsmagic=0x42494e4d res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:41.732269662Z]: audit: type=1807 audit(1685951456.768:8): action=dont_measure fsmagic=0x73636673 res=1
10.10.99.22: user: warning: [2023-06-05T07:51:41.837909662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on [::1]:53: read udp [::1]:45125->[::1]:53: read: connection refused"}
10.10.99.22: kern: notice: [2023-06-05T07:51:41.839495662Z]: audit: type=1807 audit(1685951456.768:9): action=dont_measure fsmagic=0xf97cff8c res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:42.197461662Z]: audit: type=1807 audit(1685951456.768:10): action=dont_measure fsmagic=0x43415d53 res=1
10.10.99.22: kern: notice: [2023-06-05T07:51:42.305724662Z]: audit: type=1807 audit(1685951456.768:11): action=dont_measure fsmagic=0x27e0eb res=1
10.10.99.22: user: warning: [2023-06-05T07:51:42.414876662Z]: [talos] task writeIMAPolicy (1/1): done, 1.846531429s
10.10.99.22: user: warning: [2023-06-05T07:51:42.487872662Z]: [talos] phase integrity (3/11): done, 1.98112025s
10.10.99.22: user: warning: [2023-06-05T07:51:42.556693662Z]: [talos] phase etc (4/11): 2 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:42.611943662Z]: [talos] task createOSReleaseFile (2/2): starting
10.10.99.22: user: warning: [2023-06-05T07:51:42.679752662Z]: [talos] task CreateSystemCgroups (1/2): starting
10.10.99.22: user: warning: [2023-06-05T07:51:42.747606662Z]: [talos] task createOSReleaseFile (2/2): done, 67.987847ms
10.10.99.22: user: warning: [2023-06-05T07:51:42.840385662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on [::1]:53: read udp [::1]:37257->[::1]:53: read: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:51:43.091781662Z]: [talos] task CreateSystemCgroups (1/2): done, 479.844713ms
10.10.99.22: user: warning: [2023-06-05T07:51:43.169984662Z]: [talos] phase etc (4/11): done, 613.293637ms
10.10.99.22: user: warning: [2023-06-05T07:51:43.233519662Z]: [talos] phase earlyServices (5/11): 2 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:43.299156662Z]: [talos] task startMachined (2/2): starting
10.10.99.22: user: warning: [2023-06-05T07:51:43.360757662Z]: [talos] service[machined](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:51:43.435836662Z]: [talos] task startUdevd (1/2): starting
10.10.99.22: kern: info: [2023-06-05T07:51:43.438077662Z]: e1000e 0000:00:19.0 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
SUBSYSTEM=pci
DEVICE=+pci:0000:00:19.0
10.10.99.22: user: warning: [2023-06-05T07:51:43.494320662Z]: [talos] service[machined](Preparing): Creating service runner
10.10.99.22: kern: info: [2023-06-05T07:51:43.597401662Z]: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
10.10.99.22: user: warning: [2023-06-05T07:51:43.678563662Z]: [talos] service[udevd](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:51:43.825808662Z]: [talos] service[machined](Running): Service started as goroutine
10.10.99.22: user: warning: [2023-06-05T07:51:43.952477662Z]: [talos] service[udevd](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:51:44.092470662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on [::1]:53: read udp [::1]:59633->[::1]:53: read: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:51:44.754512662Z]: [talos] service[machined](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:51:44.833716662Z]: [talos] task startMachined (2/2): done, 1.534557056s
10.10.99.22: user: warning: [2023-06-05T07:51:45.343765662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable"}
10.10.99.22: user: warning: [2023-06-05T07:51:46.305049662Z]: [talos] service[udevd](Running): Process Process(["/sbin/udevd" "--resolve-names=never"]) started with PID 740
10.10.99.22: daemon: info: [2023-06-05T07:51:46.487531662Z]: udevd[740]: starting version 3.2.11
10.10.99.22: user: warning: [2023-06-05T07:51:46.593326662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable"}
10.10.99.22: daemon: info: [2023-06-05T07:51:46.839742662Z]: udevd[740]: starting eudev-3.2.11
10.10.99.22: user: warning: [2023-06-05T07:51:47.228728662Z]: [talos] service[udevd](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:51:47.305040662Z]: [talos] task startUdevd (1/2): done, 4.005868226s
10.10.99.22: user: warning: [2023-06-05T07:51:47.374113662Z]: [talos] phase earlyServices (5/11): done, 4.140583482s
10.10.99.22: user: warning: [2023-06-05T07:51:47.448343662Z]: [talos] phase meta (6/11): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:47.504633662Z]: [talos] task reloadMeta (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:47.569133662Z]: [talos] META: loaded 1 keys
10.10.99.22: user: warning: [2023-06-05T07:51:47.615148662Z]: [talos] task reloadMeta (1/1): done, 110.516192ms
10.10.99.22: user: warning: [2023-06-05T07:51:47.683917662Z]: [talos] phase meta (6/11): done, 235.585036ms
10.10.99.22: user: warning: [2023-06-05T07:51:47.748599662Z]: [talos] phase dashboard (7/11): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:47.810191662Z]: [talos] task startDashboard (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:47.872786662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable"}
10.10.99.22: user: warning: [2023-06-05T07:51:48.121656662Z]: [talos] service[dashboard](Waiting): Waiting for service "machined" to be "up", file "/system/run/machined/machine.sock" to exist
10.10.99.22: user: warning: [2023-06-05T07:51:48.121709662Z]: [talos] service[dashboard](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:51:48.121719662Z]: [talos] service[dashboard](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:51:48.432123662Z]: [talos] task startDashboard (1/1): done, 621.92733ms
10.10.99.22: user: warning: [2023-06-05T07:51:48.504070662Z]: [talos] phase dashboard (7/11): done, 755.472269ms
10.10.99.22: user: warning: [2023-06-05T07:51:48.573967662Z]: [talos] service[dashboard](Running): Process Process(["/sbin/dashboard"]) started with PID 1753
10.10.99.22: user: warning: [2023-06-05T07:51:48.690684662Z]: [talos] phase mountSystem (9/11): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:48.754320662Z]: [talos] task mountStatePartition (1/1): starting
10.10.99.22: kern: notice: [2023-06-05T07:51:48.823890662Z]: XFS (sda5): Mounting V5 Filesystem
10.10.99.22: kern: info: [2023-06-05T07:51:48.901241662Z]: XFS (sda5): Ending clean mount
10.10.99.22: user: warning: [2023-06-05T07:51:48.952308662Z]: [talos] task mountStatePartition (1/1): done, 198.001536ms
10.10.99.22: user: warning: [2023-06-05T07:51:49.030570662Z]: [talos] phase mountSystem (9/11): done, 456.537867ms
10.10.99.22: user: warning: [2023-06-05T07:51:49.102518662Z]: [talos] phase config (10/11): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:49.162016662Z]: [talos] task loadConfig (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:49.220432662Z]: [talos] node identity established {"component": "controller-runtime", "controller": "cluster.NodeIdentityController", "node_id": "m0UA5qKQMuD6hK1yzOcW172Rnp5SuWuSO62oc6l02HM"}
10.10.99.22: user: warning: [2023-06-05T07:51:49.420310662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable"}
10.10.99.22: user: warning: [2023-06-05T07:51:49.670205662Z]: [talos] task loadConfig (1/1): persistence is enabled, using existing config on disk
10.10.99.22: user: warning: [2023-06-05T07:51:49.775496662Z]: [talos] task loadConfig (1/1): done, 613.480118ms
10.10.99.22: user: warning: [2023-06-05T07:51:49.844853662Z]: [talos] phase config (10/11): done, 742.349701ms
10.10.99.22: user: warning: [2023-06-05T07:51:49.912695662Z]: [talos] phase unmountSystem (11/11): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:49.979453662Z]: [talos] task unmountStatePartition (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:50.054930662Z]: [talos] assigned address {"component": "controller-runtime", "controller": "network.AddressSpecController", "address": "10.10.99.22/24", "link": "eth0"}
10.10.99.22: user: warning: [2023-06-05T07:51:50.230931662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "network.RouteSpecController", "error": "1 error occurred:\n\t* error adding route: netlink receive: network is unreachable, message {Family:2 DstLength:0 SrcLength:0 Tos:0 Table:0 Protocol:4 Scope:0 Type:1 Flags:0 Attributes:{Dst:<nil> Src:<nil> Gateway:10.10.99.1 OutIface:4 Priority:1024 Table:254 Mark:0 Pref:<nil> Expires:<nil> Metrics:<nil> Multipath:[]}}\n\n"}
10.10.99.22: user: warning: [2023-06-05T07:51:50.709532662Z]: [talos] setting hostname {"component": "controller-runtime", "controller": "network.HostnameSpecController", "hostname": "atlantis-nuc-1", "domainname": ""}
10.10.99.22: user: warning: [2023-06-05T07:51:50.889667662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.StaticEndpointController", "error": "error resolving \"cp.hl.dongxi.ch\": lookup cp.hl.dongxi.ch on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable"}
10.10.99.22: user: warning: [2023-06-05T07:51:51.165124662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:51:51.479579662Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable"}
10.10.99.22: user: warning: [2023-06-05T07:51:51.728328662Z]: [talos] hello failed {"component": "controller-runtime", "controller": "cluster.DiscoveryServiceController", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup discovery.talos.dev on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable\"", "endpoint": "discovery.talos.dev:443"}
10.10.99.22: user: warning: [2023-06-05T07:51:52.122653662Z]: [talos] setting hostname {"component": "controller-runtime", "controller": "network.HostnameSpecController", "hostname": "atlantis-nuc-1", "domainname": ""}
10.10.99.22: user: warning: [2023-06-05T07:51:52.122661662Z]: [talos] created route {"component": "controller-runtime", "controller": "network.RouteSpecController", "destination": "default", "gateway": "10.10.99.1", "table": "main", "link": "eth0"}
10.10.99.22: user: warning: [2023-06-05T07:51:52.514000662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: kern: notice: [2023-06-05T07:51:52.828422662Z]: XFS (sda5): Unmounting Filesystem
10.10.99.22: user: warning: [2023-06-05T07:51:52.834520662Z]: [talos] task unmountStatePartition (1/1): done, 2.855069914s
10.10.99.22: user: warning: [2023-06-05T07:51:52.961846662Z]: [talos] phase unmountSystem (11/11): done, 3.049152613s
10.10.99.22: user: warning: [2023-06-05T07:51:53.037400662Z]: [talos] adjusting time (jump) by 27.667831936s via 141.98.136.83, state TIME_OK, status STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.10.99.22: user: warning: [2023-06-05T07:51:53.231019662Z]: [talos] initialize sequence: done: 15.085233176s
10.10.99.22: user: warning: [2023-06-05T07:51:53.298812662Z]: [talos] install sequence: 0 phase(s)
10.10.99.22: user: warning: [2023-06-05T07:51:53.298820662Z]: [talos] install sequence: done: 10.283µs
10.10.99.22: user: warning: [2023-06-05T07:51:53.415079662Z]: [talos] synchronized RTC with system clock {"component": "controller-runtime", "controller": "time.SyncController"}
10.10.99.22: user: warning: [2023-06-05T07:51:53.555528662Z]: [talos] boot sequence: 21 phase(s)
10.10.99.22: user: warning: [2023-06-05T07:51:53.610519662Z]: [talos] phase saveStateEncryptionConfig (1/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:53.688709662Z]: [talos] service[apid](Waiting): Waiting for service "containerd" to be "up", api certificates
10.10.99.22: user: warning: [2023-06-05T07:51:53.803708662Z]: [talos] task SaveStateEncryptionConfig (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:53.877838662Z]: [talos] task SaveStateEncryptionConfig (1/1): done, 74.132304ms
10.10.99.22: user: warning: [2023-06-05T07:51:53.961312662Z]: [talos] phase saveStateEncryptionConfig (1/21): done, 350.790789ms
10.10.99.22: user: warning: [2023-06-05T07:51:53.961324662Z]: [talos] phase mountState (2/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:53.961355662Z]: [talos] task mountStatePartition (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:53.966206662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: kern: notice: [2023-06-05T07:51:54.492712662Z]: XFS (sda5): Mounting V5 Filesystem
10.10.99.22: kern: info: [2023-06-05T07:51:54.565508662Z]: XFS (sda5): Ending clean mount
10.10.99.22: user: warning: [2023-06-05T07:51:54.616722662Z]: [talos] task mountStatePartition (1/1): done, 655.375834ms
10.10.99.22: user: warning: [2023-06-05T07:51:54.694898662Z]: [talos] phase mountState (2/21): done, 733.573973ms
10.10.99.22: user: warning: [2023-06-05T07:51:54.765794662Z]: [talos] phase validateConfig (3/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:54.765812662Z]: [talos] task validateConfig (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:54.895135662Z]: [talos] task validateConfig (1/1): done, 129.321868ms
10.10.99.22: user: warning: [2023-06-05T07:51:54.968088662Z]: [talos] service[apid](Waiting): Waiting for service "containerd" to be registered
10.10.99.22: user: warning: [2023-06-05T07:51:54.968129662Z]: [talos] phase validateConfig (3/21): done, 202.325088ms
10.10.99.22: user: warning: [2023-06-05T07:51:54.968137662Z]: [talos] phase saveConfig (4/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:54.968153662Z]: [talos] task saveConfig (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:54.968984662Z]: [talos] task saveConfig (1/1): done, 829.373µs
10.10.99.22: user: warning: [2023-06-05T07:51:55.332845662Z]: [talos] phase saveConfig (4/21): done, 364.693672ms
10.10.99.22: user: warning: [2023-06-05T07:51:55.403749662Z]: [talos] phase memorySizeCheck (5/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:55.403758662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:51:55.403812662Z]: [talos] task memorySizeCheck (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:55.849359662Z]: [talos] memory size is OK
10.10.99.22: user: warning: [2023-06-05T07:51:55.893165662Z]: [talos] memory size is 7788 MiB
10.10.99.22: user: warning: [2023-06-05T07:51:55.893180662Z]: [talos] task memorySizeCheck (1/1): done, 489.380564ms
10.10.99.22: user: warning: [2023-06-05T07:51:55.893192662Z]: [talos] phase memorySizeCheck (5/21): done, 489.447022ms
10.10.99.22: user: warning: [2023-06-05T07:51:55.893197662Z]: [talos] phase diskSizeCheck (6/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:55.893206662Z]: [talos] task diskSizeCheck (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:55.893215662Z]: [talos] disk size is OK
10.10.99.22: user: warning: [2023-06-05T07:51:56.262050662Z]: [talos] disk size is 488386 MiB
10.10.99.22: user: warning: [2023-06-05T07:51:56.262071662Z]: [talos] task diskSizeCheck (1/1): done, 368.86194ms
10.10.99.22: user: warning: [2023-06-05T07:51:56.383012662Z]: [talos] phase diskSizeCheck (6/21): done, 489.81167ms
10.10.99.22: user: warning: [2023-06-05T07:51:56.383019662Z]: [talos] phase env (7/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:56.383056662Z]: [talos] task setUserEnvVars (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:56.573735662Z]: [talos] task setUserEnvVars (1/1): done, 190.677334ms
10.10.99.22: user: warning: [2023-06-05T07:51:56.573746662Z]: [talos] phase env (7/21): done, 190.727317ms
10.10.99.22: user: warning: [2023-06-05T07:51:56.710284662Z]: [talos] phase containerd (8/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:51:56.710306662Z]: [talos] task startContainerd (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:51:56.710341662Z]: [talos] service[containerd](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:51:56.913520662Z]: [talos] service[containerd](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:51:56.913527662Z]: [talos] service[apid](Waiting): Waiting for service "containerd" to be "up"
10.10.99.22: user: warning: [2023-06-05T07:51:56.913933662Z]: [talos] service[containerd](Running): Process Process(["/bin/containerd" "--address" "/system/run/containerd/containerd.sock" "--state" "/system/run/containerd" "--root" "/system/var/lib/containerd"]) started with PID 1784
10.10.99.22: user: warning: [2023-06-05T07:51:57.856782662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:51:59.916323662Z]: [talos] service[containerd](Running): Health check failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
10.10.99.22: user: warning: [2023-06-05T07:52:01.245881662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:02.916347662Z]: [talos] service[containerd](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:52:02.997705662Z]: [talos] task startContainerd (1/1): done, 6.28738424s
10.10.99.22: user: warning: [2023-06-05T07:52:03.070653662Z]: [talos] service[apid](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:52:03.070693662Z]: [talos] phase containerd (8/21): done, 6.36040808s
10.10.99.22: user: warning: [2023-06-05T07:52:03.211364662Z]: [talos] service[apid](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:52:03.211381662Z]: [talos] phase dbus (9/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:03.211427662Z]: [talos] task startDBus (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:03.402718662Z]: [talos] task startDBus (1/1): done, 191.287142ms
10.10.99.22: user: warning: [2023-06-05T07:52:03.470562662Z]: [talos] phase dbus (9/21): done, 259.181532ms
10.10.99.22: user: warning: [2023-06-05T07:52:03.535215662Z]: [talos] phase ephemeral (10/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:03.597884662Z]: [talos] task mountEphemeralPartition (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:03.671366662Z]: [talos] formatting the partition "/dev/sda6" as "xfs" with label "EPHEMERAL"
10.10.99.22: user: warning: [2023-06-05T07:52:04.508122662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: kern: notice: [2023-06-05T07:52:04.529085662Z]: XFS (sda6): Mounting V5 Filesystem
10.10.99.22: user: warning: [2023-06-05T07:52:04.822450662Z]: [talos] service[apid](Running): Started task apid (PID 1841) for container apid
10.10.99.22: kern: info: [2023-06-05T07:52:04.885820662Z]: XFS (sda6): Ending clean mount
10.10.99.22: user: warning: [2023-06-05T07:52:05.056926662Z]: [talos] task mountEphemeralPartition (1/1): done, 1.459056261s
10.10.99.22: user: warning: [2023-06-05T07:52:05.139310662Z]: [talos] phase ephemeral (10/21): done, 1.604103195s
10.10.99.22: user: warning: [2023-06-05T07:52:05.210223662Z]: [talos] phase var (11/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:05.266599662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletServiceController", "error": "error writing kubelet PKI: open /etc/kubernetes/bootstrap-kubeconfig: read-only file system"}
10.10.99.22: user: warning: [2023-06-05T07:52:05.266623662Z]: [talos] task setupVarDirectory (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:05.269596662Z]: [talos] task setupVarDirectory (1/1): done, 2.973783ms
10.10.99.22: user: warning: [2023-06-05T07:52:05.643610662Z]: [talos] phase var (11/21): done, 433.384806ms
10.10.99.22: user: warning: [2023-06-05T07:52:05.708274662Z]: [talos] phase overlay (12/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:05.768817662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletServiceController", "error": "error writing kubelet PKI: open /etc/kubernetes/bootstrap-kubeconfig: read-only file system"}
10.10.99.22: user: warning: [2023-06-05T07:52:05.768852662Z]: [talos] task mountOverlayFilesystems (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:05.769967662Z]: [talos] task mountOverlayFilesystems (1/1): done, 1.145197ms
10.10.99.22: user: warning: [2023-06-05T07:52:06.158273662Z]: [talos] phase overlay (12/21): done, 449.999228ms
10.10.99.22: user: warning: [2023-06-05T07:52:06.227033662Z]: [talos] phase legacyCleanup (13/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:06.293813662Z]: [talos] task cleanupLegacyStaticPodFiles (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:06.370031662Z]: [talos] task cleanupLegacyStaticPodFiles (1/1): done, 76.217559ms
10.10.99.22: user: warning: [2023-06-05T07:52:06.455528662Z]: [talos] phase legacyCleanup (13/21): done, 228.505279ms
10.10.99.22: user: warning: [2023-06-05T07:52:06.530605662Z]: [talos] phase udevSetup (14/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:06.530647662Z]: [talos] task writeUdevRules (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:06.655807662Z]: [talos] task writeUdevRules (1/1): done, 125.169465ms
10.10.99.22: user: warning: [2023-06-05T07:52:06.728734662Z]: [talos] phase udevSetup (14/21): done, 198.130763ms
10.10.99.22: user: warning: [2023-06-05T07:52:06.799653662Z]: [talos] phase userDisks (15/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:06.799674662Z]: [talos] task mountUserDisks (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:06.799686662Z]: [talos] task mountUserDisks (1/1): done, 12.738µs
10.10.99.22: user: warning: [2023-06-05T07:52:06.994563662Z]: [talos] phase userDisks (15/21): done, 194.911327ms
10.10.99.22: user: warning: [2023-06-05T07:52:07.065499662Z]: [talos] service[kubelet](Waiting): Waiting for service "cri" to be "up", time sync, network
10.10.99.22: user: warning: [2023-06-05T07:52:07.065529662Z]: [talos] phase userSetup (16/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:07.240547662Z]: [talos] task writeUserFiles (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:07.303125662Z]: [talos] task writeUserFiles (1/1): done, 62.57948ms
10.10.99.22: user: warning: [2023-06-05T07:52:07.303137662Z]: [talos] phase userSetup (16/21): done, 237.633839ms
10.10.99.22: user: warning: [2023-06-05T07:52:07.303143662Z]: [talos] phase lvm (17/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:07.303157662Z]: [talos] task activateLogicalVolumes (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:07.763881662Z]: [talos] task activateLogicalVolumes (1/1): done, 460.715815ms
10.10.99.22: user: warning: [2023-06-05T07:52:07.845194662Z]: [talos] phase lvm (17/21): done, 542.050091ms
10.10.99.22: user: warning: [2023-06-05T07:52:07.909863662Z]: [talos] phase startEverything (18/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:07.978736662Z]: [talos] task startAllServices (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:07.978769662Z]: [talos] task startAllServices (1/1): waiting for 9 services
10.10.99.22: user: warning: [2023-06-05T07:52:08.122586662Z]: [talos] service[cri](Waiting): Waiting for network
10.10.99.22: user: warning: [2023-06-05T07:52:08.192458662Z]: [talos] service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network, etcd spec
10.10.99.22: user: warning: [2023-06-05T07:52:08.313321662Z]: [talos] service[trustd](Waiting): Waiting for service "containerd" to be "up", time sync, network
10.10.99.22: user: warning: [2023-06-05T07:52:08.432103662Z]: [talos] service[kubelet](Waiting): Waiting for service "cri" to be "up"
10.10.99.22: user: warning: [2023-06-05T07:52:08.523779662Z]: [talos] service[cri](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:52:08.593698662Z]: [talos] service[trustd](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:52:08.593764662Z]: [talos] service[cri](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:52:08.742697662Z]: [talos] task startAllServices (1/1): service "apid" to be "up", service "containerd" to be "up", service "cri" to be "up", service "dashboard" to be "up", service "etcd" to be "up", service "kubelet" to be "up", service "machined" to be "up", service "trustd" to be "up", service "udevd" to be "up"
10.10.99.22: user: warning: [2023-06-05T07:52:09.070535662Z]: [talos] service[trustd](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:52:09.149812662Z]: [talos] service[cri](Running): Process Process(["/bin/containerd" "--address" "/run/containerd/containerd.sock" "--config" "/etc/cri/containerd.toml"]) started with PID 1872
10.10.99.22: user: warning: [2023-06-05T07:52:09.347734662Z]: [talos] service[etcd](Waiting): Waiting for service "cri" to be "up"
10.10.99.22: user: warning: [2023-06-05T07:52:09.436292662Z]: [talos] service[apid](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:52:09.581293662Z]: [talos] service[trustd](Running): Started task trustd (PID 1906) for container trustd
10.10.99.22: user: warning: [2023-06-05T07:52:10.073825662Z]: [talos] service[cri](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:52:10.147895662Z]: [talos] service[kubelet](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:52:10.221913662Z]: [talos] service[etcd](Preparing): Running pre state
10.10.99.22: user: warning: [2023-06-05T07:52:10.292892662Z]: [talos] service[trustd](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:52:13.085736662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:15.102208662Z]: [talos] etcd is waiting to join the cluster, if this node is the first node in the cluster, please run `talosctl bootstrap` against one of the following IPs:
10.10.99.22: user: warning: [2023-06-05T07:52:15.283419662Z]: [talos] [10.10.99.22 2a02:168:5db4:0:f64d:30ff:fe66:faa9]
10.10.99.22: user: warning: [2023-06-05T07:52:15.400262662Z]: [talos] service[etcd](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:52:15.623785662Z]: [talos] service[etcd](Running): Started task etcd (PID 1964) for container etcd
10.10.99.22: user: warning: [2023-06-05T07:52:16.513338662Z]: [talos] successfully promoted etcd member
10.10.99.22: user: warning: [2023-06-05T07:52:20.188003662Z]: [talos] service[kubelet](Preparing): Creating service runner
10.10.99.22: user: warning: [2023-06-05T07:52:20.409663662Z]: [talos] service[kubelet](Running): Started task kubelet (PID 2011) for container kubelet
10.10.99.22: user: warning: [2023-06-05T07:52:20.520469662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:20.835076662Z]: [talos] service[etcd](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:52:20.915674662Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-apiserver"}
10.10.99.22: user: warning: [2023-06-05T07:52:21.077240662Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-controller-manager"}
10.10.99.22: user: warning: [2023-06-05T07:52:21.248229662Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-scheduler"}
10.10.99.22: user: warning: [2023-06-05T07:52:21.429601662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:22.096446662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:22.409853662Z]: [talos] service[kubelet](Running): Health check successful
10.10.99.22: user: warning: [2023-06-05T07:52:22.488136662Z]: [talos] task startAllServices (1/1): done, 14.509398342s
10.10.99.22: user: warning: [2023-06-05T07:52:22.564271662Z]: [talos] phase startEverything (18/21): done, 14.65440353s
10.10.99.22: user: warning: [2023-06-05T07:52:22.641408662Z]: [talos] phase labelControlPlane (19/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:22.712372662Z]: [talos] task labelNodeAsControlPlane (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:22.714071662Z]: [talos] retrying error: Get "https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s": dial tcp [::1]:6443: connect: connection refused
10.10.99.22: user: warning: [2023-06-05T07:52:23.544398662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:24.243231662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:52:25.561189662Z]: [talos] adjusting time (slew) by 19.608331ms via 141.98.136.83, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.10.99.22: user: warning: [2023-06-05T07:52:25.905428662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:27.343721662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:29.837988662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:33.107415662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:39.485858662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\terror getting node: Get \"https://localhost:6443/api/v1/nodes/atlantis-nuc-1?timeout=30s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:39.920377662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:52:40.676512662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-4dri0t: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
10.10.99.22: user: warning: [2023-06-05T07:52:48.371632662Z]: [talos] task labelNodeAsControlPlane (1/1): done, 25.669113789s
10.10.99.22: user: warning: [2023-06-05T07:52:48.455249662Z]: [talos] phase labelControlPlane (19/21): done, 25.823721094s
10.10.99.22: user: warning: [2023-06-05T07:52:48.535733662Z]: [talos] phase uncordon (20/21): 1 tasks(s)
10.10.99.22: user: warning: [2023-06-05T07:52:48.597361662Z]: [talos] task uncordonNode (1/1): starting
10.10.99.22: user: warning: [2023-06-05T07:52:48.609907662Z]: [talos] retrying error: node not ready
10.10.99.22: user: warning: [2023-06-05T07:52:55.795895662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:53:11.571854662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:53:27.544724662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:53:43.213443662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:53:59.093788662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:54:15.053140662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:54:30.748697662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:54:46.791858662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:55:02.606927662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:55:18.261623662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:55:33.839580662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:55:49.736743662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:56:05.407285662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:56:21.190527662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-05T07:56:36.827266662Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
which means the node is stuck at NotReady forever:
NAME             STATUS                       ROLES          AGE  VERSION  INTERNAL-IP  EXTERNAL-IP  OS-IMAGE        KERNEL-VERSION  CONTAINER-RUNTIME
atlantis-nuc-1   NotReady,SchedulingDisabled  control-plane  16d  v1.27.2  10.10.99.22  <none>       Talos (v1.4.5)  6.1.30-talos    containerd://1.6.21
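To make it easier to see which errors keep repeating in a boot log like the one above, here is a minimal Python sketch that counts the "controller failed" lines per controller and error reason. The function name `summarize_failures` and the `KNOWN_REASONS` list are my own illustrative choices, not part of Talos; the reasons are simply the fragments that show up in this log.

```python
import re
from collections import Counter

# Pulls the controller name out of a "[talos] controller failed {...}" line.
CONTROLLER = re.compile(r'controller failed \{.*?"controller": "([^"]+)"')

# Error fragments seen in the log above (an assumption for this sketch);
# anything else is counted as "other".
KNOWN_REASONS = ("connection refused", "tls: internal error", "read-only file system")


def summarize_failures(lines):
    """Count controller failures per (controller, reason) pair."""
    counts = Counter()
    for line in lines:
        match = CONTROLLER.search(line)
        if match is None:
            continue  # not a "controller failed" line
        reason = next((r for r in KNOWN_REASONS if r in line), "other")
        counts[(match.group(1), reason)] += 1
    return counts
```

Assuming the log was saved to a file first (for example with `talosctl dmesg -n 10.10.99.22 > dmesg.txt`), `summarize_failures(open("dmesg.txt"))` returns the counts. In the log above, the `connection refused` errors stop once the API server is up, while the `tls: internal error` from `k8s.KubeletStaticPodController` keeps repeating.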
this is after trying to downgrade to 1.4.4 by running talosctl upgrade --image ghcr.io/siderolabs/installer:v1.4.4 --wait=false -n 10.10.99.22
I also just realized that the machine config I posted above was not up to date: I do have rotate-server-certificates: 'true', as you suspected.
kubelet:
  defaultRuntimeSeccompProfileEnabled: true
  disableManifestsDirectory: true
  extraArgs:
    rotate-server-certificates: 'true'
  image: ghcr.io/siderolabs/kubelet:v1.27.2
(I updated the original comment)
sorry about that
~also note that the CSR approver does not work with k8s 1.27~
Hmm seems it got fixed https://github.com/alex1989hu/kubelet-serving-cert-approver/issues/139
so make sure it's using the latest version. If you just need to approve CSR, you can use Talos CCM https://github.com/siderolabs/talos-cloud-controller-manager
yes, I had to approve certs manually until the fix was released
also: it works for worker nodes!
not sure what you mean by Talos CCM => are you saying we can use it as an alternative to Kubelet Serving Certificate Approver to sign kubelet CSRs?
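For the manual approval mentioned above, a rough Python sketch of the selection step could look like this. It takes the JSON printed by `kubectl get csr -o json`, picks the kubelet serving requests that have been neither approved nor denied, and builds the matching `kubectl certificate approve` commands. The function names are hypothetical, and the sketch does not inspect the request contents, so review each CSR before approving it.

```python
import json

# Signer used for kubelet serving certificates.
KUBELET_SERVING = "kubernetes.io/kubelet-serving"


def pending_kubelet_serving_csrs(csr_list_json):
    """Return names of kubelet-serving CSRs that are neither approved nor denied."""
    doc = json.loads(csr_list_json)
    names = []
    for item in doc.get("items", []):
        if item.get("spec", {}).get("signerName") != KUBELET_SERVING:
            continue  # e.g. kubelet client certificates
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            names.append(item["metadata"]["name"])
    return names


def approve_commands(names):
    """Build one `kubectl certificate approve` argv list per CSR name."""
    return [["kubectl", "certificate", "approve", name] for name in names]
```

Each returned list can be passed to `subprocess.run`. If `kubectl get csr` returns nothing, as it does for the stuck controlplane node here, the result is simply empty and there is nothing to approve.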
trying to be more explicit ... I ran
talosctl upgrade --image ghcr.io/siderolabs/installer:v1.4.4 --wait=false -n 10.10.99.22
and
talosctl upgrade --image ghcr.io/siderolabs/installer:v1.4.4 --wait=false -n 10.10.99.32
Both nodes were on Talos 1.4.5 and k8s 1.27.2 before running these commands.
this is the state now:
❯ kubectl get nodes -o wide
NAME             STATUS                       ROLES          AGE  VERSION  INTERNAL-IP  EXTERNAL-IP  OS-IMAGE        KERNEL-VERSION  CONTAINER-RUNTIME
atlantis-dm-1    Ready                        control-plane  16d  v1.27.2  10.10.99.23  <none>       Talos (v1.4.5)  6.1.30-talos    containerd://1.6.21
atlantis-hpm-1   Ready                        <none>         24d  v1.27.2  10.10.99.31  <none>       Talos (v1.4.5)  6.1.30-talos    containerd://1.6.21
atlantis-hpm-2   Ready                        <none>         23d  v1.27.2  10.10.99.32  <none>       Talos (v1.4.4)  6.1.28-talos    containerd://1.6.21
atlantis-nuc-1   NotReady,SchedulingDisabled  control-plane  16d  v1.27.2  10.10.99.22  <none>       Talos (v1.4.5)  6.1.30-talos    containerd://1.6.21
atlantis-zima-1  Ready                        control-plane  59d  v1.27.2  10.10.99.21  <none>       Talos (v1.4.5)  6.1.30-talos    containerd://1.6.21
controlplane nodes:
talosctl upgrade ...
worker nodes:
talosctl upgrade ...
I can also provide logs from other nodes, e.g. 10.10.99.32, but I see nothing exceptional there.
not sure what you mean by Talos CCM => are you saying we can use it as an alternative to Kubelet Serving Certificate Approver to sign kubelet CSRs?
https://github.com/siderolabs/talos-cloud-controller-manager#node-certificate-approval
I'm still trying to wrap my head around this kubelet certificate rotation problem, so please bear with me.
From what I understood, the kubelet must use a certificate signed by the k8s cluster CA, so that its clients (e.g. metrics-server) can establish a trusted connection (because, somehow, they are configured to trust the k8s cluster CA), correct?
By reading the documentation at https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#approval it seems kube-controller-manager can be configured to automatically sign the kubelet CSR.
If so, why isn't Talos configured like this automatically (i.e. automatic kubelet certificate rotation)? Why do we need to install extra controllers (e.g. talos-cloud-controller-manager or alex1989hu/kubelet-serving-cert-approver) or extra configurations for that?
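To illustrate what an approver for these requests has to decide, here is a hedged Python sketch of the checks described in the Kubernetes documentation for the `kubernetes.io/kubelet-serving` signer. The dict layout and the function name `serving_csr_problems` are assumptions made for this example (a real approver would decode the subject and SANs from the PEM request), and the final requester check is a common extra rule that I am assuming here, not something the signer itself enforces.

```python
KUBELET_SERVING = "kubernetes.io/kubelet-serving"

# Key usage sets the kubelet-serving signer permits.
ALLOWED_USAGES = (
    frozenset({"digital signature", "server auth"}),
    frozenset({"digital signature", "key encipherment", "server auth"}),
)


def serving_csr_problems(csr):
    """Return reasons not to approve `csr`; an empty list means it looks fine.

    `csr` is a plain dict with already-decoded fields (hypothetical layout).
    """
    problems = []
    if csr["signer_name"] != KUBELET_SERVING:
        problems.append("unexpected signer")
    if not csr["common_name"].startswith("system:node:"):
        problems.append("common name is not system:node:<name>")
    if csr["organizations"] != ["system:nodes"]:
        problems.append("organization is not exactly system:nodes")
    if frozenset(csr["usages"]) not in ALLOWED_USAGES:
        problems.append("unexpected key usages")
    if not (csr["dns_names"] or csr["ip_addresses"]):
        problems.append("no DNS or IP subjectAltName")
    # Assumed extra check: the requesting user must be the node named in the subject.
    if csr["username"] != csr["common_name"]:
        problems.append("requester does not match the node in the subject")
    return problems
```

Nothing in this sketch can verify that the requested IPs and DNS names really belong to the node; that has to come from another source of truth about the nodes.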
@rgl, valid point, please create a feature request for that. I'm going to close this issue, though, as it's resolved, and a feature request is not a bug.
For reference, I've opened the feature request at https://github.com/siderolabs/talos/issues/8523.
Bug Report
Description
I'm having trouble upgrading my control-plane nodes (or setting up new control-plane nodes). After the upgrade (in this case to 1.4.5) they get stuck at task uncordonNode.
This has only happened on control-plane nodes. It resolved itself after a while once; mostly the node stays stuck.
The current workaround is to change the role to worker until the node is ready and then set it back to controlplane.
Logs
10.10.99.22: user: warning: [2023-06-01T08:05:25.769080166Z]: [talos] task uncordonNode (1/1): starting
10.10.99.22: user: warning: [2023-06-01T08:05:25.845540166Z]: [talos] retrying error: node not ready
10.10.99.22: user: warning: [2023-06-01T08:05:33.332576166Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-01T08:05:49.241990166Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
10.10.99.22: user: warning: [2023-06-01T08:06:05.042627166Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
etc...
Environment
machine config: