Sonlis closed this issue 11 months ago.
I'm closing the issue since not changing the hostname fixed it; I will investigate this path further. Sorry for opening an issue too quickly.
Hmm, we should support changing the hostname. Sounds like a bug, @Unix4ever?
We should, right. It was working fine when I tested it last time; maybe there's some regression...
I've tried to reproduce it locally on QEMU, but no luck: it worked fine. So I guess there's some additional factor that makes it fail, which we don't yet know about. I wonder if you can grab timed logs; we may find something helpful there.
talosctl logs timed -n <node ip> -f
I have tried to reproduce it by only changing the IP address: I get the same problem. Sadly, I cannot get the logs, as the node won't even bootstrap, so I get: error fetching logs: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.129:50000: connect: connection refused". Last time I wrote down the times by hand; I will edit the original post to include them.
Also a note: because I am on a Mac, I had to replace this command from the "Updating the EEPROM" section: sudo mkfs.fat -I /dev/mmcblk0 with sudo diskutil eraseDisk FAT32 RPI2 MBRFormat /dev/disk2. I don't think it has anything to do with the issue; just letting you know that I had to change one step from the getting started guide.
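If it helps anyone else on macOS, the disk identifier can be confirmed (and the disk unmounted) before erasing; /dev/disk2 here is just an example and varies per machine:

diskutil list
sudo diskutil unmountDisk /dev/disk2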
PS: there are also 2 errors that slipped past me the first time, as it goes by quickly. But there you go:
@Sonlis one way to debug the issue (we're flying "blind" until apid is running, which is what makes talosctl logs work) is to enable the debug flag in the config, which outputs all the logs to the console. This might produce lots of logs, but it should at least give some clue about what is making timed crash: https://www.talos.dev/docs/v0.8/reference/configuration/#config
Set debug: true at the top level of the config.
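A minimal sketch of where the flag sits in the machine config (other fields elided; layout per the v0.8 config reference linked above):

version: v1alpha1
debug: true
machine:
  type: controlplane
  ...
cluster:
  ...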
@smira I have enabled this option, but it does not give more logs.
But applying the config through a YAML file instead of the interactive installer does work, though. So maybe there's an error in the interactive installer?
Interesting point, if that is something with the interactive installer. I wonder why setting the hostname breaks timed.
Btw, what hostname have you tried to set? Maybe I can try using the same settings on my RPI4 4GB.
I assume that when you tried configuring Talos one more time without the hostname, you re-flashed the SD card, is that correct? Does it reproduce each time you dd the image to the SD card and run the interactive installer?
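For reference, the flash sequence between attempts should look roughly like this (the image name and SD card device path are placeholders; substitute whatever your download and host use):

xz -d metal-rpi_4-arm64.img.xz
sudo dd if=metal-rpi_4-arm64.img of=/dev/mmcblk0 bs=4M status=progress conv=fsync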
I tried Talos-master, if I recall correctly. Every time it failed, I re-formatted the SD card, installed the EEPROM update, dd'd the image to the SD card, then went for the interactive installer.
I have used the interactive mode, as well as the YAML config file, many times, and I've always set the hostname without an issue. So I'm not saying there couldn't be an issue, but if upwards of 200-300 tests is considered good enough, then I can confirm it cannot be an issue related to setting the hostname.
@Sonlis, you don't need to update the EEPROM every time; it's done only once and saved in the EEPROM (which is on the RPi 4B board), unless you need to flash it again (e.g. an upgrade). Additionally, you may see the health check failure a couple of times, because some components (API server, bootkube, etc.) might not be up yet at a given stage. By stage 14, you should be able to run some queries against the cluster/node using talosctl, by the way.
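For example, once apid is up, queries like these should respond (node IP is a placeholder):

$talosctl -n <node ip> service
$talosctl -n <node ip> dmesg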
It is failing for me as well on the first try, in a loop, regardless of whether I set the hostname or not.
192.168.13.124: user: warning: [2022-08-06T20:06:35.921770935Z]: [talos] phase userSetup (14/19): 1 tasks(s)
192.168.13.124: user: warning: [2022-08-06T20:06:35.928221935Z]: [talos] task writeUserFiles (1/1): starting
192.168.13.124: user: warning: [2022-08-06T20:06:35.934537935Z]: [talos] task writeUserFiles (1/1): done, 6.355885ms
192.168.13.124: user: warning: [2022-08-06T20:06:35.941479935Z]: [talos] phase userSetup (14/19): done, 19.720024ms
192.168.13.124: user: warning: [2022-08-06T20:06:35.948252935Z]: [talos] phase lvm (15/19): 1 tasks(s)
192.168.13.124: user: warning: [2022-08-06T20:06:35.953864935Z]: [talos] task activateLogicalVolumes (1/1): starting
192.168.13.124: user: warning: [2022-08-06T20:06:36.250460935Z]: [talos] task activateLogicalVolumes (1/1): done, 296.607628ms
192.168.13.124: user: warning: [2022-08-06T20:06:36.260225935Z]: [talos] phase lvm (15/19): done, 311.956832ms
192.168.13.124: user: warning: [2022-08-06T20:06:36.267449935Z]: [talos] phase startEverything (16/19): 1 tasks(s)
192.168.13.124: user: warning: [2022-08-06T20:06:36.274537935Z]: [talos] task startAllServices (1/1): starting
192.168.13.124: user: warning: [2022-08-06T20:06:36.281333935Z]: [talos] task startAllServices (1/1): waiting for 8 services
192.168.13.124: user: warning: [2022-08-06T20:06:36.289656935Z]: [talos] service[cri](Waiting): Waiting for network
192.168.13.124: user: warning: [2022-08-06T20:06:36.297119935Z]: [talos] service[trustd](Waiting): Waiting for service "containerd" to be "up", time sync, network
192.168.13.124: user: warning: [2022-08-06T20:06:36.309805935Z]: [talos] service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network
192.168.13.124: user: warning: [2022-08-06T20:06:36.321133935Z]: [talos] service[cri](Preparing): Running pre state
192.168.13.124: user: warning: [2022-08-06T20:06:36.328701935Z]: [talos] service[trustd](Preparing): Running pre state
192.168.13.124: user: warning: [2022-08-06T20:06:36.336504935Z]: [talos] service[cri](Preparing): Creating service runner
192.168.13.124: user: warning: [2022-08-06T20:06:36.344227935Z]: [talos] task startAllServices (1/1): service "apid" to be "up", service "containerd" to be "up", service "cri" to be "up", service "etcd" to be "up", service "kubelet" to be "up", service "machined" to be "up", service "trustd" to be "up", service "udevd" to be "up"
192.168.13.124: user: warning: [2022-08-06T20:06:36.371825935Z]: [talos] service[trustd](Preparing): Creating service runner
192.168.13.124: user: warning: [2022-08-06T20:06:36.398611935Z]: [talos] service[cri](Running): Process Process(["/bin/containerd" "--address" "/run/containerd/containerd.sock" "--config" "/etc/cri/containerd.toml"]) started with PID 3403
192.168.13.124: user: warning: [2022-08-06T20:06:36.515276935Z]: [talos] service[kubelet](Waiting): Waiting for service "cri" to be "up"
192.168.13.124: user: warning: [2022-08-06T20:06:36.559197935Z]: [talos] service[trustd](Running): Started task trustd (PID 3437) for container trustd
192.168.13.124: user: warning: [2022-08-06T20:06:37.321681935Z]: [talos] service[etcd](Waiting): Waiting for service "cri" to be "up"
192.168.13.124: user: warning: [2022-08-06T20:06:37.350441935Z]: [talos] service[cri](Running): Health check successful
192.168.13.124: user: warning: [2022-08-06T20:06:37.358056935Z]: [talos] service[kubelet](Preparing): Running pre state
192.168.13.124: user: warning: [2022-08-06T20:06:37.366271935Z]: [talos] service[etcd](Preparing): Running pre state
192.168.13.124: user: warning: [2022-08-06T20:06:37.407044935Z]: [talos] service[trustd](Running): Health check successful
192.168.13.124: user: warning: [2022-08-06T20:06:39.866347935Z]: [talos] service[kubelet](Preparing): Creating service runner
192.168.13.124: user: warning: [2022-08-06T20:06:43.695280935Z]: [talos] service[etcd](Preparing): Creating service runner
192.168.13.124: user: warning: [2022-08-06T20:06:51.337853935Z]: [talos] task startAllServices (1/1): service "etcd" to be "up", service "kubelet" to be "up"
192.168.13.124: user: warning: [2022-08-06T20:06:56.669706935Z]: [talos] service[etcd](Running): Started task etcd (PID 3521) for container etcd
192.168.13.124: user: warning: [2022-08-06T20:06:56.738848935Z]: [talos] service[kubelet](Running): Started task kubelet (PID 3520) for container kubelet
192.168.13.124: user: warning: [2022-08-06T20:06:56.749759935Z]: [talos] service[etcd](Waiting): Error running Containerd(etcd), going to restart forever: task "etcd" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:06:56.806735935Z]: [talos] service[kubelet](Waiting): Error running Containerd(kubelet), going to restart forever: task "kubelet" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:07:00.826991935Z]: [talos] kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://192.168.13.124:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 192.168.13.124:6443: connect: connection refused"}
192.168.13.124: user: warning: [2022-08-06T20:07:01.941108935Z]: [talos] service[etcd](Running): Started task etcd (PID 3627) for container etcd
192.168.13.124: user: warning: [2022-08-06T20:07:02.012542935Z]: [talos] service[etcd](Waiting): Error running Containerd(etcd), going to restart forever: task "etcd" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:07:02.026898935Z]: [talos] service[kubelet](Running): Started task kubelet (PID 3649) for container kubelet
192.168.13.124: user: warning: [2022-08-06T20:07:02.095119935Z]: [talos] service[kubelet](Waiting): Error running Containerd(kubelet), going to restart forever: task "kubelet" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:07:06.337799935Z]: [talos] task startAllServices (1/1): service "etcd" to be "up", service "kubelet" to be "up"
192.168.13.124: user: warning: [2022-08-06T20:07:09.136223935Z]: [talos] service[etcd](Running): Started task etcd (PID 3768) for container etcd
192.168.13.124: user: warning: [2022-08-06T20:07:09.186552935Z]: [talos] service[kubelet](Running): Started task kubelet (PID 3769) for container kubelet
192.168.13.124: user: warning: [2022-08-06T20:07:10.642716935Z]: [talos] service[etcd](Waiting): Error running Containerd(etcd), going to restart forever: task "etcd" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:07:10.656891935Z]: [talos] service[kubelet](Waiting): Error running Containerd(kubelet), going to restart forever: task "kubelet" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:07:17.227499935Z]: [talos] service[etcd](Running): Started task etcd (PID 3892) for container etcd
192.168.13.124: user: warning: [2022-08-06T20:07:17.270486935Z]: [talos] service[kubelet](Running): Started task kubelet (PID 3893) for container kubelet
192.168.13.124: user: warning: [2022-08-06T20:07:18.719038935Z]: [talos] service[kubelet](Waiting): Error running Containerd(kubelet), going to restart forever: task "kubelet" failed: exit code 1
192.168.13.124: user: warning: [2022-08-06T20:07:18.733752935Z]: [talos] service[etcd](Waiting): Error running Containerd(etcd), going to restart
On the RPi4 it looks like etcd might be an amd64 binary instead of arm64; I am running Talos v1.1.2 on an RPi 4B.
talosctl -n <node ip> logs etcd
<node ip>: exec /usr/local/bin/etcd: exec format error
<node ip>: exec /usr/local/bin/etcd: exec format error
<node ip>: exec /usr/local/bin/etcd: exec format error
.
.
.
> On the RPi4 it looks like etcd might be an amd64 binary instead of arm64; I am running Talos v1.1.2 on an RPi 4B.
It's the job of the CRI to pull the right image for the right arch; it might be that containerd failed to detect the right arch. Which release image did you use to flash the Pi? Also, could you try resetting the node and see if this still happens: talosctl reset --system-labels-to-wipe=EPHEMERAL
I've flashed the RPi with this latest release image, v1.1.2. Also, trying to reset gives this error:
$talosctl reset --system-labels-to-wipe=EPHEMERAL -n <node ip>
error executing reset: 1 error occurred:
* <node ip>: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <node ip>:50000: i/o timeout"
Even kubelet throws a similar error.
$talosctl -n <node ip> logs -f etcd
<node ip>: exec /usr/local/bin/etcd: exec format error
<node ip>: exec /usr/local/bin/etcd: exec format error
<node ip>: exec /usr/local/bin/etcd: exec format error
.
.
.
$talosctl images
ghcr.io/siderolabs/flannel:v0.18.1
ghcr.io/siderolabs/install-cni:v1.1.0-2-gcb03a5d
docker.io/coredns/coredns:1.9.3
gcr.io/etcd-development/etcd:v3.5.4
k8s.gcr.io/kube-apiserver:v1.24.3
k8s.gcr.io/kube-controller-manager:v1.24.3
k8s.gcr.io/kube-scheduler:v1.24.3
k8s.gcr.io/kube-proxy:v1.24.3
ghcr.io/siderolabs/kubelet:v1.24.3
ghcr.io/siderolabs/installer:v1.1.2
k8s.gcr.io/pause:3.6
Seems the install is completely broken; could you re-flash and try again?
Tried re-flashing with the previous stable release, v1.1.1, and that boots etcd up fine. The control plane endpoint never seems to come up, though.
$talosctl -n <node ip> service
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
<node ip> apid Running OK 6m56s ago Health check successful
<node ip> containerd Running OK 7m2s ago Health check successful
<node ip> cri Running OK 6m32s ago Health check successful
<node ip> etcd Running OK 6m4s ago Health check successful
<node ip> kubelet Running OK 5m53s ago Health check successful
<node ip> machined Running ? 7m9s ago Service started as goroutine
<node ip> trustd Running OK 6m32s ago Health check successful
<node ip> udevd Running OK 6m34s ago Health check successful
$talosctl dmesg -f -n <node ip>
...
<node ip>: user: warning: [2022-08-07T06:38:36.358257301Z]: [talos] service[kubelet](Running): Started task kubelet (PID 3496) for container kubelet
<node ip>: user: warning: [2022-08-07T06:38:36.369711301Z]: [talos] cleaning up static pod "/etc/kubernetes/manifests/talos-kube-apiserver.yaml" {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController"}
<node ip>: user: warning: [2022-08-07T06:38:36.388575301Z]: [talos] cleaning up static pod "/etc/kubernetes/manifests/talos-kube-controller-manager.yaml" {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController"}
<node ip>: user: warning: [2022-08-07T06:38:36.407953301Z]: [talos] cleaning up static pod "/etc/kubernetes/manifests/talos-kube-scheduler.yaml" {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController"}
<node ip>: user: warning: [2022-08-07T06:38:36.461887301Z]: [talos] service[etcd](Running): Started task etcd (PID 3528) for container etcd
<node ip>: user: warning: [2022-08-07T06:38:43.851543301Z]: [talos] service[etcd](Running): Health check successful
<node ip>: user: warning: [2022-08-07T06:38:43.872347301Z]: [talos] writing static pod "/etc/kubernetes/manifests/talos-kube-apiserver.yaml" {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController"}
<node ip>: user: warning: [2022-08-07T06:38:43.904317301Z]: [talos] writing static pod "/etc/kubernetes/manifests/talos-kube-controller-manager.yaml" {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController"}
<node ip>: user: warning: [2022-08-07T06:38:43.925992301Z]: [talos] writing static pod "/etc/kubernetes/manifests/talos-kube-scheduler.yaml" {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController"}
<node ip>: user: warning: [2022-08-07T06:38:44.460787301Z]: [talos] task startAllServices (1/1): service "kubelet" to be "up"
<node ip>: user: warning: [2022-08-07T06:38:47.208012301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-yvhv5k: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:38:47.575138301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": dial tcp 127.0.0.1:10250: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:38:51.177581301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-yvhv5k: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:38:51.656212301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-yvhv5k: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:38:54.881584301Z]: [talos] service[kubelet](Running): Health check successful
<node ip>: user: warning: [2022-08-07T06:38:54.890774301Z]: [talos] task startAllServices (1/1): done, 40.493796873s
<node ip>: user: warning: [2022-08-07T06:38:54.899622301Z]: [talos] phase startEverything (16/19): done, 40.510250763s
<node ip>: user: warning: [2022-08-07T06:38:54.908654301Z]: [talos] phase labelMaster (17/19): 1 tasks(s)
<node ip>: user: warning: [2022-08-07T06:38:54.916424301Z]: [talos] task labelNodeAsMaster (1/1): starting
<node ip>: user: warning: [2022-08-07T06:38:54.932942301Z]: [talos] retrying error: Get "https://<node ip>:6443/api/v1/nodes/talos-control-1?timeout=30s": dial tcp <node ip>:6443: connect: connection refused
<node ip>: user: warning: [2022-08-07T06:38:54.972581301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-yvhv5k: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:38:58.634897301Z]: [talos] kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://<node ip>:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp <node ip>:6443: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:39:05.422844301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
<node ip>: user: warning: [2022-08-07T06:39:10.272273301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-yvhv5k: Get \"https://localhost:6443/api?timeout=32s\": dial tcp [::1]:6443: connect: connection refused"}
<node ip>: user: warning: [2022-08-07T06:39:21.419420301Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
What does talosctl containers -k show, and also talosctl logs -k <id of kube-apiserver>? It seems there's some issue with the generated config, probably. It would be nice if you could provide the commands used for generating the config and whether any modifications were made to the generated config. It's probably easier to join our Slack for a quicker conversation.
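For reference, the stock flow is something along these lines (cluster name and endpoint here are placeholders), plus whatever edits were applied to the generated controlplane.yaml afterwards:

$talosctl gen config my-cluster https://<node ip>:6443
$talosctl apply-config --insecure -n <node ip> -f controlplane.yaml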
$talosctl containers -k -n <node ip>
NODE NAMESPACE ID IMAGE PID STATUS
<node ip> k8s.io kube-system/kube-apiserver-talos-control-1 k8s.gcr.io/pause:3.6 5210 SANDBOX_READY
<node ip> k8s.io kube-system/kube-apiserver-talos-control-1 k8s.gcr.io/pause:3.6 3754 SANDBOX_READY
<node ip> k8s.io └─ kube-system/kube-apiserver-talos-control-1:kube-apiserver k8s.gcr.io/kube-apiserver:v1.24.3 0 CONTAINER_EXITED
<node ip> k8s.io └─ kube-system/kube-apiserver-talos-control-1:kube-apiserver k8s.gcr.io/kube-apiserver:v1.24.3 0 CONTAINER_EXITED
<node ip> k8s.io └─ kube-system/kube-apiserver-talos-control-1:kube-apiserver k8s.gcr.io/kube-apiserver:v1.24.3 0 CONTAINER_EXITED
<node ip> k8s.io └─ kube-system/kube-apiserver-talos-control-1:kube-apiserver k8s.gcr.io/kube-apiserver:v1.24.3 4716 CONTAINER_RUNNING
<node ip> k8s.io kube-system/kube-controller-manager-talos-control-1 k8s.gcr.io/pause:3.6 3766 SANDBOX_READY
<node ip> k8s.io └─ kube-system/kube-controller-manager-talos-control-1:kube-controller-manager k8s.gcr.io/kube-controller-manager:v1.24.3 0 CONTAINER_CREATED
<node ip> k8s.io kube-system/kube-scheduler-talos-control-1 k8s.gcr.io/pause:3.6 3691 SANDBOX_READY
<node ip> k8s.io └─ kube-system/kube-scheduler-talos-control-1:kube-scheduler k8s.gcr.io/kube-scheduler:v1.24.3 0 CONTAINER_CREATED
There seem to be some cert auth errors in the apiserver logs:
<node ip>: 2022-08-07T07:23:42.768472247Z stderr F I0807 07:23:42.768132 1 trace.go:205] Trace[1337185868]: "GuaranteedUpdate etcd3" type:*v1.Endpoints (07-Aug-2022 07:23:35.749) (total time: 7018ms):
<node ip>: 2022-08-07T07:23:42.768512987Z stderr F Trace[1337185868]: [7.018171569s] [7.018171569s] END
<node ip>: 2022-08-07T07:23:42.768697579Z stderr F E0807 07:23:42.768221 1 controller.go:240] unable to sync kubernetes service: etcdserver: request timed out
<node ip>: 2022-08-07T07:23:42.787298715Z stderr F {"level":"warn","ts":"2022-08-07T07:23:42.786Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x40012f1c00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: request timed out"}
<node ip>: 2022-08-07T07:23:42.869611976Z stderr F E0807 07:23:42.869193 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:42Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:42Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:23:45.874121452Z stderr F E0807 07:23:45.873750 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:45Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:45Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:23:48.878216372Z stderr F E0807 07:23:48.877901 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:48Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:48Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:23:50.27423494Z stderr F I0807 07:23:50.272924 1 trace.go:205] Trace[589616113]: "Create" url:/api/v1/namespaces/default/events,user-agent:kube-apiserver/v1.24.3 (linux/arm64) kubernetes/aef86a9,audit-id:2f49aad3-880d-45db-ad8f-4da5008056f9,client:::1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (07-Aug-2022 07:23:35.779) (total time: 14493ms):
<node ip>: 2022-08-07T07:23:50.274591272Z stderr F Trace[589616113]: ---"Object stored in database" 14492ms (07:23:50.272)
<node ip>: 2022-08-07T07:23:50.274678512Z stderr F Trace[589616113]: [14.493497064s] [14.493497064s] END
<node ip>: 2022-08-07T07:23:50.281010143Z stderr F I0807 07:23:50.280563 1 trace.go:205] Trace[634576640]: "Get" url:/api/v1/namespaces/kube-system,user-agent:kube-apiserver/v1.24.3 (linux/arm64) kubernetes/aef86a9,audit-id:5a48d524-a671-4150-8af2-df4ce1d5b963,client:::1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (07-Aug-2022 07:23:44.207) (total time: 6072ms):
<node ip>: 2022-08-07T07:23:50.281157605Z stderr F Trace[634576640]: ---"About to write a response" 6072ms (07:23:50.280)
<node ip>: 2022-08-07T07:23:50.281194568Z stderr F Trace[634576640]: [6.072439239s] [6.072439239s] END
<node ip>: 2022-08-07T07:23:50.284519124Z stderr F I0807 07:23:50.282428 1 trace.go:205] Trace[316781015]: "GuaranteedUpdate etcd3" type:*core.RangeAllocation (07-Aug-2022 07:23:35.756) (total time: 14525ms):
<node ip>: 2022-08-07T07:23:50.284667141Z stderr F Trace[316781015]: ---"initial value restored" 11265ms (07:23:47.022)
<node ip>: 2022-08-07T07:23:50.284704641Z stderr F Trace[316781015]: ---"Transaction committed" 3259ms (07:23:50.281)
<node ip>: 2022-08-07T07:23:50.284735752Z stderr F Trace[316781015]: [14.525267756s] [14.525267756s] END
<node ip>: 2022-08-07T07:23:50.28835275Z stderr F I0807 07:23:50.287828 1 trace.go:205] Trace[1027068449]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.24.3 (linux/arm64) kubernetes/aef86a9,audit-id:8a977d3f-477e-485f-96e0-48081fd335cf,client:::1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (07-Aug-2022 07:23:42.774) (total time: 7513ms):
<node ip>: 2022-08-07T07:23:50.288475583Z stderr F Trace[1027068449]: ---"About to write a response" 7513ms (07:23:50.287)
<node ip>: 2022-08-07T07:23:50.288509898Z stderr F Trace[1027068449]: [7.51331667s] [7.51331667s] END
<node ip>: 2022-08-07T07:23:51.883484025Z stderr F E0807 07:23:51.883060 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:51Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:51Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:23:54.88813353Z stderr F E0807 07:23:54.887667 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:54Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:54Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:23:55.139120185Z stderr F I0807 07:23:55.132366 1 trace.go:205] Trace[2003737638]: "Get" url:/api/v1/namespaces/kube-public,user-agent:kube-apiserver/v1.24.3 (linux/arm64) kubernetes/aef86a9,audit-id:4e9aeda1-5dcd-4653-95db-8941ebed3f50,client:::1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (07-Aug-2022 07:23:50.331) (total time: 4800ms):
<node ip>: 2022-08-07T07:23:55.139555183Z stderr F Trace[2003737638]: ---"About to write a response" 4800ms (07:23:55.131)
<node ip>: 2022-08-07T07:23:55.139607534Z stderr F Trace[2003737638]: [4.800269031s] [4.800269031s] END
<node ip>: 2022-08-07T07:23:55.139640201Z stderr F I0807 07:23:55.134312 1 trace.go:205] Trace[1009811277]: "GuaranteedUpdate etcd3" type:*v1.Endpoints (07-Aug-2022 07:23:50.298) (total time: 4835ms):
<node ip>: 2022-08-07T07:23:55.139669645Z stderr F Trace[1009811277]: ---"Transaction committed" 4825ms (07:23:55.134)
<node ip>: 2022-08-07T07:23:55.139698163Z stderr F Trace[1009811277]: [4.835685056s] [4.835685056s] END
<node ip>: 2022-08-07T07:23:55.149183166Z stderr F W0807 07:23:55.148861 1 lease.go:234] Resetting endpoints for master service "kubernetes" to [<node ip>]
<node ip>: 2022-08-07T07:23:55.170344826Z stderr F I0807 07:23:55.169896 1 controller.go:611] quota admission added evaluator for: endpoints
<node ip>: 2022-08-07T07:23:56.758182247Z stderr F I0807 07:23:56.757883 1 trace.go:205] Trace[1721911544]: "Get" url:/apis/discovery.k8s.io/v1/namespaces/default/endpointslices/kubernetes,user-agent:kube-apiserver/v1.24.3 (linux/arm64) kubernetes/aef86a9,audit-id:9562c830-6433-46d6-bfab-bc1116d9cbe5,client:::1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (07-Aug-2022 07:23:55.188) (total time: 1568ms):
<node ip>: 2022-08-07T07:23:56.758286117Z stderr F Trace[1721911544]: [1.568944953s] [1.568944953s] END
<node ip>: 2022-08-07T07:23:56.769886645Z stderr F I0807 07:23:56.769562 1 controller.go:611] quota admission added evaluator for: endpointslices.discovery.k8s.io
<node ip>: 2022-08-07T07:23:57.89256846Z stderr F E0807 07:23:57.892171 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:57Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:23:57Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:24:00.897959327Z stderr F E0807 07:24:00.897563 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:00Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:00Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:24:03.903286729Z stderr F E0807 07:24:03.902816 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:03Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:03Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:24:06.908549388Z stderr F E0807 07:24:06.908120 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:06Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:06Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:24:09.912516571Z stderr F E0807 07:24:09.912231 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:09Z is after 2022-08-07T06:48:55Z, verifying certificate SN=45300978456042199912394603477883528136, SKID=, AKID=A7:A5:53:55:17:52:0E:D5:19:D0:F1:9C:40:00:56:A7:F3:EB:0C:2A failed: x509: certificate has expired or is not yet valid: current time 2022-08-07T07:24:09Z is after 2022-08-07T06:48:55Z]"
<node ip>: 2022-08-07T07:24:11.706112652Z stderr F I0807 07:24:11.704310 1 trace.go:205] Trace[2007212016]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.24.3 (linux/arm64) kubernetes/aef86a9,audit-id:0bdfc782-d77a-4114-9ba4-67307510b0d4,client:::1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (07-Aug-2022 07:24:06.789) (total time: 4914ms):
<node ip>: 2022-08-07T07:24:11.706747259Z stderr F Trace[2007212016]: ---"About to write a response" 4913ms (07:24:11.703)
During talosctl apply-config, the control plane endpoint was set to the LAN IP of the node. talosctl dmesg -f -n <node ip> was returning lots of connection-refused errors to the control plane <node ip>:<6553>. I edited the machine config to use localhost instead, and then it started scheduling the control plane containers. The logs above are after this change.
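The field I edited is the cluster endpoint in the machine config, roughly like this (a sketch of the relevant section only):

cluster:
  controlPlane:
    endpoint: https://localhost:6443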
Without this change (endpoint set to <node lan ip> instead of localhost), I get the following:
$talosctl containers -k -n <node lan ip>
NODE NAMESPACE ID IMAGE PID STATUS
<node lan ip> k8s.io kube-system/kube-apiserver-talos-control-1 k8s.gcr.io/pause:3.6 3703 SANDBOX_READY
<node lan ip> k8s.io └─ kube-system/kube-apiserver-talos-control-1:kube-apiserver k8s.gcr.io/kube-apiserver:v1.24.3 0 CONTAINER_CREATED
<node lan ip> k8s.io └─ kube-system/kube-apiserver-talos-control-1:kube-apiserver k8s.gcr.io/kube-apiserver:v1.24.3 0 CONTAINER_CREATED
<node lan ip> k8s.io kube-system/kube-controller-manager-talos-control-1 k8s.gcr.io/pause:3.6 3746 SANDBOX_READY
<node lan ip> k8s.io └─ kube-system/kube-controller-manager-talos-control-1:kube-controller-manager k8s.gcr.io/kube-controller-manager:v1.24.3 0 CONTAINER_CREATED
<node lan ip> k8s.io └─ kube-system/kube-controller-manager-talos-control-1:kube-controller-manager k8s.gcr.io/kube-controller-manager:v1.24.3 0 CONTAINER_CREATED
<node lan ip> k8s.io kube-system/kube-scheduler-talos-control-1 k8s.gcr.io/pause:3.6 3755 SANDBOX_READY
<node lan ip> k8s.io └─ kube-system/kube-scheduler-talos-control-1:kube-scheduler k8s.gcr.io/kube-scheduler:v1.24.3 0 CONTAINER_CREATED
<node lan ip> k8s.io └─ kube-system/kube-scheduler-talos-control-1:kube-scheduler k8s.gcr.io/kube-scheduler:v1.24.3 3831 CONTAINER_RUNNING
$talosctl logs -k kube-system/kube-apiserver-talos-control-1:kube-apiserver -n <node lan ip>
<node lan ip>: 2022-08-07T07:52:47.161872935Z stderr F I0807 07:52:47.156803 1 server.go:558] external host was not specified, using <node lan ip>
<node lan ip>: 2022-08-07T07:52:47.181726901Z stderr F I0807 07:52:47.181442 1 server.go:158] Version: v1.24.3
<node lan ip>: 2022-08-07T07:52:47.181827606Z stderr F I0807 07:52:47.181588 1 server.go:160] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
<node lan ip>: 2022-08-07T07:53:10.479681082Z stderr F I0807 07:53:10.479204 1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
<node lan ip>: 2022-08-07T07:53:10.511495166Z stderr F I0807 07:53:10.510255 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
<node lan ip>: 2022-08-07T07:53:10.511655094Z stderr F I0807 07:53:10.510407 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
<node lan ip>: 2022-08-07T07:53:10.519219468Z stderr F I0807 07:53:10.517658 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
<node lan ip>: 2022-08-07T07:53:10.51931595Z stderr F I0807 07:53:10.517754 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
<node lan ip>: 2022-08-07T07:53:20.178428779Z stderr F I0807 07:53:20.177957 1 trace.go:205] Trace[1950691004]: "List(recursive=true) etcd3" key:/apiextensions.k8s.io/customresourcedefinitions,resourceVersion:,resourceVersionMatch:,limit:10000,continue: (07-Aug-2022 07:53:13.722) (total time: 6454ms):
<node lan ip>: 2022-08-07T07:53:20.178591225Z stderr F Trace[1950691004]: [6.454550408s] [6.454550408s] END
<node lan ip>: 2022-08-07T07:53:23.474343989Z stderr F W0807 07:53:23.473948 1 genericapiserver.go:557] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:23.480096847Z stderr F I0807 07:53:23.479714 1 instance.go:274] Using reconciler: lease
<node lan ip>: 2022-08-07T07:53:25.299999581Z stderr F W0807 07:53:25.299404 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:25.300137934Z stderr F E0807 07:53:25.299563 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:25.767979259Z stderr F I0807 07:53:25.767656 1 instance.go:586] API group "internal.apiserver.k8s.io" is not enabled, skipping.
<node lan ip>: 2022-08-07T07:53:26.308348864Z stderr F W0807 07:53:26.306884 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:26.308539495Z stderr F E0807 07:53:26.308052 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:26.703523002Z stderr F W0807 07:53:26.703231 1 genericapiserver.go:557] Skipping API authentication.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.71040835Z stderr F W0807 07:53:26.710144 1 genericapiserver.go:557] Skipping API authorization.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.768462564Z stderr F W0807 07:53:26.768154 1 genericapiserver.go:557] Skipping API certificates.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.775733711Z stderr F W0807 07:53:26.775450 1 genericapiserver.go:557] Skipping API coordination.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.801308478Z stderr F W0807 07:53:26.801060 1 genericapiserver.go:557] Skipping API networking.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.814456076Z stderr F W0807 07:53:26.814175 1 genericapiserver.go:557] Skipping API node.k8s.io/v1alpha1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.842395472Z stderr F W0807 07:53:26.842087 1 genericapiserver.go:557] Skipping API rbac.authorization.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.842487417Z stderr F W0807 07:53:26.842173 1 genericapiserver.go:557] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.849121059Z stderr F W0807 07:53:26.848850 1 genericapiserver.go:557] Skipping API scheduling.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.849214393Z stderr F W0807 07:53:26.848912 1 genericapiserver.go:557] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.869198266Z stderr F W0807 07:53:26.868920 1 genericapiserver.go:557] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.888548671Z stderr F W0807 07:53:26.888270 1 genericapiserver.go:557] Skipping API flowcontrol.apiserver.k8s.io/v1alpha1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.908957046Z stderr F W0807 07:53:26.908644 1 genericapiserver.go:557] Skipping API apps/v1beta2 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.909069992Z stderr F W0807 07:53:26.908710 1 genericapiserver.go:557] Skipping API apps/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.919660182Z stderr F W0807 07:53:26.919285 1 genericapiserver.go:557] Skipping API admissionregistration.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:26.940761063Z stderr F I0807 07:53:26.940459 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
<node lan ip>: 2022-08-07T07:53:26.940850712Z stderr F I0807 07:53:26.940518 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
<node lan ip>: 2022-08-07T07:53:27.064850069Z stderr F W0807 07:53:27.064538 1 genericapiserver.go:557] Skipping API apiregistration.k8s.io/v1beta1 because it has no resources.
<node lan ip>: 2022-08-07T07:53:27.312921947Z stderr F W0807 07:53:27.312536 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:27.313027725Z stderr F E0807 07:53:27.312715 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:29.092961319Z stderr F W0807 07:53:29.092583 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:29.093077968Z stderr F E0807 07:53:29.092736 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:30.096509239Z stderr F W0807 07:53:30.096226 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:30.096595536Z stderr F E0807 07:53:30.096330 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:31.101259535Z stderr F W0807 07:53:31.100887 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:31.101636779Z stderr F E0807 07:53:31.101023 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:32.11009708Z stderr F W0807 07:53:32.109723 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:32.110251285Z stderr F E0807 07:53:32.110058 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:32.28315371Z stderr F I0807 07:53:32.282758 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/system/secrets/kubernetes/kube-apiserver/aggregator-ca.crt"
<node lan ip>: 2022-08-07T07:53:32.283384878Z stderr F I0807 07:53:32.282759 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/system/secrets/kubernetes/kube-apiserver/ca.crt"
<node lan ip>: 2022-08-07T07:53:32.28367264Z stderr F I0807 07:53:32.283444 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/system/secrets/kubernetes/kube-apiserver/apiserver.crt::/system/secrets/kubernetes/kube-apiserver/apiserver.key"
<node lan ip>: 2022-08-07T07:53:32.284444256Z stderr F I0807 07:53:32.284262 1 secure_serving.go:210] Serving securely on [::]:6443
<node lan ip>: 2022-08-07T07:53:32.284527183Z stderr F I0807 07:53:32.284414 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
<node lan ip>: 2022-08-07T07:53:32.300401096Z stderr F I0807 07:53:32.300118 1 apiservice_controller.go:97] Starting APIServiceRegistrationController
<node lan ip>: 2022-08-07T07:53:32.300527283Z stderr F I0807 07:53:32.300183 1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
<node lan ip>: 2022-08-07T07:53:32.30057595Z stderr F I0807 07:53:32.300322 1 apf_controller.go:317] Starting API Priority and Fairness config controller
<node lan ip>: 2022-08-07T07:53:35.491976684Z stderr F I0807 07:53:35.491361 1 controller.go:83] Starting OpenAPI AggregationController
<node lan ip>: 2022-08-07T07:53:35.492467114Z stderr F I0807 07:53:35.491778 1 customresource_discovery_controller.go:209] Starting DiscoveryController
<node lan ip>: 2022-08-07T07:53:35.516713236Z stderr F I0807 07:53:35.515508 1 dynamic_serving_content.go:132] "Starting controller" name="aggregator-proxy-cert::/system/secrets/kubernetes/kube-apiserver/front-proxy-client.crt::/system/secrets/kubernetes/kube-apiserver/front-proxy-client.key"
<node lan ip>: 2022-08-07T07:53:35.523877659Z stderr F E0807 07:53:35.522684 1 authentication.go:63] "Unable to authenticate the request" err="invalid bearer token"
<node lan ip>: 2022-08-07T07:53:35.524209883Z stderr F I0807 07:53:35.523292 1 available_controller.go:491] Starting AvailableConditionController
<node lan ip>: 2022-08-07T07:53:35.524258902Z stderr F I0807 07:53:35.523403 1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
<node lan ip>: 2022-08-07T07:53:35.531061063Z stderr F I0807 07:53:35.530547 1 autoregister_controller.go:141] Starting autoregister controller
<node lan ip>: 2022-08-07T07:53:35.531200083Z stderr F I0807 07:53:35.530782 1 cache.go:32] Waiting for caches to sync for autoregister controller
<node lan ip>: 2022-08-07T07:53:35.531251842Z stderr F I0807 07:53:35.531107 1 controller.go:80] Starting OpenAPI V3 AggregationController
<node lan ip>: 2022-08-07T07:53:35.536671623Z stderr F I0807 07:53:35.536302 1 cluster_authentication_trust_controller.go:440] Starting cluster_authentication_trust_controller controller
<node lan ip>: 2022-08-07T07:53:35.558643376Z stderr F I0807 07:53:35.558207 1 shared_informer.go:255] Waiting for caches to sync for cluster_authentication_trust_controller
<node lan ip>: 2022-08-07T07:53:35.564877903Z stderr F W0807 07:53:35.564465 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:35.564981941Z stderr F E0807 07:53:35.564553 1 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input; reinitializing...
<node lan ip>: 2022-08-07T07:53:35.659126019Z stderr F I0807 07:53:35.658798 1 shared_informer.go:262] Caches are synced for cluster_authentication_trust_controller
<node lan ip>: 2022-08-07T07:53:35.775583557Z stderr F W0807 07:53:35.775282 1 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:35.775665372Z stderr F E0807 07:53:35.775399 1 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Secret: failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/kube-system/bootstrap-token-yvhv5k": invalid padding on input
<node lan ip>: 2022-08-07T07:53:35.795214256Z stderr F I0807 07:53:35.795014 1 shared_informer.go:262] Caches are synced for node_authorizer
<node lan ip>: 2022-08-07T07:53:37.153734879Z stderr F I0807 07:53:37.153471 1 cache.go:39] Caches are synced for AvailableConditionController controller
<node lan ip>: 2022-08-07T07:53:37.155517559Z stderr F I0807 07:53:37.155219 1 controller.go:132] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).
<node lan ip>: 2022-08-07T07:53:37.159149104Z stderr F I0807 07:53:37.157411 1 apf_controller.go:322] Running API Priority and Fairness config worker
<node lan ip>: 2022-08-07T07:53:37.15921866Z stderr F I0807 07:53:37.157713 1 controller.go:85] Starting OpenAPI controller
<node lan ip>: 2022-08-07T07:53:37.159230271Z stderr F I0807 07:53:37.158034 1 controller.go:85] Starting OpenAPI V3 controller
<node lan ip>: 2022-08-07T07:53:37.159240197Z stderr F I0807 07:53:37.158143 1 naming_controller.go:291] Starting NamingConditionController
<node lan ip>: 2022-08-07T07:53:37.206799173Z stderr F I0807 07:53:37.206378 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/system/secrets/kubernetes/kube-apiserver/ca.crt"
<node lan ip>: 2022-08-07T07:53:37.226547224Z stderr F I0807 07:53:37.226268 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/system/secrets/kubernetes/kube-apiserver/aggregator-ca.crt"
<node lan ip>: 2022-08-07T07:53:37.228822889Z stderr F I0807 07:53:37.228589 1 cache.go:39] Caches are synced for autoregister controller
<node lan ip>: 2022-08-07T07:53:40.310974153Z stderr F I0807 07:53:40.310669 1 trace.go:205] Trace[611352892]: "GuaranteedUpdate etcd3" type:*core.RangeAllocation (07-Aug-2022 07:53:37.235) (total time: 3074ms):
<node lan ip>: 2022-08-07T07:53:40.311083858Z stderr F Trace[611352892]: ---"initial value restored" 3074ms (07:53:40.310)
<node lan ip>: 2022-08-07T07:53:40.31110158Z stderr F Trace[611352892]: [3.074968194s] [3.
During the talosctl apply-config stage itself, if I set the control plane IP to localhost, the node becomes unreachable via talosctl. Maybe this is expected behaviour.
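For anyone hitting the same thing: talosctl talks to apid on the node over port 50000 (as seen in the dial errors earlier in this thread), so a stale talosconfig endpoint can be ruled out by targeting the node explicitly with the standard flags, e.g.:

$talosctl --endpoints <node lan ip> --nodes <node lan ip> version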
Bug Report
Description
I followed the instructions in the Talos docs (https://www.talos.dev/docs/v0.8/single-board-computers/rpi_4/#updating-the-eeprom) up to the bootstrapping-the-node part. I used the interactive flag; the only changes I made were the cluster name and the hostname. On step 14 of the bootstrap, phase startEverything, it displays "Health check failed: timed not ready" and stops there. I also tried unchecking DHCP and setting up a static IP on my own, matching it with the control plane address, which did not fix it.
Logs
[459.046528] service[networkd]: Started task networkd (PID 3060) for container networkd
[459.067572] service[routerd]: Started task routerd (PID 3064) for container routerd
[463.465130] service[networkd]: Health check successful
[463.477263] service[timed]: Running pre state
[463.678220] service[trustd]: waiting for service "timed" to be "up"
[463.732484] service[cri]: waiting for service "etcd" to be "up"
[463.732484] service[apid]: waiting for service "timed" to be "up"
[463.750343] service[kubelet]: waiting for service "cri" to be "up", service "timed" to be "up"
[463.765099] service[etcd]: waiting for service "timed" to be "up"
[464.360878] unpacking talos/timed (sha256:ec708ceab993ed2671774c30bf2582638429ee5533d11c32aa35a3eb46b67a5f)
[465.593584] service[timed]: Started task timed (PID 3133) for container timed
[465.983903] service[timed]: Health check failed: timed is not ready
[1490.104010] service[timed]: Error running Containerd(timed), going to restart forever: task "timed" failed: exit code 1
Environment