Closed: pkoryzna closed this issue 1 year ago
Have you added any other CLI flags or config file entries, other than --debug?
Is there anything unusual about this node? Are any of your filesystems on a remote share, ephemeral, or on a transactional update system?
Sorry for not mentioning it. You're correct: I obtained those logs by running sudo k3s server --debug 2>&1 | tee k3s-server.log in hopes of seeing some more details. Nothing is unusual about the node as far as I can tell; the machine is a physical amd64 box. I used to have /var/lib/rancher/k3s mounted on an iSCSI device, but I moved it to the internal SATA SSD a few weeks ago after installing a larger drive, and it ran without any problems (I commented out the fstab entry after that).
Did that perhaps not get set up properly after the reboot? Can you confirm that you've got the expected contents and mounts at that path, and that nothing is being mounted there now? It feels very much like the mount is being added halfway through K3s starting up, and a bunch of content is missing.
Thank you for the suggestion. Just checked: I can confirm there are no iSCSI mounts on this system anymore. Everything is on the logical volumes in the VG on sda, which is the SATA SSD inside the machine. The directories in /var/lib/rancher/k3s/storage do correspond to the PVCs I had set up. The content inside is also what I would expect.
patryk@debian:~$ lsblk
NAME                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                     8:0    0 55.9G  0 disk
|-sda1                  8:1    0  512M  0 part /boot
`-sda2                  8:2    0 55.4G  0 part
  |-debian--vg-root   254:0    0 54.4G  0 lvm  /
  `-debian--vg-swap_1 254:1    0  980M  0 lvm  [SWAP]
patryk@debian:~$ sudo ls /var/lib/rancher/k3s
agent data server storage
patryk@debian:~$ sudo ls /var/lib/rancher/k3s/storage/
pvc-3ff2b39f-a108-405e-8e28-af1a093856af_3dprint_octoprint-vol-octoprint-0
pvc-5d8410e7-719c-4d90-a02e-463f4db4bde6_3dprint_octoprint-vol-octo-octoprint-0
patryk@debian:~$ df /var/lib/rancher/k3s
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/debian--vg-root 56098784 17120284 36625240 32% /
patryk@debian:~$ sudo iscsiadm -m session
iscsiadm: No active sessions.
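The "is anything mounted there?" check above can also be done in one step. What follows is only a sketch: mounts_under is a hypothetical helper name, and it assumes the usual /proc/self/mounts layout in which the second field is the mount point (paths containing spaces are escaped in that file and are not handled here).

```shell
#!/bin/sh
# mounts_under DIR [MOUNTS_FILE]   (hypothetical helper, not part of K3s)
# Print every mount point equal to DIR or nested below it, one per line.
# MOUNTS_FILE defaults to /proc/self/mounts; the parameter exists so the
# function can be tried against a saved copy of that file.
mounts_under() {
    dir=${1%/}                       # drop a trailing slash, if any
    file=${2:-/proc/self/mounts}
    awk -v d="$dir" '
        # Field 2 is the mount point. Match DIR itself or anything under
        # DIR/, but not siblings such as DIR-old.
        $2 == d || index($2, d "/") == 1 { print $2 }
    ' "$file"
}
```

Calling mounts_under /var/lib/rancher/k3s prints nothing when no filesystem is mounted at or below that path. Any line it does print is a mount to look at; whether that mount is expected has to be judged for the node in question.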
When I was using the mounted iSCSI volume, I had it added as a dependency of k3s.service in systemd, which I also commented out right after moving the data to the local volume. (I assume systemd wouldn't start the service if it still depended on the mount, and k3s.service itself gets started automatically without any manual intervention from my side.)
Just to be sure, I removed the drop-ins completely with sudo systemctl revert k3s.service and restarted. I am still having the same issue and similar error messages in the logs, even without --debug:
May 20 00:05:06 debian k3s[1554]: time="2023-05-20T00:05:06+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
May 20 00:05:07 debian k3s[1554]: time="2023-05-20T00:05:07+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 20 00:05:11 debian k3s[1554]: time="2023-05-20T00:05:11+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
May 20 00:05:12 debian k3s[1554]: time="2023-05-20T00:05:12+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 20 00:05:15 debian k3s[1554]: time="2023-05-20T00:05:15+02:00" level=info msg="Waiting for API server to become available"
May 20 00:05:16 debian k3s[1554]: time="2023-05-20T00:05:16+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
May 20 00:05:16 debian k3s[1554]: time="2023-05-20T00:05:16+02:00" level=info msg="Waiting for API server to become available"
May 20 00:05:17 debian k3s[1554]: time="2023-05-20T00:05:17+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 20 00:05:18 debian k3s[1554]: I0520 00:05:18.238856 1554 trace.go:219] Trace[1208218320]: "Proxy via http_connect protocol over tcp" address:10.42.0.20:4443 (20-May-2023 00:03:08.434) (total time: 129803ms):
May 20 00:05:18 debian k3s[1554]: Trace[1208218320]: [2m9.80398354s] [2m9.80398354s] END
May 20 00:05:18 debian k3s[1554]: I0520 00:05:18.242742 1554 trace.go:219] Trace[1418202149]: "Proxy via http_connect protocol over tcp" address:10.42.0.20:4443 (20-May-2023 00:03:08.434) (total time: 129808ms):
May 20 00:05:18 debian k3s[1554]: Trace[1418202149]: [2m9.808432896s] [2m9.808432896s] END
May 20 00:05:18 debian k3s[1554]: I0520 00:05:18.242742 1554 trace.go:219] Trace[357151497]: "Proxy via http_connect protocol over tcp" address:10.42.0.20:4443 (20-May-2023 00:03:08.434) (total time: 129807ms):
May 20 00:05:18 debian k3s[1554]: Trace[357151497]: [2m9.807774284s] [2m9.807774284s] END
May 20 00:05:18 debian k3s[1554]: I0520 00:05:18.242937 1554 trace.go:219] Trace[25518149]: "Proxy via http_connect protocol over tcp" address:10.42.0.20:4443 (20-May-2023 00:03:08.434) (total time: 129808ms):
May 20 00:05:18 debian k3s[1554]: Trace[25518149]: [2m9.808585431s] [2m9.808585431s] END
May 20 00:05:18 debian k3s[1554]: I0520 00:05:18.243011 1554 trace.go:219] Trace[714593470]: "Proxy via http_connect protocol over tcp" address:10.42.0.20:4443 (20-May-2023 00:03:08.434) (total time: 129808ms):
May 20 00:05:18 debian k3s[1554]: Trace[714593470]: [2m9.808736988s] [2m9.808736988s] END
Let me know if there's anything else I could check!
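For anyone comparing their own logs against the excerpt above, a rough summary can be pulled out with awk. This is a sketch under assumptions: the two function names are made up for illustration, and the patterns are taken only from the message text visible in this thread ("server is not ready" and "total time: ...ms"), so other K3s versions may word things differently.

```shell
#!/bin/sh
# count_not_ready LOGFILE   (hypothetical helper)
# Count log lines reporting that the K3s server is not ready.
count_not_ready() {
    awk '/server is not ready/ { n++ } END { print n + 0 }' "$1"
}

# max_trace_ms LOGFILE   (hypothetical helper)
# Print the largest "total time" value, in milliseconds, found in
# trace lines; prints 0 when the log contains no such line.
max_trace_ms() {
    awk '
        match($0, /total time: [0-9]+ms/) {
            # "total time: " is 12 characters and "ms" is 2, so the
            # digits start at RSTART + 12 and span RLENGTH - 14.
            v = substr($0, RSTART + 12, RLENGTH - 14) + 0
            if (v > max) max = v
        }
        END { print max + 0 }
    ' "$1"
}
```

Both functions read a saved log file, for example one written with journalctl -u k3s > k3s.log or the k3s-server.log produced by the tee command earlier in the thread.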
Can you run kubectl get addon -A and ls -la /var/lib/rancher/k3s/server/manifests/? You appear to be missing all the packaged components.
Seems like I have some stuff deployed, but nothing actually running 🤔
patryk@debian:~$ kubectl get addon -A
NAMESPACE NAME AGE
kube-system ccm 319d
kube-system coredns 319d
kube-system local-storage 319d
kube-system aggregated-metrics-reader 319d
kube-system auth-delegator 319d
kube-system auth-reader 319d
kube-system metrics-apiservice 319d
kube-system metrics-server-deployment 319d
kube-system metrics-server-service 319d
kube-system resource-reader 319d
kube-system rolebindings 319d
kube-system traefik 319d
There are manifests at that path, including those for metrics-server, which seems to be somehow not deployed properly:
patryk@debian:~$ sudo ls -laR /var/lib/rancher/k3s/server/manifests/
/var/lib/rancher/k3s/server/manifests/:
total 36
drwx------ 3 root root 4096 May 19 18:20 .
drwx------ 8 root root 4096 May 20 00:08 ..
-rw------- 1 root root 1774 May 15 22:42 ccm.yaml
-rw------- 1 root root 4857 May 15 22:42 coredns.yaml
-rw------- 1 root root 3635 May 15 22:42 local-storage.yaml
drwx------ 2 root root 4096 Apr 1 14:28 metrics-server
-rw------- 1 root root 1039 May 15 22:42 rolebindings.yaml
-rw------- 1 root root 1155 May 15 22:42 traefik.yaml
/var/lib/rancher/k3s/server/manifests/metrics-server:
total 36
drwx------ 2 root root 4096 Apr 1 14:28 .
drwx------ 3 root root 4096 May 19 18:20 ..
-rw------- 1 root root 393 May 15 22:42 aggregated-metrics-reader.yaml
-rw------- 1 root root 303 May 15 22:42 auth-delegator.yaml
-rw------- 1 root root 324 May 15 22:42 auth-reader.yaml
-rw------- 1 root root 293 May 15 22:42 metrics-apiservice.yaml
-rw------- 1 root root 2217 May 15 22:42 metrics-server-deployment.yaml
-rw------- 1 root root 309 May 15 22:42 metrics-server-service.yaml
-rw------- 1 root root 517 May 15 22:42 resource-reader.yaml
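The two listings above can be compared mechanically. The sketch below assumes that each Addon name matches the base name of a manifest file, which is inferred from the output in this thread (ccm and ccm.yaml, auth-reader and metrics-server/auth-reader.yaml, and so on) rather than taken from K3s documentation. Both function names are hypothetical.

```shell
#!/bin/sh
# manifest_names DIR   (hypothetical helper)
# Print the base name, without ".yaml", of every manifest below DIR, sorted.
manifest_names() {
    find "$1" -type f -name '*.yaml' |
        sed -e 's|.*/||' -e 's|\.yaml$||' |
        sort
}

# addons_without_manifest DIR   (hypothetical helper)
# Read Addon names on stdin, one per line, and print those that have no
# matching manifest file below DIR.
addons_without_manifest() {
    names=$(mktemp) && addons=$(mktemp) || return 1
    manifest_names "$1" > "$names"
    sort > "$addons"                 # comm needs both inputs sorted
    comm -13 "$names" "$addons"      # keep lines found only in the addon list
    rm -f "$names" "$addons"
}
```

The Addon names could be supplied with kubectl get addon -A --no-headers | awk '{print $2}'. Reading /var/lib/rancher/k3s/server/manifests/ needs root, as the sudo in the listing above suggests. For the output shown in this thread the result would be empty, since every listed Addon has a file on disk.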
Hmm, I'm kind of at a loss. Have you tried stopping k3s, mounting the volume again, and then starting it again to see if perhaps some of the data was missed when you migrated off? I don't see any critical errors but clearly something is missing.
Yeah, this is very confusing indeed. I do not have a copy of the volume anymore; I removed it after (seemingly successfully) migrating, so I can't check anymore.
Tonight I did an apt update && apt upgrade && reboot, removed k3s, reinstalled it from the stable channel, and reinstalled my helm charts, and everything works as expected. I'm afraid I won't be able to reproduce the issue. My guess would be some sneaky filesystem corruption on the local SSD? Might be a good idea to check the SMART metrics soon 😅
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Linux debian 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Cluster Configuration: 1 node
Describe the bug:
None of the workloads (in kube-system or any of my namespaces) start after a system reboot or systemctl restart k3s
Steps To Reproduce:
Installed K3s:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -
Expected behavior:
At least the defaults in kube-system running.
Actual behavior:
Nothing starts.
These messages about server not being ready keep appearing in the logs:
Additional context / logs:
Attached k3s server --debug logs: k3s-server.log
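The symptom under "Describe the bug" (no workloads starting) can be confirmed at a glance by filtering pod listings. This is a minimal sketch: not_running is a hypothetical name, and it assumes the default kubectl get pods -A column order of NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE.

```shell
#!/bin/sh
# not_running   (hypothetical helper)
# Read `kubectl get pods -A` output on stdin and print "namespace/name status"
# for every pod whose STATUS column is neither Running nor Completed.
not_running() {
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed" {
        print $1 "/" $2, $4
    }'
}
```

Usage would be kubectl get pods -A | not_running. Empty output means every listed pod is Running or Completed. It says nothing about pods that were never created, so in a case like this one an empty pod list needs to be checked separately.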