k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

k3s spontaneous implosion #10126

Closed: IngwiePhoenix closed this issue 6 months ago

IngwiePhoenix commented 6 months ago

Environmental Info: K3s Version: 1.29.3+k3s+1

Node(s) CPU architecture, OS, and Version: 1 node, Rockchip RK3588, Armbian

Cluster Configuration: Just a single node that I use to "bootstrap" or build my initial cluster before adding more nodes to it.

Describe the bug: Hello there!

I just wanted to install OLM via the operator-sdk, and noticed that the kubelet was no longer answering. So, I looked at the node and saw that the k3s process had indeed stopped. It had exited with 255/EXCEPTION according to systemd.

So, I reproduced the failing start by stopping the unit and running k3s manually:

k3s.log (Gist - too big for ticket)

The log is quite noisy - but unfortunately I couldn't find out what the problem is. I didn't even change anything on the node itself at all. A few updates perhaps, but nothing that should be throwing it off balance that much.

Steps To Reproduce: I wish I knew what the cause was. All I did was install k3s and run it - nothing fancy.

Expected behavior: To see my kubelet reachable for OLM

Actual behavior: It... imploded? o.o

Additional context / logs: This runs on a NanoPi R6s, the binary is unmodified and installed as per recommendation. After a kernel update, or an update to anything at the level of libc or lower, I reboot. I wish I could filter by just error messages and above for situations like this...

Sorry, but I am really quite clueless here. I did use k3s-killall.sh before re-running to make sure it would start "clean".
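On the wish to filter for "error and above": a minimal sketch of a filter that could be run over a saved log such as k3s.log. It assumes the log mixes two common line styles, klog headers (a leading I/W/E/F letter) and logrus output (either `level=...` key/value pairs or a `ERRO[...]` style prefix). The script and function names are made up for illustration, and lines without a recognizable level (stack traces, continuation lines) are dropped.

```python
import re
import sys

# Severity ranks; higher means more severe.
RANK = {"debug": 0, "info": 1, "warning": 2, "error": 3, "fatal": 4}

# Assumed klog header shape: severity letter, MMDD, then a clock time.
KLOG = re.compile(r"^([IWEF])\d{4} \d{2}:\d{2}:\d{2}")
# Assumed logrus terminal shape: four-letter level followed by "[".
LOGRUS_TTY = re.compile(r"^(DEBU|INFO|WARN|ERRO|FATA)\[")
# Assumed logrus key/value shape: level=<name> somewhere in the line.
LOGRUS_KV = re.compile(r"\blevel=(\w+)")

KLOG_LEVEL = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}
TTY_LEVEL = {"DEBU": "debug", "INFO": "info", "WARN": "warning",
             "ERRO": "error", "FATA": "fatal"}


def line_level(line):
    """Return the normalized level of a log line, or None if unrecognized."""
    match = KLOG.match(line)
    if match:
        return KLOG_LEVEL[match.group(1)]
    match = LOGRUS_TTY.match(line)
    if match:
        return TTY_LEVEL[match.group(1)]
    match = LOGRUS_KV.search(line)
    if match:
        level = match.group(1).lower()
        level = {"warn": "warning", "panic": "fatal"}.get(level, level)
        return level if level in RANK else None
    return None


def filter_lines(lines, minimum="error"):
    """Keep only lines whose level is at least `minimum`."""
    threshold = RANK[minimum]
    kept = []
    for line in lines:
        level = line_level(line)
        if level is not None and RANK[level] >= threshold:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Reads the log on stdin and prints error/fatal lines only.
    for line in filter_lines(sys.stdin, minimum="error"):
        sys.stdout.write(line)
```

Usage would look like `python3 filter_k3s_log.py < k3s.log` (the script name is hypothetical). When reading from the journal instead, `journalctl -u k3s -o cat` strips the journald prefix so the patterns anchored at the start of the line can match.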

IngwiePhoenix commented 6 months ago

Found it.

root@cluserboi /v/log# cat /etc/default/armbian-ramlog
# configuration values for the armbian-ram-logging service
#
# enable the armbian-ram-logging service?
ENABLED=false

k3s crashed because it ran out of log space. That... is really... not good. o.o It should at least print something to STDERR imo. I found it by chance: I forgot to add a path to df -h and saw that the ramlog mount was full. I disabled it, rebooted, and it's back online.
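For anyone who wants to catch this before it bites: a small sketch of the same check that df -h revealed, using only the Python standard library. It assumes the ramlog mount is /var/log (which the shell prompt above suggests); the 95% threshold and the function names are arbitrary choices for illustration. The percentage is computed as used/total, so it may differ slightly from what df reports.

```python
import shutil


def usage_percent(path):
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=95.0):
    """True if the filesystem holding `path` is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    path = "/var/log"  # assumption: this is where the ramlog is mounted
    print(f"{path}: {usage_percent(path):.1f}% used")
    if is_nearly_full(path):
        print(f"warning: {path} is nearly full; services logging there may fail")
```

Run from cron or a systemd timer, something like this would at least flag a full log mount before a service falls over.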

brandond commented 6 months ago

k3s crashed because it ran out of log space. That... is really... not good. o.o It should at least print something to STDERR imo.

K3s does log to stderr/stdout. That is collected by systemd and logged to journald. I don't know exactly what journald does with logs when the host runs out of disk space, but I'm not sure it's something that K3s needs to be enhanced to handle better.