k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Extremely high system load on Ubuntu 19.10 on Pi 4 when klipperlb is used. #1297

Closed DanielWinks closed 2 years ago

DanielWinks commented 4 years ago

Version: k3s version v1.17.0+k3s.1 (0f644650)

Describe the bug When running k3s server with klipperlb in use, system load continually climbs until the system is unresponsive. This is on the latest Ubuntu 19.10 on a Raspberry Pi 4. If started with --no-deploy=servicelb, system load stays around 1 to 1.5 and the system remains responsive.

To Reproduce Install on Ubuntu 19.10 on Raspberry Pi 4 via: curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --no-deploy=traefik" sh -s -

Expected behavior K3s with Klipperlb does not cause system load to rapidly climb to 15+

Actual behavior System load climbs to 15+, causing the system to become unresponsive. No particularly high-CPU processes noted.

Additional context Installing without klipperlb works fine: curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --no-deploy=traefik --no-deploy=servicelb" sh -s -
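
For anyone trying to reproduce this, a rough way to confirm the servicelb (klipper-lb) pods are running and to watch load/iowait at the same time (pod names vary by k3s version, so the grep pattern is only indicative):

# servicelb pods live in kube-system; their names typically start with svclb-
kubectl -n kube-system get pods -o wide | grep svclb
# watch run-queue length (r), blocked tasks (b) and iowait (wa) while they run
vmstat 5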

jorneilander commented 4 years ago

I've seen the exact same behaviour in terms of the host becoming unresponsive (high load, low CPU usage). Disabling servicelb didn't help though; for me it was killing snapd and/or rpcbind that solved the issue. As soon as I kill those, the system becomes responsive again approx. 1 min later. Starting them makes the system unresponsive. Nevermind, removing both rpcbind and snapd from the system didn't seem to do much...

It's proven difficult to pinpoint which one it specifically is in my case, since the unresponsiveness makes it difficult to iterate over options.

DanielWinks commented 4 years ago

It's proven difficult to pinpoint which one it specifically is in my case, since the unresponsiveness makes it difficult to iterate over options.

Yes, same here. I'm also looking into snapd being the culprit. Considering that I don't even use it, and have had nothing but issues with snap-packaged apps, I'm removing it entirely to see if that helps. Along the lines of being difficult to troubleshoot: at first I thought it was just klipperlb and tried metallb instead, but it had the same issues. One thing I am certain of is that it only affects the server, not the clients. I have a four-node Pi 4 cluster and only the node running server has the load issue; the other nodes idle around 0.2-0.4 average system load. The node running server rapidly shoots to 20+ system load and basically locks up. CPU usage is minimal, so it's likely IO contention causing it.

jorneilander commented 4 years ago

Alright, some progress. Looking at nmon I found some concerning levels of activity on the SD card. I added LogLevelMax=3 to the k3s.service file, and it stopped outputting TRACE messages all the time. Before I removed snapd and rpcbind I saw a lot of activity from those as well as soon as k3s started. My guess is the amount of logs generated kills the throughput of the SD card.

Edit: After a few hours, things still went haywire... I'm now suspecting journald to be the problem, constantly writing to persistent storage. I've set Storage=volatile to see if this mitigates the issue.
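
For reference, a minimal sketch of the two changes mentioned above, using the standard systemd mechanisms (a drop-in for the k3s unit plus journald.conf); paths and values are illustrative:

# cap the log level stored for the k3s unit (LogLevelMax is a stock systemd option)
sudo mkdir -p /etc/systemd/system/k3s.service.d
printf '[Service]\nLogLevelMax=3\n' | sudo tee /etc/systemd/system/k3s.service.d/override.conf
# keep the journal in RAM only
sudo sed -i 's/^#\?Storage=.*/Storage=volatile/' /etc/systemd/journald.conf
sudo systemctl daemon-reload
sudo systemctl restart systemd-journald k3s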

rust84 commented 4 years ago

You should check out log2ram https://github.com/azlux/log2ram

I'm running successfully on a Pi 4 with this. log2ram consumes 128M and logs are flushed to the SD card once an hour. I'm not having stability issues at the moment, and this will also preserve the life of the SD card.
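
In case it helps others, a rough sketch of that setup (see the log2ram README for the authoritative steps; the SIZE value matches the 128M mentioned above):

git clone https://github.com/azlux/log2ram.git
cd log2ram && sudo ./install.sh
# raise the RAM disk used for /var/log to 128M
sudo sed -i 's/^SIZE=.*/SIZE=128M/' /etc/log2ram.conf
sudo reboot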

ericdmann commented 4 years ago

Adding some fuel here, running into what seems to be related. Distribution:

 PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
 NAME="Raspbian GNU/Linux"
 VERSION_ID="10"
 VERSION="10 (buster)"
 VERSION_CODENAME=buster

There is a correlation (this is a bog-standard install simply using the install script on three nodes) between memory spike, disk IO, and IO wait. Most of the time this self-recovers.

pi-1 is the master node

[screenshot attached: Screen Shot 2020-01-21 at 9 34 40 AM]

DanielWinks commented 4 years ago

Seems to be entirely disk-IO caused, and specifically journald. I installed and configured log2ram and now the k3s server is sitting at ~0.3 system load and has been for a few hours.

I think both Raspbian and DietPi (DietPi for sure) use a RAM logger to avoid SD card wear, which also has the side effect of mitigating the system load issue seen here. Ubuntu doesn't do this by default.

Purging LXD and snapd from a system destined to be a K3s node seems like a prudent plan, as they also cause a fair amount of overhead on the system and are entirely unneeded for K3s.

ericdmann commented 4 years ago

Yes, these don't seem to be installed on the system I'm troubleshooting:

pi@pi-1:~ $ sudo apt remove --purge lxd snapd
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package 'lxd' is not installed, so not removed
Package 'snapd' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 64 not upgraded.

DanielWinks commented 4 years ago

@ericdmann How does your system log? To disk? I was having 20+ system load on Ubuntu 19.10 on a Pi 4 until I installed log2ram. After that it's about 0.3-0.5 average.

ericdmann commented 4 years ago

I set journald Storage=volatile, turned off journald message reading, and finally ran systemctl disable systemd-journald.service completely, which of course just starts back up. Load average stayed about the same. That being said, I still need to see if the "crash" (memory increase, load, iowait) happens today with journald disabled completely. Will report back.

ericdmann commented 4 years ago

When I force-stop systemd-journald.service, there is a brief moment of very low (and expected) load, but of course systemd-journald.service is restarted, and the load returns.

ericdmann commented 4 years ago

OK -- another big load spike, this time correlated with journalctl -b --unit systemd-journald:

Jan 21 19:48:24 pi-1 systemd-journald[6614]: Suppressed 45006 messages from k3s.service
Jan 21 19:48:51 pi-1 systemd-journald[6614]: Suppressed 178341 messages from k3s.service
Jan 21 19:49:32 pi-1 systemd-journald[6614]: Suppressed 414899 messages from k3s.service

(update)

Confirmed a couple of hours or so later: another lockup, another flood of suppressed messages:

Jan 21 22:04:07 pi-1 systemd-journald[26272]: Suppressed 164397 messages from k3s.service
Jan 21 22:04:37 pi-1 systemd-journald[26272]: Suppressed 123799 messages from k3s.service
Jan 21 22:05:07 pi-1 systemd-journald[26272]: Suppressed 246145 messages from k3s.service
Jan 21 22:05:37 pi-1 systemd-journald[26272]: Suppressed 261251 messages from k3s.service
Jan 21 22:06:09 pi-1 systemd-journald[26272]: Suppressed 170361 messages from k3s.service
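
The "Suppressed N messages" lines come from journald's built-in per-service rate limiting, so k3s is logging even more than these numbers show. A hedged sketch of tightening the limits further (and keeping the journal off the SD card) while debugging; the values below are illustrative, not recommendations:

# /etc/systemd/journald.conf (excerpt)
[Journal]
Storage=volatile
RateLimitIntervalSec=30s
RateLimitBurst=1000
# then: sudo systemctl restart systemd-journald
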
jorneilander commented 4 years ago

My issue might be different:

PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
2383 ?sys root          3.14 G     50.73 M  0.00 %  1.91 % k3s server -v 0

This is 10 minutes of running k3s under iotop; it's killing the SD card with an enormous amount of continuous reads and I don't see a reason why.

I think I'm going to rebuild my cluster, perhaps make the master node HA and everything, and use etcd instead of SQLite. Too bad, this cluster ran for more than a year.
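
For anyone who wants to reproduce the per-process measurement above, something along these lines should work (standard iotop flags):

# -a accumulate IO since start, -o only show processes doing IO,
# -P aggregate threads per process, -d 10 refresh every 10 seconds
sudo iotop -aoP -d 10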

neoxack commented 4 years ago

Same problem with k3s. High (~10%) CPU load on a 3.2 GHz Xeon. High (~5 GB per day) disk writes. But everything works fine with microk8s.

k3s version v1.17.0+k3s.1 Ubuntu 18.04

jorneilander commented 4 years ago

Reinstalling the cluster had no effect; the node on which the primary control plane's containers are running keeps dropping out under the massive disk load.

I'll start looking into microk8s and Raspbian, considering I'm also quite annoyed with the number of processes Ubuntu runs in the background (e.g., unattended-upgrade).

jorneilander commented 4 years ago

Alright, an update from my side. I have a FreeNAS box, and I wanted to test what would happen if I ran etcd inside a jail on that machine, in a pool backed by a RAIDZ setup. This worked and dropped the load significantly, but etcd was spewing errors at the same rate k3s was logging those weird TRACE messages.

Apparently, when etcd thinks queries take too long, it logs them with messages similar to: etcdserver: read-only range request "key:\"/registry/leases/kube-node-lease/m725-c115-26-k8s-pf1-elk01\" " with result "range_response_count:1 size:368" took too long (321.997639ms) to execute

These coincide with the TRACE messages in k3s... So I decided to see what would happen if I put etcd's data dir on a tmpfs (i.e. a RAM disk). No errors whatsoever, traffic is low, load is <0.80 and stable.

My guess is that k3s logs these failed/slow calls, which floods the logging system, which in turn floods the SD card and self-destructs the host.
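
For completeness, a minimal sketch of the tmpfs experiment described above. The data dir path below (/var/lib/etcd) is just a common default and an assumption here; use whatever your etcd instance actually points at, and note that data on tmpfs is lost on reboot, so this is only suitable for testing:

# stop etcd, mount a RAM-backed filesystem over its (assumed) data dir, start it again
sudo systemctl stop etcd
sudo mount -t tmpfs -o size=512m tmpfs /var/lib/etcd
sudo systemctl start etcd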

AWDDude commented 4 years ago

I'm running the most recent version of k3s (v1.17.4+k3s1) on Ubuntu 19.10 (64-bit) on a Raspberry Pi 3B+. I have an iSCSI LUN attached from my NAS and mounted at /var/lib/rancher in an effort to prolong the life of my SD card. I used the following command to install k3s:

curl -sfL https://get.k3s.io | sudo K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="server" INSTALL_K3S_VERSION="v1.17.4+k3s1" sh -

Once the installation is complete, CPU utilization is OK, hovering around 15%, but disk reads go through the roof. Here is the output of dstat:

--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  3   9  20  68   0|  51M    0 | 400B  576B|   0     0 |  14k 5160
  1  10  24  65   0|  33M    0 |  66B  790B|   0     0 |9225  3391
  2   8  29  60   0|  56M   32k| 460B  458B|   0     0 |  17k 6535
  1   7  24  68   0|  15M    0 |   0     0 |   0     0 |2600   990
  1   9  27  62   0|  27M    0 | 266B  738B|   0     0 |8485  4163
  2   7  19  72   0|  38M    0 | 378B  578B|   0     0 |  12k 6109
  2   7  28  64   0|  37M    0 | 200B  404B|   0     0 |  12k 6480

The system becomes pretty unusable at this point, so it's hard to gather any additional information, but if there is anything specific you would like me to look at, I can try. Help?

Update: I set journald Storage=none, and it seems to have helped. I also think I'm running into an issue where my iSCSI connection is taking up all the available bandwidth on the NIC, which only gets about 300 Mbit (shared bus with USB and the SD card slot). But again, I think this is due to excessive disk reads while k3s server is running.

ozeta commented 4 years ago

Same problem here. System load climbs to circa 20%.

Freshly installed Raspberry Pi 4 + Raspbian Buster + log2ram

#cat /etc/os-release

PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

installed with:

curl -sfL https://get.k3s.io | sh -
sudo chmod 755 /etc/rancher/k3s/k3s.yaml

The process taking up the CPU is k3s-server.

cybercorey commented 4 years ago

I seem to have the same experience on a Raspberry Pi 4 + Raspbian Buster. Tried log2ram; it helped a small amount.

blaggacao commented 4 years ago

I just dropped by to say that, although it doesn't solve the problem, using k3os might help as a workaround. I'm trying to do this and am reporting here: https://github.com/rancher/k3os/issues/482

EDIT: I don't have an RPi4 at hand yet to try how far off (if at all) https://github.com/pftf/RPi4 is. EDIT2: k3os will soon work with the Focal Fossa 5.4 kernel, which might influence the device support problems discussed in the previous link.

kradalby commented 4 years ago

@blaggacao why would it help to use k3os? Is there a mitigation in that OS?

I'm seeing this both on RPis and on low-powered VMs.

Does anyone have a theory on what part of k3s winds up all the io?

I have tried to figure out whether it could be the SQLite/etcd part or the logging, but I'm not sure.

blaggacao commented 4 years ago

@kradalby I based my suggestion conservatively on the assumption that fewer OS components means fewer undiscoverable side effects, or at least a well-known reference point. Since we don't know which component is misbehaving, it's hard to reason about mitigation.

alxndr13 commented 3 years ago

Raspbian Buster here, had the same issue. Configuring journald to use volatile storage helped a ton.

stale[bot] commented 3 years ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.