Closed elmr91 closed 11 months ago
Not only does the problem persist in 10.1, but I also see increased CPU activity compared with even 10.0, of roughly 2%, and a similar increase in average temperature. This needs addressing.
Certainly sir, would you like a caesar, a balsamic or a lemon and herb dressing?
FWIW, it is unlikely that 10.1 addresses this as there is no Docker update. It seems that this is related to containerd.
What would be interesting is if a Supervised installation with Debian 11 using Docker 23.0 (compared to Docker 20.10) sees the same increase.
The latest dev build upgrades to Docker 23.0.5, if someone could test that would be interesting:
https://os-builds.home-assistant.io/11.0.dev20230427/
Or to update your system directly (please create and download a backup, since this updates to development builds):
ha su options --channel=dev
ha su reload
ha os update
ha su options --channel=stable
> Or to update your system directly (please create and download a backup, since this updates to development builds):
Proxmox VE latest, J4105, VM: 2 cores/4GB. HAOS 9.5: ~2.1% CPU; HAOS 11 dev: ~6.8% CPU.
High CPU consumption with version 10 and 10.1 (containerd: 6-7% all time). Downgraded to 9.5.
Add me to the list of affected users. Downgraded to 9.5 and cpu usage went back to normal. Also running on pve 7.4-3
This issue doesn't seem to be getting much traction. How can we help? (as competent sysadmin/python devs, but not especially familiar with HAOS)
In a quick attempt last week I monitored containerd using strace to see whether a significant number of syscalls were being made. If that were the case, it could also be a Linux kernel issue. However, I didn't see any syscalls over a longer period of time, even though there was CPU usage. So I am assuming that the problem is within Docker (or rather containerd) itself.
So the problem most likely is related to the Docker 23.0 upgrade. To isolate that, it might make sense to build an image using HAOS 9.5, but just upgrade Docker to 23.0, to verify that the problem indeed is related to Docker 23.0. If that is proven, opening an issue in the Docker GitHub project (moby) is probably the next step.
However, in my experience, just creating an issue is unlikely to trigger a quick fix. To get it resolved, we'd likely have to track down the actual issue ourselves. Since it is the `containerd` process which is causing the higher CPU load, we'd have to dig into why it uses more CPU. One option is using some kind of profiling. I am not familiar with Go profiling, but I am sure there are ways to profile a Go process to figure out which operations use more CPU than they did before.
During the whole process it could also turn out that feature x of `containerd` just requires more CPU, and this was expected, in which case the whole endeavor would have been useless.
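Before reaching for a Go profiler, the CPU share of the containerd process can be sampled directly from `/proc` to get comparable numbers across versions. The following is a minimal sketch, assuming a Linux test system with Python available (for example a Debian VM, not a stock HAOS host); the helper names `cpu_ticks`, `cpu_percent` and `sample` are hypothetical and not part of any HAOS or Docker tooling.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces, so split on the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime


def cpu_percent(ticks_before: int, ticks_after: int, seconds: float, hz: int) -> float:
    """CPU usage as a percentage of one core over the sampling interval."""
    return (ticks_after - ticks_before) / hz / seconds * 100.0


def sample(pid: int, seconds: float = 10.0) -> float:
    """Read a live process twice and return its average CPU percentage."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, seconds, hz)
```

Calling `sample(pid)` with the PID of containerd (for example from `pidof containerd`) on two otherwise identical systems gives a figure that does not depend on the hypervisor's own accounting. It only shows how much CPU is used, not why; that still needs a profiler.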
I am expecting this to take a significant amount of work to tackle. Since this "only" affects CPU usage, it isn't high on my priority list right now. I also have some hope that some stable package update (e.g. a new Docker patch release) suddenly resolves the issue :crossed_fingers:
An old containerd issue provides some hints about debugging/profiling containerd/docker: https://github.com/fnproject/fn/issues/700
I'm running on a Proxmox server, and after upgrading from 9.x to 10.1 I have the same problem:
CPU went from 6% to 15%. Supervisor is 2023.04.1 in both cases. I don't use the motionEye add-on.
I'm running 4 HA instances on 3 different kinds of hardware and seeing the same issue: an RPI4B rev1 for standby, an RPI3B for testing, an RPI3B in a different home with a different config, and an Intel NUC on Proxmox. All 4 of them show the same CPU increase. The issue disappears when going back to 9.5. Like others said, containerd seems to be the culprit. No motionEye. All 4 systems have different add-ons. The only common add-on is Terminal.
I found this interesting and figured I'd try to replicate it. I used a basic Debian 11 VM, cloned it, and installed docker/containerd `23.0.6`/`1.6.21` (latest at this time) on one and `20.10.22`/`1.6.8` (what HAOS 9.5 uses) on the other via the new `--version` parameter of Docker's installation script.
I used `apt install containerd.io=1.6.8-1` to downgrade `containerd` and rebooted.
I then used the docker run command from the docs and let the VMs idle for a bit. I didn't complete the onboarding, just visited to check if it was up.
I used Proxmox's metric server output to gather the stats. You can also see a snapshot of it here.
Maybe my testing is too shallow and flawed but I can't reproduce this behavior outside of HAOS. At least it doesn't seem to be a general docker/containerd issue. I can't really find any recent cpu related complaints in the moby/docker/containerd repos either.
Having now measured my power consumption - I can confirm increased wattage from HAOS 9.5 to 10.1 of more than 2 watts. (10-11w on 9.5, 13w on 10.1). I know 2w isn't that much (approx. US$2.88 per year locally), but what concerns me more is the added "wear" on hardware of more than 18% in energy. I hope we get a solution, other than going back to "old" system!
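The figures above can be cross-checked with a short calculation. This is a sketch only: it assumes the extra 2 W is drawn constantly all year, and the electricity price is inferred from the quoted US$2.88 rather than stated in the comment. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_kwh(extra_watts: float) -> float:
    """Energy used per year by a constant extra load, in kWh."""
    return extra_watts * HOURS_PER_YEAR / 1000.0


def annual_cost(extra_watts: float, price_per_kwh: float) -> float:
    """Yearly cost of a constant extra load at a flat price per kWh."""
    return annual_kwh(extra_watts) * price_per_kwh


extra_kwh = annual_kwh(2.0)        # 17.52 kWh per year for a constant 2 W
implied_price = 2.88 / extra_kwh   # roughly US$0.16 per kWh (inferred, not quoted)
relative_increase = 2.0 / 11.0     # about 18% against the 11 W end of the 9.5 range
```

The same helper can be reused with any local tariff, for example `annual_cost(2.0, price)` with `price` set to a flat rate per kWh.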
NOTE: I believe I saw a further rise in CPU usage after updating recently to latest HA core 2023.5.2. (Above figures are after this update).
It may be worth trying a cross-test (if possible):
But I don't know how to proceed.
@Impact123 these are very interesting findings! I guess this could mean one of three things:
a) Latest docker/containerd `23.0.6`/`1.6.21` fixes the problem
b) The docker/containerd build config in HAOS is different
c) It is related to some other part of the environment
I'd say c) is most likely, and there are lots of candidates: the Go version used, the glibc version and the Linux kernel are the most probable ones. To rule out a) I can bump to this latest version relatively easily in HAOS 11 dev builds; I'll tackle that today. Then we can test whether that fixes the problem. If not, we know it must be caused by something other than containerd itself.
@elmr91 building HAOS from scratch is documented on the developer website. Of course it requires manually adjusting Docker/containerd package versions. I'll tackle that at one point, but currently the Bluetooth issues have higher priority.
@Impact123, could you also make the same test on your VM with the docker/containerd version used by HAOS10 ? (it seems you compared docker version from HAOS 9.5 to latest) I think you should not see any difference (indicating the behavior we observe is linked to containerd environment)
> NOTE: I believe I saw a further rise in CPU usage after updating recently to latest HA core 2023.5.2. (Above figures are after this update).
I know it might be off topic but I experienced the same. Now I have an extra 10% from 2023.5.2.
I didn't experience the same with HAOS 9.5 + 2023.5.2 (no noticeable CPU difference compared to last month - I also added few devices during this time)
@elmr91 Here's the same test, with just HA idling on the onboarding page, but with `23.0.6`/`1.6.21` and `23.0.3`/`1.6.20`.
root@Debian:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
20da05ac337d ghcr.io/home-assistant/home-assistant:stable "/init" About an hour ago Up About an hour homeassistant
root@Debiantemp2:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
44de036f8143 ghcr.io/home-assistant/home-assistant:stable "/init" About an hour ago Up About an hour homeassistant
root@Debian:~# hostname; docker -v; containerd -v
Debian
Docker version 23.0.6, build ef23cbc
containerd containerd.io 1.6.21 3dce8eb055cbb6872793272b4f20ed16117344f8
root@Debiantemp2:~# hostname; docker -v; containerd -v
Debiantemp2
Docker version 23.0.3, build 3e7cbfd
containerd containerd.io 1.6.20 2806fc1057397dbaeefbea0e4e17bddfbd388f38
For reference
# ha os info | grep version:; docker -v; containerd -v
version: "10.0"
Docker version 23.0.3, build 23.0.3
containerd github.com/containerd/containerd 1.6.20
If I find some time I'll check if I'm able to grasp how to build HAOS with specific versions.
I'll add that I also noticed a significant increase in cpu usage from ~3% to ~9% when I moved from HAOS 9.5 to 10.0 or 10.1. Running on Proxmox 7.3-6 (i7-6700T). It definitely appears to be an issue with HAOS 10.0/10.1. And a few watts of increase do add up when you are paying $0.82 per kWh summer peak (Thanks SDG&E).
The latest dev builds use Docker 23.0.6 along with updated runc/containerd components. I don't expect it to change anything, but maybe worth a try: https://os-builds.home-assistant.io/11.0.dev20230509/
Here are the `9.5`, `10.1` and `11.0-dev20230509` VMs idling. These are completely fresh installs created/imported from the ova without any OS modifications. I didn't even visit the web interface this time.
The dip comes from activity on another VM
Have another interesting observation to report:
Since updating my Proxmox VE a number of days ago, which included a kernel-update, I have noted an average temp DECREASE of about 2 degrees Celsius. Good news! I am currently only running an HAOS 10.1 VM.
Can anyone else confirm something like this?
This is my current Proxmox VE details:
Kernel Version Linux 5.15.107-2-pve #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z)
PVE Manager Version pve-manager/7.4-3/9002ab8a
I have an Intel(R) Atom(TM) x5-Z8500 CPU @ 1.44GHz system and I see approx. 2 watts more power usage after upgrading from 9.5 to 10.1. Here is a screenshot of my power meter readings:
Anyone tested OS 10.2?
Going from 10.1 to 10.2 on my Proxmox VE setup, memory usage plummeted (4.5 GB to 1.6 GB) but CPU usage increased another ~36% (2.8% to 3.8%). This is of course on top of the +45% I had on 9.5->10.0 (2.4% cpu average to 3.5% cpu average), which means CPU usage has basically doubled. (1.36*1.45 = 197% of original CPU load).
The white line in the first image is when the upgrade ran from 10.1 to 10.2.
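The percentages above can be recomputed from the quoted averages. A small sketch follows; note that multiplying the two steps assumes the 10.1 baseline equals the 10.0 figure, whereas the quoted averages differ (3.5% on 10.0, 2.8% on 10.1, presumably measured over different periods), so the direct 9.5 to 10.2 ratio is shown alongside for comparison.

```python
def ratio(before: float, after: float) -> float:
    """Factor by which a value grew from `before` to `after`."""
    return after / before


step_9_5_to_10_0 = ratio(2.4, 3.5)    # about 1.46, i.e. roughly +45%
step_10_1_to_10_2 = ratio(2.8, 3.8)   # about 1.36, i.e. roughly +36%

# Compounding the two steps, as done in the comment above.
compounded = step_9_5_to_10_0 * step_10_1_to_10_2   # about 1.98

# Comparing the 10.2 average directly against the 9.5 average.
direct = ratio(2.4, 3.8)              # about 1.58
```

Either way the idle CPU load is clearly higher than on 9.5; the two figures differ only in how the intermediate 10.0/10.1 measurements are treated.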
Bump. This is still an issue, and doesn't seem to be getting any attention from HAOS devs. How can we help?
Low power consumption should always be top priority. Is this issue related to only a few hardware configurations? I use a PN41 with an N6000 (passively cooled) and definitely want to keep resources available for other processes, so HA efficiency is very important. Has anyone noticed a similar increase with their RPi?
> Has anyone noticed a similar increase with their RPi?
Not with RPi but with HA in VirtualBox on a Linux/Debian host. Still on 9.5 since 10.0 was more or less unusable due to high CPU usage on the host. Went from single digit CPU to around 30 % CPU.
Can we assist in any way to troubleshoot this issue?
> Not with RPi but with HA in VirtualBox on a Linux/Debian host. Still on 9.5 since 10.0 was more or less unusable due to high CPU usage on the host. Went from single digit CPU to around 30 % CPU.

So this has something to do with virtualization? Is there no CPU increase on native installations?
> > Not with RPi but with HA in VirtualBox on a Linux/Debian host. Still on 9.5 since 10.0 was more or less unusable due to high CPU usage on the host. Went from single digit CPU to around 30 % CPU.
>
> So this has something to do with virtualization? Is there no CPU increase on native installations?

Nope. It's affecting native rpi 3B and 4B.
FWIW I reproduced the experiments from @Impact123 https://github.com/home-assistant/operating-system/issues/2476#issuecomment-1542574092 and had exactly the same results. Even with brand-new installs, at the onboarding screen on each:
9.5
10.2
dev
FWIW 9.5 is kernel 5.15.90, while 10.0 is 6.1.24 (and current dev HAOS is 6.1.32)
> > Not with RPi but with HA in VirtualBox on a Linux/Debian host. Still on 9.5 since 10.0 was more or less unusable due to high CPU usage on the host. Went from single digit CPU to around 30 % CPU.
>
> So this has something to do with virtualization? Is there no CPU increase on native installations?

No, it's affecting everything. It's just easier to show results with virtualized installs because they give you nice statistics. 'Containerd' inside HAOS is for running the internal service containers that are part of HAOS (most obviously the add-ons, like esphome), whether you're on bare metal or on a virtualized install.
Is there any way we can help with this? 9.5 is getting old and I can't use the network storage feature 🫠
The upcoming 10.3 again makes no difference to the increased CPU usage. I think we need to live with it :-(
Nope, that should not be the case. It needs to be resolved somehow. From my experience I can tell that moving Ubuntu's kernel up from 5.15 also worsened idle performance, leading to +5°C, but at least I was able to resolve this by masking a specific GPE interrupt, and the kworker load finally went back down from 5% to 0% at idle. Is there a way to run the latest HAOS version with the old kernel?
> The upcoming 10.3 again makes no difference to the increased CPU usage. I think we need to live with it :-(
For me 10.3 also didn't solve it.
It may be an incompatibility between the kernel build options in 10.x and what Docker relies on.
It is possible that some kernel facility was compiled into kernel 5.15 (in HAOS 9.x) but is missing in the new kernel tree (HAOS 10.x).
Docker may then use another mechanism (eating more CPU) to work around the missing kernel/libc feature.
Only a guess...
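One way to test this guess would be to compare the kernel build configurations of the two releases. The sketch below parses two kernel `.config` texts and lists the options that differ. It assumes the configs of both kernels can be obtained at all (for example from `/proc/config.gz` when the kernel exposes it, or from the build tree); whether HAOS makes them available is not confirmed here. The option names `CONFIG_EXAMPLE_*` are placeholders, not real findings.

```python
def parse_kconfig(text: str) -> dict[str, str]:
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Lines of the form "# CONFIG_FOO is not set" mark disabled options.
            options[line[2:-len(" is not set")]] = "n"
    return options


def changed_options(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {option: (old_value, new_value)} for every option that differs."""
    result = {}
    for name in sorted(old.keys() | new.keys()):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            result[name] = (before, after)
    return result


# Placeholder configs for illustration only; these are not real HAOS settings.
OLD = "CONFIG_EXAMPLE_A=y\nCONFIG_EXAMPLE_B=m\n"
NEW = "CONFIG_EXAMPLE_A=y\n# CONFIG_EXAMPLE_B is not set\nCONFIG_EXAMPLE_C=y\n"
diff = changed_options(parse_kconfig(OLD), parse_kconfig(NEW))
```

A real diff between a 5.15 and a 6.1 config will be long, since many options are added or renamed between kernel series, so the output would still need to be narrowed down by hand to the facilities containerd depends on.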
I've spun up a Debian 11 VM and installed HA Supervised, and the CPU usage is the same as with OS 9.5. So there is no increase in CPU usage like in OS 10.
Linux 5.10.0-23-amd64 x86_64
Docker version 24.0.2, build cb74dfc
containerd containerd.io 1.6.21 3dce8eb055cbb6872793272b4f20ed16117344f8
Everything was installed following the instructions on https://github.com/home-assistant/supervised-installer
10.3 definitely doesn't fix it! I also just updated Proxmox to version 8, and this also didn't fix it. In fact it may be even worse! (~ +1°C warmer).
Interestingly, in contrast to other user reports here, I can't see any CPU increase on a fresh installation on a RPI4-64. I started with a new 9.5 installation and then updated to 10.3 (red line)...
I'll restore a backup with all integrations I use and see if this increases the CPU usage that much
> 10.3 definitely doesn't fix it! I also just updated Proxmox to version 8, and this also didn't fix it. In fact it may be even worse! (~ +1°C warmer).
I also experienced this. However, it's a different issue. Before updating to OS 10 I moved from VE ~7.1 to 7.4 and saw a huge decrease in power consumption. Then came OS 10, and I saw the improvement vanish with the containerd problem. Now, after updating to VE 8, consumption increased by the same amount that it had decreased from 7.1 to 7.4. I guess whatever efficiency 7.4 brought was lost with 8.
Off topic, but what I observe is that Proxmox reports higher CPU usage on the VM than what HA reports; I suppose the extra CPU comes from virtualization inefficiencies. Something related to the kernel or QEMU in Proxmox gives me an extra 20% CPU usage on the HA VM.
Both issues are driving me crazy, both accounting for up to 2W (+40%) power consumption. I guess I will start moving my addons to my docker LXC and eventually move to HA container.
For others experiencing the CPU increase: keep in mind that VE 8 might play a role, and account for it so that you measure only the performance drop caused by HAOS 10 and later. With VE 8, the CPU increase reported by Proxmox remained for me, while the CPU usage reported inside HA rose for a few days and then returned to normal.
In the meantime I updated to 10.3 in an isolated network (meaning all integrations/add-ons are active but there is no communication with network devices) and everything looks normal so far. No CPU increase, running pretty stable at ~4% as before the update.
I am confident enough to update my productive Pi4 as well 👍
Staying on HAOS 9.5 in a Proxmox VM, I upgraded the host from Proxmox 7.4 to 8.0. I also noticed (as @dsolva did in the comment above) a CPU/power consumption increase, from a constant 7W to 8-9W with frequent spikes to 20W (several per minute).
The CPU spikes were not related to the HAOS VM but to the Proxmox host itself. I managed to tune the Proxmox VE 8.0 kernel to lower CPU usage and bring power consumption down to 5-6W (using powertop and disabling the power tuning on the GPU that was causing the spikes).
This is not directly related to the containerd problem on HAOS 10, but it shows how a kernel change may directly impact CPU/power consumption.
The next step is to retry the upgrade from HAOS 9.5 to 10.3 on VE 8 and monitor CPU/power consumption.
I am also seeing the same issue, not sure when it started but CPU usage went significantly up. I thought it might have something to do with the upgrade of Ubuntu or the VMware player, but after reading this, docker looks to be the culprit.
On a side note, why not just use Podman? Would that not have less overhead and be quicker as well?
I have the same problem with HAOS 10.x on a proxmox 7.4 virtual machine. With HAOS 9.5 I have an average CPU usage of 2-2.5%, while if I upgrade to HAOS 10.3 for example, the CPU usage goes up to 5-6%. It seems that none of the HAOS 10.x updates are taking into account this problem that we have been having for some months now. I would like to upgrade to HAOS 10.x someday.
This doesn't seem to be a general issue but rather a hardware/virtualization-specific one. I read a lot of "Proxmox" and "VM". I haven't had any issue on two Raspberry Pi 4s with 10.3 since the update a few days back (this thread initially made me skeptical and made me wait that long before I finally upgraded).
The issue was definitely noticeable on Yellow/CM4 as well.
Describe the issue you are experiencing
I have just upgraded my Proxmox HAOS VM to OS 10. I immediately noticed CPU usage rising from around 2% to 10% after the upgrade.
"docker stats" shows a normal container usage / nearly no load.
"top" shows containerd is using a consistent 6-8% CPU (this is the only process using significant CPU load)
I rebooted the VM, but CPU load stays the same:
What operating system image do you use?
ova (for Virtual Machines)
What version of Home Assistant Operating System is installed?
10
Did you upgrade the Operating System.
Yes
Steps to reproduce the issue
1. Install 9.5 ova image in Proxmox
2. Upgrade to Operating System 10
3. ...
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
## System Information

Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4975
Installed Version | 1.32.1
Stage | running
Available Repositories | 1274
Downloaded Repositories | 3

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 10.0
-- | --
update_channel | stable
supervisor_version | supervisor-2023.04.0
agent_version | 1.5.1
docker_version | 23.0.3
disk_total | 30.8 GB
disk_used | 3.9 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Terminal & SSH (9.6.1), File editor (5.5.0)

Dashboards

dashboards | 2
-- | --
resources | 1
views | 5
mode | storage

Recorder

oldest_recorder_run | 12 April 2023 at 19:39
-- | --
current_recorder_run | 18 April 2023 at 18:22
estimated_db_size | 179.41 MiB
database_engine | sqlite
database_version | 3.38.5

[