home-assistant / operating-system

:beginner: Home Assistant Operating System
Apache License 2.0
4.8k stars 959 forks source link

CPU usage bumped from 2% to 10% after Operating System 10.0 upgrade (caused by containerd) #2476

Closed elmr91 closed 11 months ago

elmr91 commented 1 year ago

Describe the issue you are experiencing

I have juste upgraded my proxmox HAOS VM to OS 10 I immediately noticed CPU usage raising from around 2% to 10% after upgrade.

"docker stats" shows a normal container usage / nearly no load.

CONTAINER ID   NAME                      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
4448c00d8d7e   addon_core_configurator   0.03%     30.43MiB / 1.925GiB   1.54%     892kB / 843B      23.9MB / 307kB   8
01cb0b8343ce   addon_core_ssh            0.00%     27.51MiB / 1.925GiB   1.40%     943kB / 31.7kB    25.4MB / 393kB   12
7920e2a0c17a   hassio_multicast          0.20%     848KiB / 1.925GiB     0.04%     0B / 0B           61.4kB / 102kB   4
f7aefac69874   hassio_audio              0.00%     23.89MiB / 1.925GiB   1.21%     911kB / 0B        21.3MB / 315kB   13
1d6b8aefd115   hassio_dns                0.00%     29.41MiB / 1.925GiB   1.49%     935kB / 26.7kB    24.8MB / 106kB   11
a4872c2f1be6   hassio_cli                0.00%     13.92MiB / 1.925GiB   0.71%     917kB / 4.07kB    13.2MB / 283kB   9
04f375c2f3cd   hassio_supervisor         0.00%     106.5MiB / 1.925GiB   5.40%     1.31MB / 1.07MB   61.5MB / 958kB   24
13b89612a676   homeassistant             0.80%     354.9MiB / 1.925GiB   18.00%    0B / 0B           203MB / 116MB    33
47dda87b99d2   hassio_observer           0.00%     13.68MiB / 1.925GiB   0.69%     927kB / 9.84kB    12MB / 106kB     8 

"top" shows containerd is using a consistent 6-8% CPU (this is the only process using significant CPU load)

   349 root      20   0 1394.2m  43.1m   7.2   2.2   5:04.68 S  `- /usr/bin/containerd

I rebooted the VM, but CPU load stays the same:
2023-04-18 18_12_20-Clipboard

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

10

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

1.Install 9.5 ova image in proxmox 2.Upgrade to Operating System 10 3. ...

Anything in the Supervisor logs that might be useful for us?

23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.hostname] Load dbus interface org.freedesktop.hostname1
23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.logind] Load dbus interface org.freedesktop.login1
23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.network] Load dbus interface org.freedesktop.NetworkManager
23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.rauc] Load dbus interface de.pengutronix.rauc
23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.resolved] Load dbus interface org.freedesktop.resolve1
23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.systemd] Load dbus interface org.freedesktop.systemd1
23-04-18 18:22:24 INFO (MainThread) [supervisor.dbus.timedate] Load dbus interface org.freedesktop.timedate1
23-04-18 18:22:24 INFO (MainThread) [supervisor.host.services] Updating service information
23-04-18 18:22:24 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
23-04-18 18:22:24 INFO (MainThread) [supervisor.host.network] Updating local network information
23-04-18 18:22:24 INFO (MainThread) [supervisor.host.apparmor] Loading AppArmor Profiles: {'hassio-supervisor'}
23-04-18 18:22:24 INFO (MainThread) [supervisor.docker.monitor] Started docker events monitor
23-04-18 18:22:24 INFO (SyncWorker_1) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/amd64-hassio-cli with version 2022.11.0
23-04-18 18:22:24 INFO (MainThread) [supervisor.plugins.cli] Starting CLI plugin
23-04-18 18:22:24 INFO (SyncWorker_1) [supervisor.docker.interface] Cleaning hassio_cli application
23-04-18 18:22:25 INFO (SyncWorker_1) [supervisor.docker.cli] Starting CLI ghcr.io/home-assistant/amd64-hassio-cli with version 2022.11.0 - 172.30.32.5
23-04-18 18:22:25 INFO (SyncWorker_0) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/amd64-hassio-dns with version 2022.04.1
23-04-18 18:22:25 INFO (MainThread) [supervisor.plugins.dns] Starting CoreDNS plugin
23-04-18 18:22:25 INFO (SyncWorker_0) [supervisor.docker.interface] Cleaning hassio_dns application
23-04-18 18:22:25 INFO (SyncWorker_0) [supervisor.docker.dns] Starting DNS ghcr.io/home-assistant/amd64-hassio-dns with version 2022.04.1 - 172.30.32.3
23-04-18 18:22:25 INFO (MainThread) [supervisor.plugins.dns] Updated /etc/resolv.conf
23-04-18 18:22:25 INFO (SyncWorker_1) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/amd64-hassio-audio with version 2022.07.0
23-04-18 18:22:25 INFO (MainThread) [supervisor.plugins.audio] Starting Audio plugin
23-04-18 18:22:25 INFO (SyncWorker_1) [supervisor.docker.interface] Cleaning hassio_audio application
23-04-18 18:22:26 INFO (SyncWorker_1) [supervisor.docker.audio] Starting Audio ghcr.io/home-assistant/amd64-hassio-audio with version 2022.07.0 - 172.30.32.4
23-04-18 18:22:26 INFO (SyncWorker_0) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/amd64-hassio-observer with version 2021.10.0
23-04-18 18:22:26 INFO (SyncWorker_1) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/amd64-hassio-multicast with version 2022.02.0
23-04-18 18:22:26 INFO (MainThread) [supervisor.plugins.multicast] Starting Multicast plugin
23-04-18 18:22:26 INFO (SyncWorker_1) [supervisor.docker.interface] Cleaning hassio_multicast application
23-04-18 18:22:26 INFO (SyncWorker_1) [supervisor.docker.multicast] Starting Multicast ghcr.io/home-assistant/amd64-hassio-multicast with version 2022.02.0 - Host
23-04-18 18:22:26 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
23-04-18 18:22:26 INFO (MainThread) [supervisor.homeassistant.secrets] Loaded 1 Home Assistant secrets
23-04-18 18:22:26 INFO (SyncWorker_0) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/qemux86-64-homeassistant with version 2023.4.5
23-04-18 18:22:26 INFO (MainThread) [supervisor.os.manager] Detect Home Assistant Operating System 10.0 / BootSlot B
23-04-18 18:22:27 INFO (MainThread) [supervisor.store.git] Loading add-on /data/addons/git/5c53de3b repository
23-04-18 18:22:27 INFO (MainThread) [supervisor.store.git] Loading add-on /data/addons/core repository
23-04-18 18:22:27 INFO (MainThread) [supervisor.store.git] Loading add-on /data/addons/git/a0d7b954 repository
23-04-18 18:22:27 INFO (MainThread) [supervisor.store] Loading add-ons from store: 69 all - 69 new - 0 remove
23-04-18 18:22:27 INFO (MainThread) [supervisor.addons] Found 2 installed add-ons
23-04-18 18:22:27 INFO (SyncWorker_1) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-ssh with version 9.6.1
23-04-18 18:22:27 INFO (SyncWorker_2) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-configurator with version 5.5.0
23-04-18 18:22:27 INFO (MainThread) [supervisor.backups.manager] Found 5 backup files
23-04-18 18:22:27 INFO (MainThread) [supervisor.discovery] Loaded 0 messages
23-04-18 18:22:27 INFO (MainThread) [supervisor.ingress] Loaded 0 ingress sessions
23-04-18 18:22:27 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.SETUP
23-04-18 18:22:27 INFO (MainThread) [supervisor.resolution.check] System checks complete
23-04-18 18:22:27 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.SETUP
23-04-18 18:22:27 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
23-04-18 18:22:27 INFO (MainThread) [supervisor.jobs] 'ResolutionFixup.run_autofix' blocked from execution, system is not running - CoreState.SETUP
23-04-18 18:22:27 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.SETUP
23-04-18 18:22:27 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
23-04-18 18:22:27 INFO (MainThread) [__main__] Running Supervisor
23-04-18 18:22:27 INFO (MainThread) [supervisor.os.manager] Rauc: B - marked slot kernel.1 as good
23-04-18 18:22:27 INFO (MainThread) [supervisor.addons] Phase 'AddonStartup.INITIALIZE' starting 0 add-ons
23-04-18 18:22:27 INFO (MainThread) [supervisor.addons] Phase 'AddonStartup.SYSTEM' starting 0 add-ons
23-04-18 18:22:27 INFO (MainThread) [supervisor.addons] Phase 'AddonStartup.SERVICES' starting 1 add-ons
23-04-18 18:22:27 INFO (SyncWorker_2) [supervisor.docker.interface] Cleaning addon_core_ssh application
23-04-18 18:22:28 INFO (SyncWorker_2) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-ssh with version 9.6.1
23-04-18 18:22:33 INFO (MainThread) [supervisor.core] Start Home Assistant Core
23-04-18 18:22:33 INFO (SyncWorker_3) [supervisor.docker.interface] Starting homeassistant
23-04-18 18:22:33 INFO (MainThread) [supervisor.homeassistant.core] Wait until Home Assistant is ready
23-04-18 18:22:36 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.STARTUP
23-04-18 18:22:36 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
23-04-18 18:22:37 INFO (MainThread) [supervisor.store.git] Update add-on https://github.com/home-assistant/addons repository
23-04-18 18:22:37 INFO (MainThread) [supervisor.store.git] Update add-on https://github.com/hassio-addons/repository repository
23-04-18 18:22:37 INFO (MainThread) [supervisor.store.git] Update add-on https://github.com/esphome/home-assistant-addon repository
23-04-18 18:22:38 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
23-04-18 18:22:38 INFO (MainThread) [supervisor.store] Loading add-ons from store: 69 all - 0 new - 0 remove
23-04-18 18:22:38 INFO (MainThread) [supervisor.store] Loading add-ons from store: 69 all - 0 new - 0 remove
23-04-18 18:23:09 INFO (MainThread) [supervisor.homeassistant.core] Detect a running Home Assistant instance
23-04-18 18:23:09 INFO (MainThread) [supervisor.addons] Phase 'AddonStartup.APPLICATION' starting 1 add-ons
23-04-18 18:23:09 INFO (SyncWorker_0) [supervisor.docker.interface] Cleaning addon_core_configurator application
23-04-18 18:23:09 INFO (SyncWorker_0) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-configurator with version 5.5.0
23-04-18 18:23:14 INFO (MainThread) [supervisor.misc.tasks] All core tasks are scheduled
23-04-18 18:23:14 INFO (MainThread) [supervisor.core] Supervisor is up and running
23-04-18 18:23:14 INFO (MainThread) [supervisor.host.info] Updating local host information
23-04-18 18:23:14 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.DNS_SERVER_IPV6_ERROR/ContextType.DNS_SERVER
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.DNS_SERVER_FAILED/ContextType.DNS_SERVER
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.MULTIPLE_DATA_DISKS/ContextType.SYSTEM
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.TRUST/ContextType.SUPERVISOR
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.NO_CURRENT_BACKUP/ContextType.SYSTEM
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.module] Create new suggestion SuggestionType.CREATE_FULL_BACKUP - ContextType.SYSTEM / None
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.module] Create new issue IssueType.NO_CURRENT_BACKUP - ContextType.SYSTEM / None
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.IPV4_CONNECTION_PROBLEM/ContextType.SYSTEM
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.check] System checks complete
23-04-18 18:23:14 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
23-04-18 18:23:15 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
23-04-18 18:23:15 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
23-04-18 18:23:15 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
23-04-18 18:23:15 INFO (MainThread) [supervisor.host.services] Updating service information
23-04-18 18:23:15 INFO (MainThread) [supervisor.host.network] Updating local network information
23-04-18 18:23:15 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
23-04-18 18:23:15 INFO (MainThread) [supervisor.host.manager] Host information reload completed

Anything in the Host logs that might be useful for us?

Apr 18 16:22:27 homeassistant bluetoothd[2232]: Bluetooth daemon 5.66
Apr 18 16:22:27 homeassistant systemd[1]: Started Bluetooth service.
Apr 18 16:22:27 homeassistant bluetoothd[2232]: Starting SDP server
Apr 18 16:22:27 homeassistant kernel: Bluetooth: BNEP (Ethernet Emulation) ver 1.3
Apr 18 16:22:27 homeassistant kernel: Bluetooth: BNEP filters: protocol multicast
Apr 18 16:22:27 homeassistant kernel: Bluetooth: BNEP socket layer initialized
Apr 18 16:22:27 homeassistant bluetoothd[2232]: Bluetooth management interface 1.22 initialized
Apr 18 16:22:27 homeassistant os-agent[104]: INFO: 2023/04/18 16:22:27 main.go:94: Diagnostics is now true
Apr 18 16:22:27 homeassistant systemd[1]: var-lib-docker-overlay2-2eaf0ad15fa660ee1b2604082b0b49856c12481a72f486c60fa2a9d809f4a7ed\x2dinit-merged.mount: Deactivated successfully.
Apr 18 16:22:27 homeassistant systemd[1]: mnt-data-docker-overlay2-2eaf0ad15fa660ee1b2604082b0b49856c12481a72f486c60fa2a9d809f4a7ed\x2dinit-merged.mount: Deactivated successfully.
Apr 18 16:22:27 homeassistant systemd[1]: var-lib-docker-overlay2-2eaf0ad15fa660ee1b2604082b0b49856c12481a72f486c60fa2a9d809f4a7ed-merged.mount: Deactivated successfully.
Apr 18 16:22:27 homeassistant systemd[1]: mnt-data-docker-overlay2-2eaf0ad15fa660ee1b2604082b0b49856c12481a72f486c60fa2a9d809f4a7ed-merged.mount: Deactivated successfully.
Apr 18 16:22:27 homeassistant kernel: hassio: port 6(vethf8c4077) entered blocking state
Apr 18 16:22:27 homeassistant kernel: hassio: port 6(vethf8c4077) entered disabled state
Apr 18 16:22:27 homeassistant kernel: device vethf8c4077 entered promiscuous mode
Apr 18 16:22:27 homeassistant NetworkManager[319]: <info>  [1681834947.8072] manager: (vethba72fe7): new Veth device (/org/freedesktop/NetworkManager/Devices/17)
Apr 18 16:22:27 homeassistant NetworkManager[319]: <info>  [1681834947.8085] manager: (vethf8c4077): new Veth device (/org/freedesktop/NetworkManager/Devices/18)
Apr 18 16:22:27 homeassistant systemd[1]: Started libcontainer container 585c29bcc7be293d23741540dbb4cc2af1bd3a22b3685bc4f02f04248e9b9f09.
Apr 18 16:22:28 homeassistant kernel: eth0: renamed from vethba72fe7
Apr 18 16:22:28 homeassistant kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethf8c4077: link becomes ready
Apr 18 16:22:28 homeassistant kernel: hassio: port 6(vethf8c4077) entered blocking state
Apr 18 16:22:28 homeassistant kernel: hassio: port 6(vethf8c4077) entered forwarding state
Apr 18 16:22:28 homeassistant NetworkManager[319]: <info>  [1681834948.0849] device (vethf8c4077): carrier: link connected
Apr 18 16:22:33 homeassistant systemd[1]: run-docker-runtime\x2drunc-moby-13b89612a676908649a9c53efe151781e6267309bfde99934415abd8ec969c83-runc.mHMj5m.mount: Deactivated successfully.
Apr 18 16:22:33 homeassistant systemd[1]: Started libcontainer container 13b89612a676908649a9c53efe151781e6267309bfde99934415abd8ec969c83.
Apr 18 16:22:33 homeassistant kernel: kauditd_printk_skb: 128 callbacks suppressed
Apr 18 16:22:33 homeassistant kernel: audit: type=1334 audit(1681834953.354:193): prog-id=49 op=LOAD
Apr 18 16:22:33 homeassistant kernel: audit: type=1300 audit(1681834953.354:193): arch=c000003e syscall=321 success=yes exit=15 a0=5 a1=c00018d7f8 a2=78 a3=0 items=0 ppid=2520 pid=2531 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc" exe="/usr/bin/runc" subj=unconfined key=(null)
Apr 18 16:22:33 homeassistant kernel: audit: type=1327 audit(1681834953.354:193): proctitle=72756E63002D2D726F6F74002F7661722F72756E2F646F636B65722F72756E74696D652D72756E632F6D6F6279002D2D6C6F67002F72756E2F636F6E7461696E6572642F696F2E636F6E7461696E6572642E72756E74696D652E76322E7461736B2F6D6F62792F31336238393631326136373639303836343961396335336566
Apr 18 16:22:33 homeassistant kernel: audit: type=1334 audit(1681834953.355:194): prog-id=50 op=LOAD
Apr 18 16:22:33 homeassistant kernel: audit: type=1300 audit(1681834953.355:194): arch=c000003e syscall=321 success=yes exit=17 a0=5 a1=c00018d590 a2=78 a3=0 items=0 ppid=2520 pid=2531 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc" exe="/usr/bin/runc" subj=unconfined key=(null)
Apr 18 16:22:33 homeassistant kernel: audit: type=1327 audit(1681834953.355:194): proctitle=72756E63002D2D726F6F74002F7661722F72756E2F646F636B65722F72756E74696D652D72756E632F6D6F6279002D2D6C6F67002F72756E2F636F6E7461696E6572642F696F2E636F6E7461696E6572642E72756E74696D652E76322E7461736B2F6D6F62792F31336238393631326136373639303836343961396335336566
Apr 18 16:22:33 homeassistant kernel: audit: type=1334 audit(1681834953.355:195): prog-id=50 op=UNLOAD
Apr 18 16:22:33 homeassistant kernel: audit: type=1334 audit(1681834953.356:196): prog-id=49 op=UNLOAD
Apr 18 16:22:33 homeassistant kernel: audit: type=1334 audit(1681834953.356:197): prog-id=51 op=LOAD
Apr 18 16:22:33 homeassistant kernel: audit: type=1300 audit(1681834953.356:197): arch=c000003e syscall=321 success=yes exit=15 a0=5 a1=c00018da50 a2=78 a3=0 items=0 ppid=2520 pid=2531 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc" exe="/usr/bin/runc" subj=unconfined key=(null)
Apr 18 16:22:37 homeassistant systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Apr 18 16:22:54 homeassistant systemd[1]: systemd-timedated.service: Deactivated successfully.
Apr 18 16:22:54 homeassistant kernel: kauditd_printk_skb: 1 callbacks suppressed
Apr 18 16:22:54 homeassistant kernel: audit: type=1334 audit(1681834974.177:198): prog-id=26 op=UNLOAD
Apr 18 16:22:54 homeassistant kernel: audit: type=1334 audit(1681834974.177:199): prog-id=25 op=UNLOAD
Apr 18 16:22:54 homeassistant kernel: audit: type=1334 audit(1681834974.177:200): prog-id=24 op=UNLOAD
Apr 18 16:22:57 homeassistant systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Apr 18 16:22:57 homeassistant kernel: audit: type=1334 audit(1681834977.264:201): prog-id=14 op=UNLOAD
Apr 18 16:22:57 homeassistant kernel: audit: type=1334 audit(1681834977.264:202): prog-id=13 op=UNLOAD
Apr 18 16:22:57 homeassistant kernel: audit: type=1334 audit(1681834977.264:203): prog-id=12 op=UNLOAD
Apr 18 16:23:09 homeassistant systemd[1]: var-lib-docker-overlay2-7d815f3ad31630b9e27b12657d4e783170bec95b305873ed51056f3844933e1f\x2dinit-merged.mount: Deactivated successfully.
Apr 18 16:23:09 homeassistant systemd[1]: mnt-data-docker-overlay2-7d815f3ad31630b9e27b12657d4e783170bec95b305873ed51056f3844933e1f\x2dinit-merged.mount: Deactivated successfully.
Apr 18 16:23:09 homeassistant systemd[1]: var-lib-docker-overlay2-7d815f3ad31630b9e27b12657d4e783170bec95b305873ed51056f3844933e1f-merged.mount: Deactivated successfully.
Apr 18 16:23:09 homeassistant systemd[1]: mnt-data-docker-overlay2-7d815f3ad31630b9e27b12657d4e783170bec95b305873ed51056f3844933e1f-merged.mount: Deactivated successfully.
Apr 18 16:23:09 homeassistant NetworkManager[319]: <info>  [1681834989.5172] manager: (vethaf7f565): new Veth device (/org/freedesktop/NetworkManager/Devices/19)
Apr 18 16:23:09 homeassistant kernel: hassio: port 7(vethe13a13f) entered blocking state
Apr 18 16:23:09 homeassistant kernel: hassio: port 7(vethe13a13f) entered disabled state
Apr 18 16:23:09 homeassistant kernel: device vethe13a13f entered promiscuous mode
Apr 18 16:23:09 homeassistant kernel: audit: type=1700 audit(1681834989.517:204): dev=vethe13a13f prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Apr 18 16:23:09 homeassistant kernel: audit: type=1300 audit(1681834989.517:204): arch=c000003e syscall=44 success=yes exit=40 a0=c a1=c000a38690 a2=28 a3=0 items=0 ppid=1 pid=397 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=unconfined key=(null)
Apr 18 16:23:09 homeassistant kernel: audit: type=1327 audit(1681834989.517:204): proctitle=2F7573722F62696E2F646F636B657264002D480066643A2F2F002D2D636F6E7461696E6572643D2F72756E2F636F6E7461696E6572642F636F6E7461696E6572642E736F636B
Apr 18 16:23:09 homeassistant NetworkManager[319]: <info>  [1681834989.5216] manager: (vethe13a13f): new Veth device (/org/freedesktop/NetworkManager/Devices/20)
Apr 18 16:23:09 homeassistant systemd[1]: Started libcontainer container 481308628ffcd074969046225f8a9435aea56255ad7b6d1eb690b2e3f270004e.
Apr 18 16:23:09 homeassistant kernel: audit: type=1334 audit(1681834989.625:205): prog-id=52 op=LOAD
Apr 18 16:23:09 homeassistant kernel: audit: type=1334 audit(1681834989.625:206): prog-id=53 op=LOAD
Apr 18 16:23:09 homeassistant kernel: audit: type=1300 audit(1681834989.625:206): arch=c000003e syscall=321 success=yes exit=16 a0=5 a1=c00018d7f8 a2=78 a3=0 items=0 ppid=2881 pid=2890 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc" exe="/usr/bin/runc" subj=unconfined key=(null)
Apr 18 16:23:09 homeassistant kernel: audit: type=1327 audit(1681834989.625:206): proctitle=72756E63002D2D726F6F74002F7661722F72756E2F646F636B65722F72756E74696D652D72756E632F6D6F6279002D2D6C6F67002F72756E2F636F6E7461696E6572642F696F2E636F6E7461696E6572642E72756E74696D652E76322E7461736B2F6D6F62792F34383133303836323866666364303734393639303436323235
Apr 18 16:23:09 homeassistant kernel: audit: type=1334 audit(1681834989.625:207): prog-id=54 op=LOAD
Apr 18 16:23:09 homeassistant kernel: audit: type=1300 audit(1681834989.625:207): arch=c000003e syscall=321 success=yes exit=18 a0=5 a1=c00018d590 a2=78 a3=0 items=0 ppid=2881 pid=2890 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc" exe="/usr/bin/runc" subj=unconfined key=(null)
Apr 18 16:23:09 homeassistant kernel: audit: type=1327 audit(1681834989.625:207): proctitle=72756E63002D2D726F6F74002F7661722F72756E2F646F636B65722F72756E74696D652D72756E632F6D6F6279002D2D6C6F67002F72756E2F636F6E7461696E6572642F696F2E636F6E7461696E6572642E72756E74696D652E76322E7461736B2F6D6F62792F34383133303836323866666364303734393639303436323235
Apr 18 16:23:09 homeassistant kernel: eth0: renamed from vethaf7f565
Apr 18 16:23:09 homeassistant kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe13a13f: link becomes ready
Apr 18 16:23:09 homeassistant kernel: hassio: port 7(vethe13a13f) entered blocking state
Apr 18 16:23:09 homeassistant kernel: hassio: port 7(vethe13a13f) entered forwarding state
Apr 18 16:23:09 homeassistant NetworkManager[319]: <info>  [1681834989.7906] device (vethe13a13f): carrier: link connected
Apr 18 16:23:14 homeassistant kernel: kauditd_printk_skb: 29 callbacks suppressed
Apr 18 16:23:14 homeassistant kernel: audit: type=1334 audit(1681834994.860:219): prog-id=56 op=LOAD
Apr 18 16:23:14 homeassistant kernel: audit: type=1334 audit(1681834994.861:220): prog-id=57 op=LOAD
Apr 18 16:23:14 homeassistant kernel: audit: type=1334 audit(1681834994.861:221): prog-id=58 op=LOAD
Apr 18 16:23:14 homeassistant systemd[1]: Starting Hostname Service...
Apr 18 16:23:14 homeassistant systemd[1]: Started Hostname Service.
Apr 18 16:23:15 homeassistant kernel: audit: type=1334 audit(1681834995.000:222): prog-id=59 op=LOAD
Apr 18 16:23:15 homeassistant kernel: audit: type=1334 audit(1681834995.001:223): prog-id=60 op=LOAD
Apr 18 16:23:15 homeassistant kernel: audit: type=1334 audit(1681834995.001:224): prog-id=61 op=LOAD
Apr 18 16:23:15 homeassistant systemd[1]: Starting Time & Date Service...
Apr 18 16:23:15 homeassistant systemd[1]: Started Time & Date Service.
Apr 18 16:23:45 homeassistant systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Apr 18 16:23:45 homeassistant kernel: audit: type=1334 audit(1681835025.038:225): prog-id=58 op=UNLOAD
Apr 18 16:23:45 homeassistant kernel: audit: type=1334 audit(1681835025.038:226): prog-id=57 op=UNLOAD
Apr 18 16:23:45 homeassistant kernel: audit: type=1334 audit(1681835025.038:227): prog-id=56 op=UNLOAD
Apr 18 16:23:45 homeassistant systemd[1]: systemd-timedated.service: Deactivated successfully.
Apr 18 16:23:45 homeassistant kernel: audit: type=1334 audit(1681835025.161:228): prog-id=61 op=UNLOAD
Apr 18 16:23:45 homeassistant kernel: audit: type=1334 audit(1681835025.161:229): prog-id=60 op=UNLOAD
Apr 18 16:23:45 homeassistant kernel: audit: type=1334 audit(1681835025.161:230): prog-id=59 op=UNLOAD
Apr 18 16:32:42 homeassistant dropbear[3091]: [3091] Apr 18 16:32:42 Child connection from 192.168.0.12:50710
Apr 18 16:32:42 homeassistant dropbear[3091]: [3091] Apr 18 16:32:42 Pubkey auth succeeded for 'root' with ssh-rsa key SHA256:5KG9InxtuSAC5RtRQ/pdqJU5u4duuGFcXq+KRToAnQc from 192.168.0.12:50710
Apr 18 16:37:22 homeassistant systemd[1]: Starting Cleanup of Temporary Directories...
Apr 18 16:37:22 homeassistant systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Apr 18 16:37:22 homeassistant systemd[1]: Finished Cleanup of Temporary Directories.
Apr 18 16:37:22 homeassistant systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Apr 18 16:37:38 homeassistant kernel: audit: type=1334 audit(1681835858.696:231): prog-id=62 op=LOAD
Apr 18 16:37:38 homeassistant systemd-timesyncd[1197]: Network configuration changed, trying to establish connection.
Apr 18 16:37:38 homeassistant systemd-timesyncd[1197]: Contacted time server 192.168.0.254:123 (192.168.0.254).
Apr 18 16:37:38 homeassistant systemd[1]: Started Journal Gateway Service.
Apr 18 16:37:38 homeassistant systemd-journal-gatewayd[3113]: microhttpd: MHD_OPTION_EXTERNAL_LOGGER is not the first option specified for the daemon. Some messages may be printed by the standard MHD logger.

System information

`## System Information

version core-2023.4.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.10
os_name Linux
os_version 6.1.24
arch x86_64
timezone Europe/Paris
config_dir /config
Home Assistant Community Store GitHub API | ok -- | -- GitHub Content | ok GitHub Web | ok GitHub API Calls Remaining | 4975 Installed Version | 1.32.1 Stage | running Available Repositories | 1274 Downloaded Repositories | 3
Home Assistant Cloud logged_in | false -- | -- can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 10.0 -- | -- update_channel | stable supervisor_version | supervisor-2023.04.0 agent_version | 1.5.1 docker_version | 23.0.3 disk_total | 30.8 GB disk_used | 3.9 GB healthy | true supported | true board | ova supervisor_api | ok version_api | ok installed_addons | Terminal & SSH (9.6.1), File editor (5.5.0)
Dashboards dashboards | 2 -- | -- resources | 1 views | 5 mode | storage
Recorder oldest_recorder_run | 12 avril 2023 à 19:39 -- | -- current_recorder_run | 18 avril 2023 à 18:22 estimated_db_size | 179.41 MiB database_engine | sqlite database_version | 3.38.5 [
`](url) ### Additional information _No response_
atv2016 commented 1 year ago

I think people are saying it's a docker issue? I'm going to try and run this in a older vmware player see if the problem goes away. I'm not happy at all with the current cpu being around 40-50 percent now compared to 5-10 normally.


From: Der Mundschenk & Compagnie @.> Sent: 13 July 2023 08:09 To: home-assistant/operating-system @.> Cc: atv2016 @.>; Comment @.> Subject: Re: [home-assistant/operating-system] CPU usage bumped from 2% to 10% after Operating System 10.0 upgrade (caused by containerd) (Issue #2476)

The issue was definitely noticeable on Yellow/CM4 as well.

— Reply to this email directly, view it on GitHubhttps://github.com/home-assistant/operating-system/issues/2476#issuecomment-1633687440, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEBCFU3UMO4MK5WDHFFM2RTXP6NK3ANCNFSM6AAAAAAXC4QCVI. You are receiving this because you commented.Message ID: @.***>

dsolva commented 1 year ago

Doesn't seem to be a general but more a hardware/virtualization specific issue. I read a lot of "Proxmox" and "VM". I don't have any issue on two Raspberry Pi 4 with 10.3 since the update a few days back (this thread here initially made me skeptically and let me wait that long before I finally upgraded).

Someone above said it was general but it was easier to notice with the stats provided from Proxmox for instance. The increase in cpu reported by HA was not that huge for me, but the one reported by proxmox is. Maybe the virtualization heightens the issue (upgrading proxmox changes my cpu consumption after 10.x) but i guess its still there. We also notice a huge increase in power consumption because of higher tdp cpus, a 1 o 2 % increase in cpu increases 0.5watts. For a Pi I guess there shouldnt be a huge increase.

djrm05 commented 1 year ago

I'm seeing the same behaviour but just after upgrade the OS in my Qnap NAS. Usually while the NAS is in idle I see around 2% CPU util, however now it is between 10 and 15%. Stopping the HA container, it back suddently to 2% and just restarting it it raise againg to a min. of 10% I have tried to recreate the container, stop all the integrations etc, but nothing helps.

gfn256 commented 1 year ago

Interesting observation: After some recent (a few days ago) Proxmox VE (8.03) updates, my temps seem to have gone back down!

aborrajo commented 1 year ago

Interesting observation: After some recent (a few days ago) Proxmox VE (8.03) updates, my temps seem to have gone back down!

As mentioned earlier, the improvement in your CPU temperature will be due to the update of Proxmox, but the issue with HAOS 10.0 remains the same regardless of whether it is a VM or not. In summary, we still have no solution from HomeAssistant's side...

elmr91 commented 1 year ago

Could someone running HAOS 10.x have a look on docker.service content ? (/usr/lib/systemd/system/docker.service) Please provide LimitNOFILE value.... It may be LimitNOFILE=1048576 or LimitNOFILE=infinity cf https://github.com/containerd/containerd/pull/7566 edit: fixed URL

I don't have access to my system to check...

jlunz commented 1 year ago

haos_rpi4-64-10.4.img.xz has:

LimitNOFILE=infinity

in /usr/lib/systemd/system/docker.service (on hassos-system0) In my current 9.5 it is set to:

LimitNOFILE=1048576
agners commented 1 year ago

See also the New limits for Supervisor Add-ons blog post on developer.home-assistant.io.

Typically, the issues around the limits appeared on startup. Often out-of-memory situation occurred, or startup took longer than necessary (since quite often an array got initialized depending on the limit set for open file descriptors). But it could as well be that this has run time implications for some add-ons...

@elmr91 do you have specific reason to believe that this is the culprit for the increased CPU usage?

polarathene commented 1 year ago

Saw this from upstream PR on containerd, I was involved in pushing the fix forward for containerd and moby (Docker). Until that lands in a future release (both required for Docker), you should have their .service files configured with: LimitNOFILE=1024:524288.


If you only provide a single value, you are setting both soft and hard limit to that value, you don't want that.. Some OS like Debian patch out the systemd v240 change to raise fs.nr_open from 1048576 (aka 2^20) to 2^30 (1,000x larger) which minimizes the impact on system resource usage when this bug is encountered. As that lower value seems reasonably effective, while not being understood well by the user, LimitNOFILE=1048576 is parroted around online as a fix, but is still bad advice. At the very least you should keep the soft limit at 1024 (an additional 1,000x reduced impact on system resources as well).

You rarely need a higher hard limit than 524288, it should meet the needs of most software and is easier to debug the failure when it is not. 1048576 isn't too bad, it's only twice as big and shouldn't affect most software when the soft limit is 1024.


When you don't set the soft limit, you're raising it to the larger value in the container for all processes that run within it.

Many daemon software and some OS package managers are known to use excessive CPU iterating through the large range of file descriptors (FDs) ensuring they're all closed before continuing as a good practice, but that range isn't meant to be ridiculously high.

Some software uses the select() call IIRC which isn't compatible with a soft limit higher than 1024, this may be something that runs in the OS core software, rather than any software you add and run in a container yourself on top of that, which may be more difficult to troubleshoot and pinpoint.

Other software like databases are known to allocate a large amount of memory with excessive limits set. These often have official documentation with advice on what limits to set when it's relevant, otherwise you should be running them with sane limits like everything else.

Last year in August, Go released 1.19 which introduced a feature to implicitly raise the soft limit to the hard limit, but that introduced a bug where it didn't lower it back for child processes. It's been addressed since earlier this year, in point releases of Go 1.19+, but if software is running that was compiled with the toolchain before that fix it could encounter that bug, which is mostly only relevant when the program runs child processes like containerd was affected.

elmr91 commented 1 year ago

@agners I have no particular clue about relation between LimitNOFILE and our containerd CPU increase concern. This limit seems to have an impact only during startup (or shutdown). In our case, CPU increase is permanent. But it may worth been investigated. How could we easily modify docker.service file within HAOS ?

Impact123 commented 1 year ago

Erik_167283946 recently mentioned an increased CPU increase from 10.4 to 10.5 on discord. I linked them here but they didn't post so I'm gonna go and corroborate that. See graphs below. haos10 5 Screenshot 2023-08-23 214414 I added a straight yellow line so the difference is easier to see. The red is just a mask because the data is irrelevant to the test.

geobogb commented 1 year ago

I tried 10.5, but it is still more or less unusable under Virtualbox. CPU usage went from around 5% with 9.5 to over 30% with 10.5.

Anyone with some ideas how to solve this? Screenshot_2023-09-05_15-32-12

ntkoopman commented 1 year ago

Setting LimitNOFILE=1024:524288 on containerd and restarting it (and docker) doesn't seem to have any effect. When running strace on containerd it seems the Go scheduler has a futex that keeps timing out in a tight loop (https://github.com/golang/go/blob/master/src/runtime/proc.go#L1631). I don't know anything about Go though, so maybe that is normal? All other goroutines look idle to me (gopark).

edit: in the process of testing this and killing containerd a couple of times, CPU usage went back to normal (0%)

atv2016 commented 1 year ago

That seems like something that would cause high CPU and definitely would be worth investigating. I guess it all depends how 'tight' the loop is, but i am not a Go expert either.

CPU here is still around 30%, when it used to be around 9%.


From: Tim Koopman @.> Sent: 05 September 2023 15:25 To: home-assistant/operating-system @.> Cc: atv2016 @.>; Comment @.> Subject: Re: [home-assistant/operating-system] CPU usage bumped from 2% to 10% after Operating System 10.0 upgrade (caused by containerd) (Issue #2476)

Setting LimitNOFILE=1024:524288 on containerd and restarting it (and docker) doesn't seem to have any effect. When running strace on containerd it seems the Go scheduler has a futex that keeps timing out in a tight loop (https://github.com/golang/go/blob/master/src/runtime/proc.go#L1631). I don't know anything about Go though, so maybe that is normal? All other goroutines look idle to me (gopark).

— Reply to this email directly, view it on GitHubhttps://github.com/home-assistant/operating-system/issues/2476#issuecomment-1706726424, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEBCFU5MSQ7C7MN66CBB4QTXY4Y4XANCNFSM6AAAAAAXC4QCVI. You are receiving this because you commented.Message ID: @.***>

geobogb commented 1 year ago

Hi!

I tried to compare HA OS 9.5 with HA OS 10.5 by doing a strace on the Virtulabox process on the host during one "real minute".

# Results with version 9.5
strace -c -f -p 947739
strace: Process 947739 attached with 31 threads
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------

 37.06   24.196627        1155     20941           ioctl
 27.43   17.907252         611     29304     14633 futex
 16.72   10.913958       18529       589           poll
 10.50    6.858527       28458       241           select
  6.78    4.423818       73730        60           rt_sigtimedwait
  1.50    0.976583      122072         8         5 restart_syscall
  0.02    0.010045          86       116           pwrite64
  0.00    0.000313          34         9         3 openat
  0.00    0.000264          29         9           read
  0.00    0.000246          82         3           pread64
  0.00    0.000157          19         8         4 recvfrom
  0.00    0.000152          50         3           statfs
  0.00    0.000117          19         6           write
  0.00    0.000084          14         6           close
  0.00    0.000055           9         6           getdents64
  0.00    0.000049           8         6           fstat
  0.00    0.000048          24         2           sendto
  0.00    0.000022           7         3           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   65.288317        1272     51320     14645 total
# Results with version 10.5
strace -c -f -p 947739
strace: Process 947739 attached with 31 threads
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 37.41   31.435030         149    209941           ioctl
 27.71   23.284473         770     30207     15095 futex
 16.52   13.881048       22681       612           poll
 10.29    8.646525       35877       241           select
  6.72    5.647733       94128        60           rt_sigtimedwait
  1.33    1.114393      139299         8         5 restart_syscall
  0.03    0.023518          32       729           pread64
  0.01    0.004321          44        98           pwrite64
  0.00    0.000085           9         9         3 openat
  0.00    0.000050          16         3           statfs
  0.00    0.000049           5         9           read
  0.00    0.000033           4         8         4 recvfrom
  0.00    0.000025           4         6           write
  0.00    0.000016           2         6           close
  0.00    0.000016           2         6           getdents64
  0.00    0.000013           2         6           fstat
  0.00    0.000012           6         2           sendto
  0.00    0.000006           2         3           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   84.037346         347    241954     15107 total

I can see that there are around ten times more syscalls to ioctl and the other syscalls are more or less the same number.

The "extra" syscalls in 10.5 seem to be:

[pid 947754]      0.000024 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000107 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000898 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000077 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000188 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1 <unfinished ...>
[pid 947820]      0.000406 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 947820]      0.000039 futex(0x7f97782853b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 947820]      0.000052 futex(0x7f97782853a8, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1531690, tv_nsec=258643733}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 947754]      0.000208 <... ioctl resumed>) = 0
[pid 947754]      0.000026 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000134 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000271 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000630 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000100 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1 <unfinished ...>
[pid 947820]      0.000694 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 947820]      0.000023 futex(0x7f97782853b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 947820]      0.000051 futex(0x7f97782853a8, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1531690, tv_nsec=260783821}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 947754]      0.000069 <... ioctl resumed>) = 0
[pid 947754]      0.000034 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000150 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000559 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000260 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0
[pid 947754]      0.000073 ioctl(7, ADV7842_CMD_RAM_TEST, 0x1) = 0

I hope this can help someone in hunting this bug down. Please let me know if I can provide any more information.

lyricnz commented 1 year ago

Good insight! FYI ioctl() is part of libc that is used to control special-files (virtual files). While there are 10x as many calls, the average duration of these is ~8x smaller, and total time is about 30% more. If you have the strace, which file does fh correspond to? (look for the open that returns 7, before any of the ioctl). Maybe /sys/fs/cgroup or something like that?

lyricnz commented 1 year ago

FWIW ADV7842 looks like some HDMI receiver & decoder, so maybe that's related the the (virtual?) console in some way?

geobogb commented 1 year ago

If you have the strace, which file does fh correspond to? (look for the open that returns 7, before any of the ioctl). Maybe /sys/fs/cgroup or something like that?

From /proc/PID/fd: lrwx------ 1 root root 64 Sep 8 04:11 7 -> /dev/vboxdrv

And just to avoid any misunderstandings. The strace is performed on the Virtualbox process in the host OS: /usr/lib/virtualbox/VBoxHeadless --comment Homeassistant --startvm d8dc6d2a-3c4b-4d7a-b8d1-1cbd069e03cf --vrde config

I'm running home assistant under Virtualbox. Host OS is Debian Bullseye, fully up to date. CPU Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz with 64 GB memory.

gfn256 commented 1 year ago

Yet another rise in average temp since latest core & supervisor updates of about 1.25°C Screenshot 2023-09-14 140710

Impact123 commented 1 year ago

It seems like this is getting worse as time goes on but temperature is a rather poor metric.

gfn256 commented 1 year ago

@Impact123 I fully agree that temp is not the best metric (even though I do check the ambient temp against the above graph). I can only add that I do see an increased processor activity as well as temp rise. Its just the graph of CPU % is much harder to present.

elmr91 commented 1 year ago

Yet another rise in average temp since latest core & supervisor updates of about 1.25°C

I didn't notice any CPU/power consumption noticeable change since last core & supervisor update image

(I'm still running HAOS 9.5)

gfn256 commented 1 year ago

(I'm still running HAOS 9.5)

I'm on HAOS 10.5, maybe this has something todo with it?

Impact123 commented 1 year ago

The title (and our previous observations here) hint towards this being a 10.x+ issue. Wattage (at least on its own) isn't a good metric either. The issue is CPU usage which doesn't always scale linearly with power consumption. The graph also has no markings.

N3rdix commented 1 year ago

Yet another rise in average temp since latest core & supervisor updates of about 1.25°C

I didn't notice any CPU/power consumption noticeable change since last core & supervisor update image

(I'm still running HAOS 9.5)

I am on 10.3 and also didn’t see any increase in CPU usage since upgrading from 9.5 (on a Pi4)

robotpandarocket commented 11 months ago

Same issue with NUC / Proxmox and OS 10.5

agners commented 11 months ago

So I finally came around to look into this issue a bit more in depth.

I first started to just revert to the containerd version shipped with HAOS 9.5, which is 1.6.8. However, that did not change anything, the CPU load was still higher as on a vanilla 9.5. This means it's really not a containerd change, but an environment/configuration issue.

I then looked into @geobogb's findings. However, in my installation, I did not notice a higher amount of ioctrl. What I did notice was order of magnitude higher amount of futex, nanosleep and epoll_pwait calls (e.g. 79 vs 28779 futex calls!). Something clearly makes containerd busy, while it is really calm on HAOS 9.5. Unfortunately staring at strace outputs and comparing checking open files in /proc/<pid>/fd didn't really gave me a hint to what could be causing this.

Sidenote: I've measured this by running strace in a Debian container for exactly 60s like so:

docker run --rm -it --pid=host --privileged debian
# apt install procps strace
# ps xfa # (get containerd main pid)
# strace -c -f -p <pidfromabove> & sleep 60 && killall strace

What I've then noticed is that on HAOS 9.5 containerd gets started with --config /var/run/docker/containerd/containerd.toml. In general, it seems that the startup of containerd changed from Docker invoked startup to a separate systemd service.

Changing options ine the systemd containerd.service file didn't really help.

I then passed the containerd.toml configuration from HAOS 9.5 to containerd on a dev build, and indeed, this did the trick! CPU load went down significantly, syscalls statistics back to normal.

Further isolating the config turned out that root = changed from a path underneath the Docker data path (in /mnt/data/docker) to the default /var/lib/containerd. The latter is located on a tmpfs. It seems that the file system difference causes a higher CPU load.

I'll create a PR to address this. There will be a OS 11.0.rc1 on the beta channel soon.

atv2016 commented 11 months ago

Very nice, amazing thank you.


From: Stefan Agner @.> Sent: 05 October 2023 09:09 To: home-assistant/operating-system @.> Cc: atv2016 @.>; Comment @.> Subject: Re: [home-assistant/operating-system] CPU usage bumped from 2% to 10% after Operating System 10.0 upgrade (caused by containerd) (Issue #2476)

So I finally came around to look into this issue a bit more in depth.

I first started to just revert to the containerd version shipped with HAOS 9.5, which is 1.6.8. However, that did not change anything, the CPU load was still higher as on a vanilla 9.5. This means it's really not a containerd change, but an environment/configuration issue.

I then looked into @geobogbhttps://github.com/geobogb's findings. However, in my installation, I did not notice a higher amount of ioctrl. What I did notice was order of magnitude higher amount of futex, nanosleep and epoll_pwait calls (e.g. 79 vs 28779 futex calls!). Something clearly makes containerd busy, while it is really calm on HAOS 9.5. Unfortunately staring at strace outputs and comparing checking open files in /proc//fd didn't really gave me a hint to what could be causing this.

Sidenote: I've measured this by running strace in a Debian container for exactly 60s like so:

docker run --rm -it --pid=host --privileged debian

apt install procps strace

ps xfa # (get containerd main pid)

strace -c -f -p & sleep 60 && killall strace

What I've then noticed is that on HAOS 9.5 containerd gets started with --config /var/run/docker/containerd/containerd.toml. In general, it seems that the startup of containerd changed from Docker invoked startup to a separate systemd service.

Changing options ine the systemd containerd.service file didn't really help.

I then passed the containerd.toml configuration from HAOS 9.5 to containerd on a dev build, and indeed, this did the trick! CPU load went down significantly, syscalls statistics back to normal.

I'll create a PR to address this. There will be a OS 11.0.rc1 on the beta channel soon.

— Reply to this email directly, view it on GitHubhttps://github.com/home-assistant/operating-system/issues/2476#issuecomment-1748337263, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEBCFU4B2FUUDKAQNUB5X3DX5ZTMBAVCNFSM6AAAAAAXC4QCVKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONBYGMZTOMRWGM. You are receiving this because you commented.Message ID: @.***>

agners commented 11 months ago

11.0.rc1 is now on the beta channel. I consider this issue closed.

aborrajo commented 11 months ago

Thank you!

wokkeltje13 commented 11 months ago

Change seems to have positive effect. Tnx image

SaschaWeis commented 11 months ago

Can confirm the change has an positive effect on cpu load - at least when using kvm environment

geobogb commented 11 months ago

Thanks for looking into this issue! I have now tested 11.0.rc1, and this version works much better for me than 10.x did. Unfortunately, the increase in CPU usage is still around 50 percent compared to version 9.5.

As you may recall, I'm using Home Assistant in a headless Virtual Box environment on a Debian Linux host. The 10.x versions were never usable for me due to high CPU usage. Because of that, I was still on version 9.5.

The CPU usage on the host OS with version 9.5 the day before upgrading to 11.0.rc1 was below 7 percent. After upgrading to version 11.0.rc1, it's over 11 percent.

image

I did the same strace as before, and the result is somewhere in between the previous two straces.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 41.55   19.539223         233     83543           ioctl
 19.19    9.026242       15807       571           poll
 16.28    7.657338         294     26018     13000 futex
  7.77    3.653542       60892        60           rt_sigtimedwait
  7.73    3.636841      606140         6         3 restart_syscall
  7.46    3.509922     1754961         2           select
  0.01    0.004670          18       248           pwrite64
  0.00    0.000219          24         9         3 openat
  0.00    0.000181          60         3           statfs
  0.00    0.000175          19         9           read
  0.00    0.000168          28         6           fstat
  0.00    0.000083          13         6           close
  0.00    0.000082          13         6           write
  0.00    0.000069           8         8         4 recvfrom
  0.00    0.000054          27         2           sendto
  0.00    0.000043           7         6           getdents64
  0.00    0.000034           4         7           pread64
  0.00    0.000018           6         3           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   47.028904         425    110513     13010 total

After downgrading to version 9.5, the CPU usage is back to below 7 percent.

lyricnz commented 11 months ago

This has improved things for me (proxmox on i5-7500T, 2 cores to HAOS). CPU went from 2.6-2.7% down to 1.5-1.6%

tyjtyj commented 11 months ago

@agners Thanks for saving the earth.. u probably save billions of KWH worldwide

Incase anyone looking. Running this on Proxmox 8.0.4 intel nuc i5 7th gen(i5-7260U). Power save mode enabled u need to go from 9.5 to 10.x to 11.0.rc1

Before you do anything. THIS IS RC1 means anything can break. Not for production use.

TAKE SNAPSHOT/BACKUP

to upgrade from 9.5 to 10.x os upgrade wait for fully started. verified the os is 10.x os upgrade --version 11.0.rc1 wait for it to reboot.

5 mins after upgrade. i can see the cpu load same as 9.5 which is around 3%. 10.x was more than 5%

image

aborrajo commented 11 months ago

I have upgraded from 9.5 through 10.5 to 11.0.rc1 and the cpu consumption has decreased significantly compared to 10.5. However, you can see that there is a slight increase in cpu consumption compared to 9.5.

Captura

Proxmox 8.0.4 (AMD Ryzen 5 5560U)

gfn256 commented 11 months ago

@agners

11.0.rc1 is now on the beta channel. I consider this issue closed.

Thanks for your excellent work as usual!

Any idea when HASSos 11.0 (non beta) will be released, and will it include this fix?

gfn256 commented 11 months ago

Updated to HASSos 11.0. Average CPU temp down about 2-3°C. Overall system also much more responsive. @agners Excellent work as usual! Thanks so much.

andrazek commented 11 months ago

My mini PC still runs about 5°C warmer when on HA OS 11 compared to 9.5 😞

Screenshot_2023-10-16-16-25-16-717_io homeassistant companion android

evilspoons commented 11 months ago

I'm also seeing an improvement. I haven't compared this version of Home Assistant to HAOS 9.5, but the 10.x -> 11 transition is definitely a huge drop. image

geobogb commented 10 months ago

I hate to ruin the party, but every version of HAOS i tried since 9.5 has been more or less unusuable for me.

Below is my CPU usage after upgrading from 9.5 to 11.1. As you can see, the CPU-usage has increased almost three times.

Screenshot_2023-11-04_16-22-40

I'm running Home Assistant under VirtualBox. The CPU-usage is on the Linux host.

Below is the same strace as I have done before. The number of calls to ioctl is more than six times in 11.1 compared to 9.5.

strace -c -f -p 1134292 & sleep 60 && killall strace
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 41.41   15.477748         117    131623           ioctl
 21.33    7.973976       12100       659           poll
 19.25    7.193907         261     27505     13745 futex
  8.86    3.312630       55210        60           rt_sigtimedwait
  5.45    2.038370     1019185         2           select
  3.69    1.381010      230168         6         3 restart_syscall
  0.00    0.001699           7       231           pwrite64
  0.00    0.000081           9         9         3 openat
  0.00    0.000064           7         9           read
  0.00    0.000061           7         8         4 recvfrom
  0.00    0.000030          10         3           statfs
  0.00    0.000027           4         6           write
  0.00    0.000021          10         2           sendto
  0.00    0.000017           2         6           close
  0.00    0.000014           2         6           fstat
  0.00    0.000014           2         6           getdents64
  0.00    0.000007           2         3           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   37.379676         233    160144     13755 total

Please let me know what I can do to help to solve this problem!

vladimirdarak commented 4 months ago

It helped me when I changed the settings of the bluetooth service to passive mode: image