home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
72.71k stars 30.45k forks source link

High CPU & mem usage #37348

Closed iamBiB closed 4 years ago

iamBiB commented 4 years ago

The problem

Home Assistant release with the issue: 0.112.0

Operating environment (HassOS/Generic): Ubuntu

Supervisor logs: 20-07-02 09:48:57 INFO (SyncWorker_11) [supervisor.docker.interface] Start homeassistant/intel-nuc-homeassistant 20-07-02 09:49:02 INFO (MainThread) [supervisor.homeassistant] Updated Home Assistant API token 20-07-02 09:49:22 INFO (MainThread) [supervisor.homeassistant] Detect a running Home Assistant instance 20-07-02 09:49:22 INFO (MainThread) [supervisor.addons] Phase 'AddonStartup.APPLICATION' start 1 add-ons 20-07-02 09:49:22 INFO (SyncWorker_5) [supervisor.docker.interface] Clean addon_cebe7a76_hassio_google_drive_backup application 20-07-02 09:49:23 INFO (SyncWorker_5) [supervisor.docker.addon] Start Docker add-on sabeechen/hassio-google-drive-backup-amd64 with version 0.100.0 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /homeassistant/info access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /supervisor/info access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /snapshots access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /snapshots/73acca6f/info access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /snapshots/a6261e32/info access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /snapshots/a885b964/info access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:23 INFO (MainThread) [supervisor.api.security] /snapshots/63506f9d/info access from cebe7a76_hassio_google_drive_backup 20-07-02 09:49:28 INFO (MainThread) [supervisor.misc.tasks] All core tasks are scheduled 20-07-02 09:49:28 INFO (MainThread) [supervisor.misc.hwmon] Started Supervisor hardware monitor 20-07-02 09:49:28 INFO (MainThread) [supervisor.core] Supervisor is up and running 20-07-02 09:49:28 INFO (MainThread) [supervisor.host.info] Update local host information 20-07-02 09:49:28 INFO (MainThread) [supervisor.utils.gdbus] Call org.freedesktop.DBus.Properties.GetAll on /org/freedesktop/hostname1 20-07-02 09:49:28 INFO (MainThread) [supervisor.updater] Fetch update data from https://version.home-assistant.io/stable.json 20-07-02 09:49:28 INFO (MainThread) [supervisor.host.services] Update service information 20-07-02 09:49:28 INFO (MainThread) [supervisor.utils.gdbus] Call org.freedesktop.systemd1.Manager.ListUnits on /org/freedesktop/systemd1 20-07-02 09:49:28 INFO (MainThread) [supervisor.host.network] Update local network DNS information 20-07-02 09:49:28 INFO (MainThread) [supervisor.utils.gdbus] Call org.freedesktop.DBus.Properties.GetAll on /org/freedesktop/NetworkManager/DnsManager 20-07-02 09:49:28 INFO (MainThread) [supervisor.host.sound] Update PulseAudio information

Description of problem: i`ve restarted my nuc today and notice a spike in internet trafic.. went further and noticed this: image

any idea what this might be?

Environment

arch x86_64
chassis desktop
dev false
docker true
docker_version 19.03.8
hassio true
host_os Ubuntu 20.04 LTS
installation_type Home Assistant Supervised
os_name Linux
os_version 5.4.0-39-generic
python_version 3.7.7
supervisor 228
timezone Europe/Bucharest
version 0.112.0
virtualenv false

Problem-relevant configuration.yaml

Traceback/Error logs


2020-07-02 13:10:20 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback EventBus.async_listen_once.<locals>.onetime_listener(<Event homeassistant_stop[L]>) at /usr/src/homeassistant/homeassistant/core.py:680
TypeError: stop() takes 1 positional argument but 2 were given

2020-07-02 13:16:12 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback EventBus.async_listen_once.<locals>.onetime_listener(<Event homeassistant_stop[L]>) at /usr/src/homeassistant/h>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 692, in onetime_listener
    self._hass.async_run_job(listener, event)
  File "/usr/src/homeassistant/homeassistant/core.py", line 401, in async_run_job
    self.async_add_job(target, *args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 331, in async_add_job
    task = self.loop.create_task(target(*args))
TypeError: stop() takes 1 positional argument but 2 were given

[s6-finish] waiting for services.
2020-07-02 13:57:15 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback EventBus.async_listen_once.<locals>.onetime_listener(<Event homeassistant_stop[L]>) at /usr/src/homeassistant/homeassistant/core.py:680
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 692, in onetime_listener
    self._hass.async_run_job(listener, event)
  File "/usr/src/homeassistant/homeassistant/core.py", line 401, in async_run_job
    self.async_add_job(target, *args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 331, in async_add_job
    task = self.loop.create_task(target(*args))
TypeError: stop() takes 1 positional argument but 2 were given
Error doing job: Exception in callback async_track_state_change.<locals>.state_change_listener(<Event state_...879542+03:00>>) at /usr/src/homeassistant/homeassistant/helpers/event.py:81
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 103, in state_change_listener
    event.data.get("new_state"),
  File "/usr/src/homeassistant/homeassistant/core.py", line 401, in async_run_job
    self.async_add_job(target, *args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 336, in async_add_job
    None, target, *args
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 755, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 163, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Error doing job: Exception in callback async_track_state_change.<locals>.state_change_listener(<Event state_...880216+03:00>>) at /usr/src/homeassistant/homeassistant/helpers/event.py:81
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 103, in state_change_listener
    event.data.get("new_state"),
  File "/usr/src/homeassistant/homeassistant/core.py", line 401, in async_run_job
    self.async_add_job(target, *args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 336, in async_add_job
    None, target, *args
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 755, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 163, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Error doing job: Exception in callback async_track_state_change.<locals>.state_change_listener(<Event state_...880793+03:00>>) at /usr/src/homeassistant/homeassistant/helpers/event.py:81
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 103, in state_change_listener
    event.data.get("new_state"),
  File "/usr/src/homeassistant/homeassistant/core.py", line 401, in async_run_job
    self.async_add_job(target, *args)
  File "/usr/src/homeassistant/homeassistant/core.py", line 336, in async_add_job
    None, target, *args
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 755, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 163, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
s6-svscanctl: fatal: unable to control /var/run/s6/services: supervisor not listening
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] udev.sh: executing... 
[13:57:23] INFO: Update udev information
[cont-init.d] udev.sh: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.

Additional information

I see a lot of internet traffic as well. maybe it want to update/install something and it fails

chemelli74 commented 4 years ago

I don't have any trace log, but I noticed a huge increase of CPU upgrading from 0.111.4 to 0.112.0

Let me know how to help to track it down.

Simone

m1ke-hub commented 4 years ago

Do you guys maybe have the speedtest integration enabled? I also had extreme CPU usage, which was because of multiple tests per minute (with auto update disabled) - if I re-apply the options, the CPU usage is back to normal.

Here's my isssue: https://github.com/home-assistant/core/issues/37381

bdraco commented 4 years ago

You should be able to find out what is going on with https://github.com/benfred/py-spy

iamBiB commented 4 years ago

Haven`t got a chance to update this yesterday. The issue was indeed speedtest. it just kept testing. i used py-spy and found some issues with it. some incorrect urls in the tester .. like https://speedtest.ciubi.net/ .. seems to go to a blog or something.. i just disabled it, restarted homeassistant and now it works fine.

chemelli74 commented 4 years ago

My issue since linked to "zeroconf" and a DB query:

image

Simone

chemelli74 commented 4 years ago

Going worst:

image

Simone

bdraco commented 4 years ago

The svg that record generates is much easier to read. Could you post one of those?

chemelli74 commented 4 years ago

The svg that record generates is much easier to read. Could you post one of those?

Here it is: profile.zip

Simone

bdraco commented 4 years ago

What was the duration you used on the recording?

bdraco commented 4 years ago
Screen Shot 2020-07-03 at 9 08 37 AM

Unless its a really short recording duration, the problem is its swamped processing an automation template

chemelli74 commented 4 years ago

What was the duration you used on the recording?

1 minute: py-spy record -o profile.svg -r 40 -d 60 --pid 1

Too short maybe ?

Simone

bdraco commented 4 years ago

360 seconds would give a better picture

chemelli74 commented 4 years ago

Here it is: profile2.zip

This is the view from docker: image

Take care that both instances have the same version and devices. Very similar DB size as well.

Simone

bdraco commented 4 years ago

It is definitely one of the templates in an automation.

Can you post your automations.yaml ?

chemelli74 commented 4 years ago

I'm on split config, so I attached the complete folder: automations.zip

Take care that the same config is on the test machine and I don't see so much cpu usage. Also I disabled all the automations and did a restart of the server, but CPU is still above 40%.

Simone

bdraco commented 4 years ago

Your shelly.yaml automations are firing every time any entity on the system changes state and then evaluating the template. That's very expensive. If you can limit the trigger to specific entities it will be much faster.

  trigger:
    platform: event
    event_type: state_changed

Try disabling that one for now and see if the cpu gets better.

tapparelle.yaml could probably optimized as well

chemelli74 commented 4 years ago

That's the automation situation:

image

But still CPU is more that 40%.

New profile: profile3.zip

Simone

bdraco commented 4 years ago

Looks like fritzbox is the cause of the rest

chemelli74 commented 4 years ago

Looks like fritzbox is the cause of the rest

How it's possibile that 2 instances that have the same config and nearly the same DB have so different behaviors ? I checked with a file compare to be sure /config contents are identical when it comes to yaml files.

Simone

bdraco commented 4 years ago

Try a ./py-spy dump --pid <pid>

Is there any significant difference in what each instance is doing?

chemelli74 commented 4 years ago

I was out for a few hours and now that I'm back I see CPU at 11%, so low as before the upgrade. Seems it took a while to go from >40% to 11% even if I did a restart after disabling the automations.

I now enabled them back all but Shelly and Tapparelle. Let's see tomorrow morning CPU usage history and move step by step.

At this point I bet on my Shelly automation but cannot understand why it works on one system and not in the other...

Simone

chemelli74 commented 4 years ago

Everything still working fine. Now time to dig into the automations ;-)

For Shelly, I was thinking about creating 1 automation that will create the list of shelly and another one that will investigate the values so to avoid the loop trough states. What you think ? If you are on discord we may discuss that there.

Thanks a lot for your great help so far.

Simone

bdraco commented 4 years ago

If the automation knows the entities to watch ahead of time it is much faster (and it will be even faster in 0.113) as it only has to look at the specific stage changes for those entities.

chemelli74 commented 4 years ago

If the automation knows the entities to watch ahead of time it is much faster (and it will be even faster in 0.113) as it only has to look at the specific stage changes for those entities.

Got it, this is why I was thinking about a platform that creates a list based on desired values and then use that as reference. Something like:

- platform: dynamic_list
  entities:
    - shelly*

so that is not created at every state change, but only once at startup. Would be a nice addition.

Simone

bdraco commented 4 years ago

Unfortunately I don't think we have glob matching for entity ids. At least I couldn't find code to do that.

herik63 commented 4 years ago

Hallo, here similar issue: CPU to 50-60%

Screenshot 2020-07-05 20 53 44
chemelli74 commented 4 years ago

Unfortunately I don't think we have glob matching for entity ids. At least I couldn't find code to do that.

Could be a nice addition ;-)

Simone

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

chemelli74 commented 4 years ago

Once I disabled the offending automation, all worked fine. Unfortunately didn't find jet a way to get the same result without killing CPU. Strange enough it was working fine with older HA.

As I don't expect a solution out of this issue, you can close it.

Simone

bdraco commented 4 years ago

@chemelli74 Is it any better in 0.116b2?

chemelli74 commented 4 years ago

@chemelli74 Is it any better in 0.116b2?

Honestly, I disabled the guilty automation since several weeks. If it makes sanse, I can retest. Let me know,

Simone

bdraco commented 4 years ago

If you aren't using it in production, I don't think its worth the trouble to re-test. 👍