home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

High CPU and memory usage under 0.108.0 #33866

Closed McGiverGim closed 4 years ago

McGiverGim commented 4 years ago

The problem

Since installing HA version 0.108.0, the CPU and memory usage on my system keeps growing. I have restarted several times (both HA and the Raspberry Pi), but this does not fix anything.

This graph shows the difference from 0.107.7 to 0.108.0:

image

You can see the low CPU in blue and the memory stable at about 30% in red until the installation of 0.108.0. Since then, memory and CPU usage grow over time. I have restarted it several times without luck, as you can see in the graph.

My first idea is to go back to 0.107.7, but I prefer to report this here because there may be a bug and I can help by providing information.

Environment

Problem-relevant configuration.yaml

I don't know what to attach. My complete configuration?

Traceback/Error logs

I can't see any errors, only the Brother integration failing to find the powered-off printer, but this was the same in 0.107.7.

2020-04-09 08:39:56 ERROR (MainThread) [homeassistant.components.brother] Error fetching brother data: No SNMP response received before timeout
2020-04-09 08:39:56 WARNING (MainThread) [homeassistant.config_entries] Config entry for brother not ready yet. Retrying in 80 seconds.

Additional information

I have some custom integrations, but they were there in earlier versions too:

andriej commented 4 years ago

It's highly probable that nodered is leaking.

McGiverGim commented 4 years ago

I'm running tests with the custom integrations disabled. A docker stats command shows both nodered and homeassistant growing in memory. If I stop nodered, homeassistant stops growing too. The CPU stays high with or without nodered. So it seems that nodered is the most probable culprit, or that some sensor produced by nodered causes this in the system. The strange thing is that this has only happened since the update to 0.108.0; it was not there in 0.107.7 or any previous version.
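To tell a steady leak from normal fluctuation, one option is to sample a container's memory at regular intervals (for example from repeated `docker stats --no-stream` readings) and fit a straight line through the samples. The sketch below is only an illustration: the function name `growth_per_hour` and all sample values are made up for the example and are not measurements from this system.

```python
from typing import Sequence, Tuple


def growth_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of memory over time, in MiB per hour.

    Each sample is (seconds_since_start, memory_mib).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # slope = covariance(t, m) / variance(t)
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("all samples have the same timestamp")
    return cov / var * 3600.0


# Hypothetical readings taken every 10 minutes: (seconds, MiB).
leaking = [(0, 300.0), (600, 310.0), (1200, 320.0), (1800, 330.0)]
stable = [(0, 300.0), (600, 302.0), (1200, 299.0), (1800, 301.0)]

print(f"leaking: {growth_per_hour(leaking):+.1f} MiB/h")
print(f"stable:  {growth_per_hour(stable):+.1f} MiB/h")
```

A slope that stays clearly positive across restarts would support the leak hypothesis; a slope near zero would point to a higher but stable baseline.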

gohm44 commented 4 years ago

I do not have any custom integrations, nodered, and so on. After upgrading to the new version, the container consumes more resources: it uses more CPU and memory every minute. After ~12h I had to restart Home Assistant. I have just upgraded to 0.108.1, but as far as I can see it behaves the same way. My OS: Ubuntu 18.04

The new release introduced Jemalloc. Maybe it is our suspect?

Docker images for Home Assistant are now using Jemalloc, to reduce memory fragmentation and speed up memory allocation. So, less memory and generally a faster Home Assistant.

image

McGiverGim commented 4 years ago

Happy to hear that maybe it is not node red related and that I'm not alone with this problem. I think I see it with node red because I have sensors and flows that react to one another, and maybe for some reason they don't behave as in the earlier version (maybe some sensor is out of control?).

I also tested 0.108.1 without luck, like you. I have rolled back to 0.107.7 and now everything works as expected again.

If someone has an idea, I can upgrade again to 0.108.1 to test it, but until then it is impossible for me to stay on this version.

Coren4 commented 4 years ago

I have already updated to 0.108.1 and I have the same issue. CPU and memory usage keep getting higher, so around 24h after start both are above 90%.

Before 0.108 my CPU was around 5% on idle, and memory around 45%.

gohm44 commented 4 years ago

Rebooting the server stopped the constant increase in memory and CPU usage for me, but usage is still much higher than it was before.

2020-04-10_06-54

AndrzejOlender commented 4 years ago

I have the same problem: besides high CPU usage and a large delay in HA at that time (automations ran with a delay of a few seconds), disk activity jumps at the same time.

Screenshot 2020-04-10 at 08 13 38 Screenshot 2020-04-10 at 08 13 50

Intel NUC, HA Core in Docker 0.108.1

Coren4 commented 4 years ago

In 0.108.2 it is still present.

I use an old laptop (2 cores, 4 GB RAM, SSD) as the host, with Ubuntu 18 Server.

What I use with HA: ESPHome, Zigbee2Mqtt, MQTT broker, Node red, Google Drive backup, Plex integration, Airly integration, Brother printer integration, UPnP integration to the network router.

I am using MariaDB in a container next to HA as storage.

gohm44 commented 4 years ago

Actually, I notice that for me 0.108.2 almost resolves the issue. Memory usage is back to its previous level. CPU usage is still higher, but at least constant.

Coren4 commented 4 years ago

> Actually, I notice that for me 0.108.2 almost resolves the issue. Memory usage is back to its previous level. CPU usage is still higher, but at least constant.

I also thought it solved the problem for me, but then 1 hour passed and everything came back.

McGiverGim commented 4 years ago

The high CPU usage is a clear symptom that the problem is still there. While I was starting and stopping nodered to see if things improved, I thought I had fixed it because at some point the memory was stable, but the CPU was high and some time later the memory started to grow again.

mountainsandcode commented 4 years ago

Struggling with the same issue

McGiverGim commented 4 years ago

@mountainsandcode can you give more information about your system? Are you using unofficial integrations like Node Red?

mountainsandcode commented 4 years ago

I'm running HASS on a Synology using Docker - I have HACS running and two self-programmed integrations, but they have been running for quite a while. I suspect this may be due to https://github.com/home-assistant/core/issues/33882#issuecomment-612424222 as I can also see two python processes

Coren4 commented 4 years ago

I don't have the Homekit integration, so I don't think that is the cause in my case.

haseat commented 4 years ago

Having the same problem in Home Assistant Core on a Pi4 since upgrading to 0.108.n, although on a much lower level. As you can see, memory usage goes up until a restart. image

Alessandro1981 commented 4 years ago

I have the same issue after updating from 107.7 to 108.3. After rolling back to version 107.7, the issue disappeared (memory consumption is flat).

Alessandro

chosten commented 4 years ago

Same issue. I'm using Docker on an i5 and I'm not using Nodered. The problem appeared with v108.0. I have to restart the container often because all services start to time out.

AndrzejOlender commented 4 years ago

I also run HA Core in Docker; maybe this recent change is the cause of the problem?

Docker images for Home Assistant are now using Jemalloc, to reduce memory fragmentation and speed up memory allocation. So, less memory and generally a faster Home Assistant.

gohm44 commented 4 years ago

Since version 0.108.3 everything is back to normal for me.

McGiverGim commented 4 years ago

It does not fix the problem for me. I tested it today with 0.108.3 and you can again see the memory going up and the CPU going up and down: image

EDIT: adding more information. The memory is being consumed by the homeassistant Docker container. After the installation: image

Some time later: image

The rest of the add-ons seem to be stable.

Running ps inside the homeassistant container does not reveal anything strange.

choeflake commented 4 years ago

Same here (except the memory). I first experienced the issue with 0.108.? and then upgraded to 0.108.3, still the same. Now I am back on 0.107.7 and the issue is gone (no other components were downgraded). Under 0.108.x, my log was full of events like:

ha             | 2020-04-13 22:22:17 INFO (MainThread) [homeassistant.components.mqtt] Got update for entity with hash: ('binary_sensor', '0x0017880104b53dcc occupancy') '{'payload_on': True, 'payload_off': False, 'value_template': '{{ value_json.occupancy }}', 'device_class': 'motion', 'state_topic': 'zigbee2mqtt/hal_1_sensor_1', 'json_attributes_topic': 'zigbee2mqtt/hal_1_sensor_1', 'name': 'hal_1_sensor_1_occupancy', 'unique_id': '0x0017880104b53dcc_occupancy_zigbee2mqtt', 'device': {'identifiers': ['zigbee2mqtt_0x0017880104b53dcc'], 'name': 'hal_1_sensor_1', 'sw_version': 'Zigbee2mqtt 1.12.2', 'model': 'Hue motion sensor (9290012607)', 'manufacturer': 'Philips'}, 'availability_topic': 'zigbee2mqtt/bridge/state', 'platform': 'mqtt'}'
ha             | 2020-04-13 22:22:17 INFO (MainThread) [homeassistant.components.mqtt] Updating component: binary_sensor.hal_1_sensor_1_occupancy

(not sure which of the two lines comes first)

For every entity (that is, the number of devices multiplied by the number of entities on each), this log is written every second.

My config: Ubuntu with Docker Compose running mosquitto 1.6.9 and Zigbee2mqtt 1.12.2 (firmware 20200328). Stopping the mosquitto container reduces the CPU usage; listening on MQTT shows that hundreds of messages per second are being processed.
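To put a number on how often each entity is updated, the "Updating component" lines can be counted per second and per entity. The following is a rough sketch that assumes the log format shown above; the function name `updates_per_entity` and the repeated sample lines are illustrative, not taken from a real capture.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "Updating component" lines shown above. Using search() lets an
# optional Docker Compose prefix ("ha | ") precede the timestamp.
UPDATE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) INFO \(MainThread\) "
    r"\[homeassistant\.components\.mqtt\] Updating component: (?P<entity>\S+)"
)


def updates_per_entity(lines: Iterable[str]) -> Counter:
    """Count 'Updating component' log lines per (timestamp, entity) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = UPDATE_RE.search(line)
        if match:
            counts[(match["ts"], match["entity"])] += 1
    return counts


PREFIX = "ha | {ts} INFO (MainThread) [homeassistant.components.mqtt] "
ENTITY = "binary_sensor.hal_1_sensor_1_occupancy"

# Illustrative input: the same entity updated twice within one second.
sample = [
    PREFIX.format(ts="2020-04-13 22:22:17") + "Updating component: " + ENTITY,
    PREFIX.format(ts="2020-04-13 22:22:17") + "Updating component: " + ENTITY,
    PREFIX.format(ts="2020-04-13 22:22:18") + "Updating component: " + ENTITY,
    "ha | 2020-04-13 22:22:18 WARNING (MainThread) [homeassistant.config_entries] unrelated",
]

counts = updates_per_entity(sample)
for (ts, entity), n in counts.most_common():
    print(ts, entity, n)
```

On a real log one would pass the open file object instead of `sample`; entities that show up many times within the same second are the ones worth investigating first.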

haseat commented 4 years ago

0.108.4 seems to have fixed the problem for me so far

andriej commented 4 years ago

@haseat according to the changelog, nothing seems to have changed regarding the system. Maybe you updated HACS in the meantime too?

haseat commented 4 years ago

@andriej you're right, I totally forgot about that, but I did it right before the 0.108.4 update

McGiverGim commented 4 years ago

My latest test with 0.108.3 was with HACS updated to the latest version, and that did not fix the problem for me :(

McGiverGim commented 4 years ago

I have noticed that some users use py-spy to analyze what is going on with the system. One example is in this thread:

https://github.com/home-assistant/core/issues/34093

Is anyone here able to run it? I don't know anything about Python, so I don't know if it can be run in a running released hass.io/homeassistant instance.

McGiverGim commented 4 years ago

Another user with problems, on Reddit: https://www.reddit.com/r/homeassistant/comments/fykg8l/ha_108_on_roller_coaster_cpumem_ride/

Lawrencezarb commented 4 years ago

I have the same problem, even using 108.4. I have reverted to 107.7

Gunth commented 4 years ago

I also have the same issue with the 108.5 version .... :-(

McGiverGim commented 4 years ago

Is nobody here able to install and run py-spy? https://github.com/benfred/py-spy

I have tried, but it seems the instructions are not valid for Hass.io or my system.

D43m0n666 commented 4 years ago

Same problem for me with 108.4. Rolling back to 108.2 solved the problem. I have HA installed on a Raspberry Pi 3B+ with 1 GB RAM and 1 GB assigned to swap. No HACS and no custom integrations.

ayufan commented 4 years ago

It seems that the fix for me was to disable the Brother integration. My Brother printer is mostly not running, but it seems that HA and the Brother integration enqueue a number of requests for fetching data even though the device does not respond.

Maybe this is a problem with the Brother integration, but it could as well be a problem in a state update loop that re-enqueues the data fetch even though the previous operation has not finished.
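To illustrate the hypothesis only: one common way to keep a polling loop from piling up requests against an unresponsive device is to skip a refresh while the previous one is still in flight, and to bound each fetch with a timeout. The sketch below uses plain `asyncio`; it is not how Home Assistant's update coordinator or the Brother integration is implemented, and `GuardedRefresher` and `slow_fetch` are hypothetical names.

```python
import asyncio


class GuardedRefresher:
    """Runs an async fetch, but never lets two fetches overlap."""

    def __init__(self, fetch, timeout: float) -> None:
        self._fetch = fetch        # coroutine function doing the slow I/O
        self._timeout = timeout    # seconds before a fetch is abandoned
        self._lock = asyncio.Lock()
        self.skipped = 0           # refreshes dropped because one was running
        self.data = None

    async def refresh(self) -> bool:
        """Return True if a fetch ran and succeeded, False otherwise."""
        if self._lock.locked():
            # A previous fetch has not finished: drop this request
            # instead of queueing another one behind it.
            self.skipped += 1
            return False
        async with self._lock:
            try:
                self.data = await asyncio.wait_for(self._fetch(), self._timeout)
                return True
            except asyncio.TimeoutError:
                return False


async def slow_fetch() -> str:
    # Stand-in for a device that answers slowly (hypothetical).
    await asyncio.sleep(0.05)
    return "ok"


async def main() -> GuardedRefresher:
    # Created inside the running loop so the lock belongs to this loop.
    refresher = GuardedRefresher(slow_fetch, timeout=1.0)
    # Five refreshes are requested at once; only the first one should fetch.
    results = await asyncio.gather(*(refresher.refresh() for _ in range(5)))
    print(results, "skipped:", refresher.skipped)
    return refresher


refresher = asyncio.run(main())
```

Dropping overlapping requests rather than queueing them trades freshness for bounded resource use, which seems the safer choice when the device may be powered off for hours.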

2020-04-16 09:21:12 ERROR (MainThread) [homeassistant.components.brother] Unexpected error fetching brother data:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/homeassistant/helpers/update_coordinator.py", line 129, in async_refresh
    self.data = await self._async_update_data()
  File "/usr/local/lib/python3.7/dist-packages/homeassistant/components/brother/__init__.py", line 80, in _async_update_data
    await self.brother.async_update()
  File "/srv/homeassistant/deps/lib/python3.7/site-packages/brother/__init__.py", line 43, in async_update
    raw_data = await self._get_data()
  File "/srv/homeassistant/deps/lib/python3.7/site-packages/brother/__init__.py", line 145, in _get_data
    *request_args, *self._oids
  File "/usr/lib/python3.7/asyncio/coroutines.py", line 123, in coro
    res = yield from res
concurrent.futures._base.CancelledError
2020-04-16 09:21:12 ERROR (MainThread) [homeassistant.components.brother] Unexpected error fetching brother data:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/homeassistant/helpers/update_coordinator.py", line 129, in async_refresh
    self.data = await self._async_update_data()
  File "/usr/local/lib/python3.7/dist-packages/homeassistant/components/brother/__init__.py", line 80, in _async_update_data
    await self.brother.async_update()
  File "/srv/homeassistant/deps/lib/python3.7/site-packages/brother/__init__.py", line 43, in async_update
    raw_data = await self._get_data()
  File "/srv/homeassistant/deps/lib/python3.7/site-packages/brother/__init__.py", line 145, in _get_data
    *request_args, *self._oids
  File "/usr/lib/python3.7/asyncio/coroutines.py", line 123, in coro
    res = yield from res
concurrent.futures._base.CancelledError
2020-04-16 09:21:12 ERROR (MainThread) [homeassistant.components.brother] Unexpected error fetching brother data:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/homeassistant/helpers/update_coordinator.py", line 129, in async_refresh
    self.data = await self._async_update_data()
  File "/usr/local/lib/python3.7/dist-packages/homeassistant/components/brother/__init__.py", line 80, in _async_update_data
    await self.brother.async_update()
  File "/srv/homeassistant/deps/lib/python3.7/site-packages/brother/__init__.py", line 43, in async_update
    raw_data = await self._get_data()
  File "/srv/homeassistant/deps/lib/python3.7/site-packages/brother/__init__.py", line 145, in _get_data
    *request_args, *self._oids
  File "/usr/lib/python3.7/asyncio/coroutines.py", line 123, in coro
    res = yield from res
concurrent.futures._base.CancelledError

# and like x20 more
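To see which component is responsible for most of the repeated errors, the ERROR lines can be grouped by logger name. This is a small sketch assuming the log line format shown above; the function name `errors_by_logger` is made up for the example, and the excerpt reuses lines quoted in this thread.

```python
import re
from collections import Counter
from typing import Iterable

# "2020-04-16 09:21:12 ERROR (MainThread) [homeassistant.components.brother] ..."
ERROR_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ERROR \(\w+\) \[(?P<logger>[^\]]+)\]"
)


def errors_by_logger(lines: Iterable[str]) -> Counter:
    """Count ERROR-level log lines per logger; traceback lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[match["logger"]] += 1
    return counts


log_excerpt = [
    "2020-04-16 09:21:12 ERROR (MainThread) [homeassistant.components.brother] "
    "Unexpected error fetching brother data:",
    "Traceback (most recent call last):",
    "concurrent.futures._base.CancelledError",
    "2020-04-16 09:21:12 ERROR (MainThread) [homeassistant.components.brother] "
    "Unexpected error fetching brother data:",
    "2020-04-09 08:39:56 WARNING (MainThread) [homeassistant.config_entries] "
    "Config entry for brother not ready yet. Retrying in 80 seconds.",
]

error_counts = errors_by_logger(log_excerpt)
for logger, count in error_counts.most_common():
    print(f"{count:5d}  {logger}")
```

Running the same function over the full log file would show whether the errors are concentrated in one integration or spread across every integration with an unreachable device.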

It seems that at least some major aspects changed for Brother: https://github.com/home-assistant/core/commit/c1ceab09e5281dcf6b42592bb2e35cb6518f4885#diff-cb38b021b379128e76bd5b1f539660ca.

D43m0n666 commented 4 years ago

I also have the Brother integration. I will try disabling it and upgrading HA... I'll let you know!

chosten commented 4 years ago

I have no Brother integration and it is still not working with v108.5

ayufan commented 4 years ago

So far, so good (after removing Brother):

image

As stated, it could be some generic component responsible for updating entities that retries the device update aggressively when the device is unavailable. I just see this happening with Brother, as the printer is unavailable for me most of the time.

McGiverGim commented 4 years ago

I have the Brother integration, but I think I remember installing it after the problem appeared. I will try later without it.

@D43m0n666 waiting for your test...

@SquareBeard do you have any message in the "developer tools", "registry" that repeats a lot? Maybe it is not the Brother integration, but any integration that does not detect its device and keeps trying to find it.

chosten commented 4 years ago

@McGiverGim That is indeed a good possibility; I have a device that is always off. I'll try to deactivate it.

D43m0n666 commented 4 years ago

> I have the Brother integration, but I think I remember installing it after the problem appeared. I will try later without it.
>
> @D43m0n666 waiting for your test...

With the Brother integration disabled, it seems OK with the latest release 108.5, FOR NOW! I can give you better news in 24h. Thanks @ayufan for the suggestion!

chosten commented 4 years ago

Well that worked. My HA instance has not been running that long since v107.

There is definitely something wrong with unreachable devices or with the logging of the errors related to such devices.

Gunth commented 4 years ago

I also have a Brother printer, I need to check this ...

McGiverGim commented 4 years ago

I disabled the Brother integration and everything seems stable here too... maybe we have found the problem. I will keep checking over the next hours, but it looks good so far.

I don't know if any developer is following this thread, or if there is a way to make them aware of this...

ayufan commented 4 years ago

@bieniu Can you take a look, as you are the author of the latest changes? https://github.com/home-assistant/core/commit/c1ceab09e5281dcf6b42592bb2e35cb6518f4885#diff-cb38b021b379128e76bd5b1f539660ca

chosten commented 4 years ago

@ayufan It is not related to the Brother integration. It seems to be related to unreachable devices and/or logging.

bieniu commented 4 years ago

@ayufan I have been looking for an issue in the code since your first post about Brother, but I don't see anything suspicious. On my 3 devices, the CPU load is the same with and without the Brother integration, and with the printer turned on or off.

ayufan commented 4 years ago

@SquareBeard It seems so, but someone familiar with the lifecycle of devices could help out with testing it and git bisecting :)

chosten commented 4 years ago

@bieniu Did you try with your printers powered off?

bieniu commented 4 years ago

@SquareBeard Yes.

ayufan commented 4 years ago

@bieniu This starts to happen after a long period. For example, I noticed a bunch of messages about failing to get SNMP data due to timeout/missing DNS/UDP connection. Also, SNMP seems to have failed even when the printer was powered on.

bieniu commented 4 years ago

@ayufan OK I will turn off my printer for the whole day.