home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
72.39k stars 30.29k forks source link

ONVIF Integration May Leak Memory #42390

Closed AndrewGWalsh closed 3 years ago

AndrewGWalsh commented 3 years ago

The problem

HA failing 30 - 60 minutes after restarting HA. Python3 memory consumption increases until it runs out. Problem seen immediately after upgrade to first beta of 117 and with each succeeding beta

Environment

'Community Guide' installation: Raspberry Pi 4, 4GB, Raspberry Pi OS, Docker

Problem-relevant configuration.yaml

ONVIF set up through UI.

Traceback/Error logs

Additional information

Restoring release version of 116 fixes it. Removing the 8 ONVIF camera integrations I have also fixes it.

probot-home-assistant[bot] commented 3 years ago

onvif documentation onvif source (message by IssueLinks)

probot-home-assistant[bot] commented 3 years ago

Hey there @hunterjm, mind taking a look at this issue as its been labeled with an integration (onvif) you are listed as a codeowner for? Thanks! (message by CodeOwnersMention)

balloob commented 3 years ago

What cameras do you use?

AndrewGWalsh commented 3 years ago

2 H.View 5MP Fixed https://smile.amazon.co.uk/gp/product/B0798KB3G9/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 3 H.View 5MP Dome https://smile.amazon.co.uk/gp/product/B07HFM8D42/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 1 PTZLink 5MP Mobile https://smile.amazon.co.uk/gp/product/B07XJRQ9L8/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 2 Doorbird stations via the Doorbird integration

hunterjm commented 3 years ago

Investigating. The beta I'm on uses 6-7 GB of memory, but I believe the majority of that is from the TensorFlow model. I do not see constant increases in memory usage, so I do not believe it is a leak. The httpx lib does show increased usage when using SSLContext, so I will test setting verify=False in the parent lib and see if that has a noticeable effect on memory usage.

AndrewGWalsh commented 3 years ago

Without ONVIF, after a few hours up time, my whole system memory consumption reports 827mb of 3,826Mb used with 10-20% cpu used.

AndrewGWalsh commented 3 years ago

Prior to experiencing this problem (i.e. for months) I used to get errors like this in my log - I don't know if it is relevant, could find nothing on the forum.

2020-10-26 19:47:30 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for 'f0:23:b9:52:07:47':

I've just added one camera back to my config to see if it starts the problem back. Will update you idc.

AndrewGWalsh commented 3 years ago

Adding back one of the dome cameras resulted in the memory problem. I’ve deleted it again but happy to try things.

hunterjm commented 3 years ago

That's very interesting. I've still been unable to reproduce with a development instance running for hours and memory consumption stable at ~216 MB with 4 ONVIF cameras configured. Before it crashes, are you seeing any additional functionality in the ONVIF integration that is not there in 0.116? Namely, we made a change where we are now maybe pulling in events from devices that previously were disabled.

If you do see new sensors for your camera, try disabling them all instead of removing the integration and see if that solves your memory problem.

AndrewGWalsh commented 3 years ago

Where should I look to see this functionality?

hunterjm commented 3 years ago

The device on the devices page should have a list of entities when you click on it.

AndrewGWalsh commented 3 years ago

I have just 2 entities, one for the main profile and one for the sub profile (the camera supports two profiles) - I think that is unchanged?

hunterjm commented 3 years ago

How are you adding your cameras? Are they discovered or added manually in the UI? You said you were able to reproduce adding your dome camera back, what about the others? Can you add each type of camera back individually to see if this is limited to capabilities of the camera? I noticed the dome was PTZ for example.

AndrewGWalsh commented 3 years ago

I add them using the UI, they do not get discovered. I am trying with the fixed camera on its own now.

AndrewGWalsh commented 3 years ago

The fixed camera has the same result, although it might or might not be happening slower.

hunterjm commented 3 years ago

Since I am unable to reproduce locally, I've added a memory profiler in this PR: #42435

Because that won't be released until 0.118 it doesn't help us here unless you are willing to run it as a custom component. The related files to do so are here: https://github.com/hunterjm/home-assistant/tree/profile-memory/homeassistant/components/profiler

After creating the custom component, you should be able to add it through Config -> Integrations -> + -> Profiler. Once added, on the Developer Tools -> Services tab there should be a profiler.memory service available. Call that service and in 60 seconds there will be a new .hpy file in your configuration directory. Please upload that file to this ticket.

AndrewGWalsh commented 3 years ago

I am a little inexperienced with git - but should I have been able to git clone that into my custom_components directory? I tried and it said repo not found.

hunterjm commented 3 years ago

Paste the URL into a tool like below and it will give you a zip file you can extract.

https://minhaskamal.github.io/DownGit/#/home

McGiverGim commented 3 years ago

I don't know if related, but since 0.117.0 is clear that there is some memory lead somewhere: image

I use ONVIF too, but I'm not sure if it is the culprit in this case. Only to let it know.

McGiverGim commented 3 years ago

After two restarts without luck, I have reverted to 0.116.4 and the memory leak is gone: image

hallenmaia commented 3 years ago

The same is happening here heap_profile.1603985862756423.hpy.zip

mem

. The memory goes to 96% and locks the core.

Log when memory exceeds 66%:

2020-10-29 12:22:46 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8b': Unknown Error
2020-10-29 12:22:46 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8d': Unknown Error
2020-10-29 12:23:40 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback WebSocketResponse._send_heartbeat()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_ws.py", line 102, in _send_heartbeat
    self._loop.create_task(self._writer.ping())  # type: ignore
AttributeError: 'NoneType' object has no attribute 'ping'
2020-10-29 12:23:40 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback WebSocketResponse._send_heartbeat()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_ws.py", line 102, in _send_heartbeat
    self._loop.create_task(self._writer.ping())  # type: ignore
AttributeError: 'NoneType' object has no attribute 'ping'
2020-10-29 12:27:13 WARNING (Thread-5) [homeassistant.components.mqtt] Disconnected from MQTT server core-mosquitto:1883 (1)
2020-10-29 12:27:13 WARNING (Thread-5) [homeassistant.components.mqtt] Disconnected from MQTT server core-mosquitto:1883 (1)
2020-10-29 12:29:29 WARNING (MainThread) [homeassistant.helpers.entity] Updating state for binary_sensor.remote_ui (<class 'homeassistant.components.cloud.binary_sensor.CloudRemoteBinary'>) took 12.750 seconds. Please create a bug report at https://github.com/home-assistant/core/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+cloud%22
2020-10-29 12:29:42 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer
2020-10-29 12:32:04 WARNING (Thread-5) [homeassistant.components.mqtt] Disconnected from MQTT server core-mosquitto:1883 (1)
2020-10-29 12:34:39 WARNING (MainThread) [homeassistant.components.sensor] Updating snmp sensor took longer than the scheduled update interval 0:00:10
2020-10-29 12:34:39 WARNING (MainThread) [homeassistant.components.zone] Updating zone zone took longer than the scheduled update interval 0:00:15
2020-10-29 12:34:40 WARNING (MainThread) [homeassistant.components.sensor] Updating time_date sensor took longer than the scheduled update interval 0:00:30
2020-10-29 12:34:40 WARNING (MainThread) [homeassistant.components.sensor] Updating uptime sensor took longer than the scheduled update interval 0:00:30
2020-10-29 12:34:40 WARNING (MainThread) [homeassistant.components.sensor] Updating systemmonitor sensor took longer than the scheduled update interval 0:00:30
2020-10-29 12:34:40 WARNING (MainThread) [homeassistant.components.binary_sensor] Updating ping binary_sensor took longer than the scheduled update interval 0:00:30
2020-10-29 12:34:40 WARNING (MainThread) [homeassistant.components.weather] Updating darksky weather took longer than the scheduled update interval 0:00:30
2020-10-29 12:34:40 WARNING (MainThread) [homeassistant.components.binary_sensor] Updating ping binary_sensor took longer than the scheduled update interval 0:01:00
2020-10-29 12:34:40 WARNING (MainThread) [aiohttp.websocket] websocket connection is closing.
2020-10-29 12:34:40 ERROR (MainThread) [custom_components.solarenergy.bridge] Timeout occurred while connecting to SolarEnergy inverter. Maybe it's on standby.
2020-10-29 12:34:40 ERROR (MainThread) [custom_components.solarenergy] Error fetching SolarEnergy Inverter Update data: 
2020-10-29 12:34:41 WARNING (MainThread) [homeassistant.components.sensor] Updating snmp sensor took longer than the scheduled update interval 0:00:10
2020-10-29 12:34:45 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8d': Unknown Error
2020-10-29 12:34:45 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8b': Unknown Error
2020-10-29 12:35:37 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback WebSocketResponse._send_heartbeat()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_ws.py", line 102, in _send_heartbeat
    self._loop.create_task(self._writer.ping())  # type: ignore
AttributeError: 'NoneType' object has no attribute 'ping'
2020-10-29 12:35:54 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8d': Unknown Error
2020-10-29 12:36:02 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8b': Unknown Error
2020-10-29 12:36:08 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8d': Unknown Error
2020-10-29 12:36:15 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8d': Unknown Error
2020-10-29 12:36:15 WARNING (MainThread) [homeassistant.components.onvif] Failed to restart ONVIF PullPoint subscription for '24:fd:0d:69:ed:8d'. Retrying...
2020-10-29 12:36:20 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8b': Unknown Error
2020-10-29 12:37:26 WARNING (MainThread) [homeassistant.components.onvif] Failed to fetch ONVIF PullPoint subscription messages for '24:fd:0d:69:ed:8b': Unknown Error
2020-10-29 12:37:35 ERROR (SyncWorker_21) [homeassistant] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aioesphomeapi/connection.py", line 271, in _recv
    ret = await self._socket_reader.readexactly(amount)
  File "/usr/local/lib/python3.8/asyncio/streams.py", line 723, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/local/lib/python3.8/asyncio/streams.py", line 517, in _wait_for_data
    await self._waiter
  File "/usr/local/lib/python3.8/asyncio/selector_events.py", line 910, in write
    n = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
2020-10-29 12:40:19 WARNING (Thread-5) [homeassistant.components.mqtt] Disconnected from MQTT server core-mosquitto:1883 (1)
hallenmaia commented 3 years ago

Each time I reload an ONVIF camera through the integration page, the memory usage also goes up and never goes down again.

I also noticed that every time PullPoint enrollment fails, memory usage goes up. If several failures occur in a row, the memory percentage gradually starts to rise until the entire system becomes unavailable.

mrblur commented 3 years ago

Same on my install, tried to memory profile it, got two flat .hyp profiles (too short time for profiling?), but found megabytes of memory being utilized for zeep.xsd.*

Maybe that's "normal", but why schema is kept in memory after initialization of devices? Common for our issues is that our cameras does not support PullPoint subscriptions (mine are Imou), maybe there's the problem?

hunterjm commented 3 years ago

@mrblur - The WSDL definitions have to be kept in memory as they are used when calling additional services like snapshot, events, stream, etc.

@hallenmaia - Thanks for the additional info. I think the change to how we subscribe to events might be the culprit.

hunterjm commented 3 years ago

Who has an advanced home network setup that can give me temporary VPN access to their ONVIF cameras for testing purposes? It's really hard to find the bug when unable to reproduce. Feel free to ping me on Discord or the community forums (same username everywhere).

hallenmaia commented 3 years ago

@hunterjm I!

AndrewGWalsh commented 3 years ago

I have the files in a subdirectories of custom_components, what should I do next?

kenlasko commented 3 years ago

@hunterjm if you need VPN access to an ONVIF camera for testing, I'm happy to set something up if you still need it. I've encountered the ONVIF memory leak (came here from #42631). Incidentally, my ONVIF camera hasn't worked in HA for several versions. Its still working.

hunterjm commented 3 years ago

I was able to get access to some cameras that are causing the issue, but still unable to reproduce on my dev machine. Is everybody running on a Raspberry Pi or ARM device who has encountered this so far?

kenlasko commented 3 years ago

Yes, I'm running on RPi4 via Docker.

McGiverGim commented 3 years ago

Me too, raspberry pi 4 with HassOS 5.4 64 bit (booting from ssd if it matters).

hallenmaia commented 3 years ago

RP3b+, HassOS 4.15

gimi87 commented 3 years ago

Minor update from my side... quick history graph (from 0.117.1): ram_issue_4

  1. Actual memory leak
  2. Reloaded two ONVIF integrations (2 cameras)
  3. Restarted HA and reloaded two cameras
  4. Restarted HA and reloaded only 1st camera
  5. Restarted HA and reloaded only 2nd camera

On the 1st camera I had this window: ram_issue_5

On the 2nd camera I had this window: ram_issue_6

Maybe it will help you somehow to fix this issue. Fingers crossed 😃

AndrewGWalsh commented 3 years ago

RPi4, Raspberry Pi OS, Docker.

McGiverGim commented 3 years ago

In my case it has not been fixed in 117.2. Maybe my case is another different issue? Others can confirm if fixed?

balloob commented 3 years ago

@McGiverGim in that case please open a new issue