home-assistant / core

Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

2024.7.1 - Core crashes nearly every day #121475

Closed: pantherale0 closed this 4 months ago

pantherale0 commented 4 months ago

The problem

It appears that Core has been crashing nearly every day since 2024.7.x. I'm not sure if it's due to a custom integration or if something else is going on, but after around 24-36 hours of uptime, Linux's OOM killer kills python3, which of course takes Home Assistant offline entirely.

I've checked the logs, but can't see any integrations throwing errors (apart from what I would expect). I can't upload the logs here as they are too large (one from a couple of days ago is 10 GB, another from today is 1.5 GB), but I'm happy to send them to someone via other means. I have included a snippet of the logs below from around the time Core crashes.

The line ERROR (MainThread) [homeassistant] Error doing job: Exception in callback _SelectorSocketTransport._read_ready() (None) repeats so often that it accounts for a good 95% of the log files, usually less than 10 ms apart, almost as if something is in an infinite loop.
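For anyone wanting to check a similar log, the share of records taken up by one repeated message can be measured without loading a multi-gigabyte file into memory. The following is a minimal sketch, not a Home Assistant tool: it assumes every record starts with the timestamp format seen in the excerpts in this issue, and the function names (count_messages, top_share, count_file) are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: records begin with "YYYY-MM-DD HH:MM:SS.mmm ", as in the excerpts in this issue.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} ")


def count_messages(lines: Iterable[str]) -> Counter:
    """Count log records by message text, ignoring timestamps.

    Lines without a leading timestamp (traceback lines) are skipped,
    so each record is counted once.
    """
    counts: Counter = Counter()
    for line in lines:
        if TIMESTAMP.match(line):
            message = TIMESTAMP.sub("", line.rstrip("\n"), count=1)
            counts[message] += 1
    return counts


def top_share(counts: Counter) -> tuple[str, float]:
    """Return the most frequent message and its fraction of all records."""
    message, hits = counts.most_common(1)[0]
    return message, hits / sum(counts.values())


def count_file(path: str) -> Counter:
    """Stream a log file line by line; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_messages(log)
```

Calling top_share(count_file(path)) on a log file returns the dominant message and its fraction of all records, which would make a figure like "95%" easy to confirm before sharing a trimmed excerpt.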

HAOS on Proxmox 8.2.2. To get HA back I need to perform a hard reset (shutdown doesn't work, and the CLI is also unresponsive). Screenshot of stats below.

[Image: screenshot of stats]

Comparison with monthly stats:

[Image: monthly stats]

What version of Home Assistant Core has the issue?

core-2024.7.1

What was the last working version of Home Assistant Core?

core-2024.6.2

What type of installation are you running?

Home Assistant OS

Integration causing the issue

No response

Link to integration documentation on our website

No response

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-07-07 20:48:47.918 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:48:49.030 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:48:54.094 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:49:33.796 WARNING (SyncWorker_1) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:50:08.306 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:50:10.349 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:50:15.309 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:50:33.797 WARNING (SyncWorker_40) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:51:28.476 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:51:31.817 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:51:33.798 WARNING (SyncWorker_54) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:51:36.780 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:52:33.799 WARNING (SyncWorker_15) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:52:48.798 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:52:53.200 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:52:57.855 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:53:33.800 WARNING (SyncWorker_28) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:54:08.964 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:54:14.521 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:54:19.130 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:54:33.800 WARNING (SyncWorker_1) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:55:29.337 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:55:33.802 WARNING (SyncWorker_36) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:55:35.607 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:55:40.593 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:55:51.007 ERROR (MainThread) [pyanglianwater.api] >> Error sending request get_usage_details to Anglian Water (401) - {"Message":"Unauthorized"}
2024-07-07 20:55:59.921 ERROR (MainThread) [homeassistant.components.apcupsd.coordinator] Timeout fetching apcupsd data
2024-07-07 20:56:33.803 WARNING (SyncWorker_17) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:56:49.834 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:56:56.948 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:57:02.016 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:57:33.803 WARNING (SyncWorker_55) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:58:09.951 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.115:5555.  ConnectionRefusedError: Connect call failed ('10.10.50.115', 5555)
2024-07-07 20:58:18.183 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.33:5555.  TcpTimeoutException: Connecting to 10.10.50.33:5555 timed out (1.0 seconds)
2024-07-07 20:58:23.434 WARNING (MainThread) [androidtv.adb_manager.adb_manager_async] Couldn't connect to 10.10.50.56:5555.  TcpTimeoutException: Connecting to 10.10.50.56:5555 timed out (1.0 seconds)
2024-07-07 20:58:33.804 WARNING (SyncWorker_33) [custom_components.truenas.truenas_api] TrueNAS 192.168.1.200 unable to fetch data "system/info" (no_response)
2024-07-07 20:58:34.706 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'') while processing template 'Template<template=({{ (state_attr('media_player.emby_living_room_tv', 'media_position')/state_attr('media_player.emby_living_room_tv', 'media_duration')) * 100 }}) renders=7094>' for attribute '_attr_native_value' in entity 'sensor.emby_living_room_tv_progress'
2024-07-07 20:58:34.707 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'') while processing template 'Template<template=({{ ((as_timestamp(now())) + (state_attr('media_player.emby_living_room_tv', 'media_duration')-state_attr('media_player.emby_living_room_tv', 'media_position'))) | timestamp_custom('%H:%M') }}) renders=7420>' for attribute '_attr_native_value' in entity 'sensor.emby_living_room_tv_media_end'
2024-07-07 20:59:21.372 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback _SelectorSocketTransport._read_ready() (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 960, in _read_ready
    self._read_ready_cb()
TypeError: 'NoneType' object is not callable
2024-07-07 20:59:21.373 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback _SelectorSocketTransport._read_ready() (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 960, in _read_ready
    self._read_ready_cb()
TypeError: 'NoneType' object is not callable
2024-07-07 20:59:21.374 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback _SelectorSocketTransport._read_ready() (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 960, in _read_ready
    self._read_ready_cb()
TypeError: 'NoneType' object is not callable
2024-07-07 20:59:21.374 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback _SelectorSocketTransport._read_ready() (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 960, in _read_ready
    self._read_ready_cb()
TypeError: 'NoneType' object is not callable
2024-07-07 20:59:21.374 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback _SelectorSocketTransport._read_ready() (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 960, in _read_ready
    self._read_ready_cb()
TypeError: 'NoneType' object is not callable

Additional information

No response

pantherale0 commented 4 months ago

Closing this. I'm not sure what happened, but restoring back to 2024.6.4 had the same problems, so I rebuilt the machine it was running on and restored from a backup.

francescopeloi commented 4 months ago

I have been experiencing something similar for a few days.

I get billions of

2024-07-17 15:30:44.545 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback None() (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
TypeError: 'NoneType' object is not callable

until the home-assistant.log file fills the hard drive and everything crashes. I cannot find the root cause; it seems to start from nothing, and there's no useful log message before the first occurrence of this.

It's an HA instance only for camera control.
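As a stopgap against the log filling the disk while the root cause is unknown, repeated records can be dropped before they are written. The sketch below uses only Python's standard logging module and is not a Home Assistant feature: the RepeatLimiter class, its window and max_keys parameters, and the idea of attaching it to the handler that writes the log file are all assumptions for illustration. It hides the symptom and does not stop the underlying callback loop.

```python
import logging
import time


class RepeatLimiter(logging.Filter):
    """Drop a record if the same logger/message pair was seen within `window` seconds."""

    def __init__(self, window: float = 1.0, max_keys: int = 1024) -> None:
        super().__init__()
        self.window = window
        self.max_keys = max_keys
        self._last_seen: dict[str, float] = {}

    def filter(self, record: logging.LogRecord) -> bool:
        key = f"{record.name}:{record.getMessage()}"
        now = time.monotonic()
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window:
            return False  # same message seen recently: suppress it
        if len(self._last_seen) >= self.max_keys:
            # Keep the filter's own memory bounded by forgetting old keys.
            self._last_seen.clear()
        self._last_seen[key] = now
        return True


# Usage sketch: attach the filter to whichever handler writes the log file.
handler = logging.StreamHandler()
handler.addFilter(RepeatLimiter(window=1.0))
```

With a one-second window, a message repeating every few milliseconds would be written at most once per second instead of thousands of times.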

cdoepmann commented 4 months ago

I was experiencing the very same issue, which was very annoying. I then removed everything I didn't strictly need from my installation (specifically, HACS and AppDaemon) and the issue hasn't occurred for a few days now. I thus suspect HACS or AppDaemon to be the root cause. Do you use these?

francescopeloi commented 4 months ago

My investigation points to Frigate and its UI component, both installed through HACS. But I haven't found the exact root cause yet.

cdoepmann commented 4 months ago

I'm not using Frigate though, so maybe it's an issue with HACS itself...

pantherale0 commented 4 months ago

If it were a problem with Frigate or HACS, I would expect to have had the same problem after I rebuilt and restored from a backup, or there would be more of these cases reported.

francescopeloi commented 4 months ago

It's quite clear it's coming from Frigate on my end, so we might have different issues.