home-assistant / core

Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Core unable to connect to supervisor #68956

Closed RudolfRendier closed 2 years ago

RudolfRendier commented 2 years ago

The problem

After a random amount of uptime, my HA Core loses the ability to connect to the Supervisor. It stays that way until I restart Core via the HA CLI.

What version of Home Assistant Core has the issue?

core-2022.3.7

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

No response

Link to integration documentation on our website

No response

Diagnostics information

This is the System Health after a restart, because the information won't load when it happens.

System Health

version core-2022.3.7
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.9.9
os_name Linux
os_version 5.10.103
arch x86_64
timezone Europe/Amsterdam
Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4853
Installed Version | 1.23.0
Stage | running
Available Repositories | 1018
Downloaded Repositories | 21

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 7.5
-- | --
update_channel | stable
supervisor_version | supervisor-2022.03.5
docker_version | 20.10.9
disk_total | 30.8 GB
disk_used | 15.0 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Samba share (9.5.1), Terminal & SSH (9.3.0), Vaultwarden (Bitwarden) (0.15.0), AdGuard Home (4.4.5), MariaDB (2.4.0), UniFi Network Application (1.1.4), WireGuard (0.6.0), ESPHome (2022.2.6), deCONZ (6.12.0), Spotify Connect (0.11.0), Duck DNS (1.14.0), Nginx Proxy Manager (0.11.0), Mosquitto broker (6.0.1), Node-RED (11.1.0), Studio Code Server (4.2.0)

Lovelace

dashboards | 5
-- | --
resources | 8
views | 16
mode | storage

Spotify

api_endpoint_reachable | ok
-- | --

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2022-03-31 04:53:54 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-03-31 04:53:54 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_spotify/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_deconz/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_wireguard/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/5c53de3b_esphome/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_unifi/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
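The timeout lines above arrive in bursts that share a timestamp, so it can help to group them before reading further. The following is a minimal sketch, assuming only the log line format shown above; the function name `summarize_timeouts` and the `SAMPLE` excerpt are illustrative and not part of Home Assistant.

```python
import re
from collections import Counter, defaultdict

# Matches the hassio handler's timeout lines in the format shown in the logs above.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR \(MainThread\) "
    r"\[homeassistant\.components\.hassio\.handler\] "
    r"Timeout on (?P<endpoint>\S+) request$"
)


def summarize_timeouts(lines):
    """Return (timeout count per endpoint, endpoints grouped by timestamp)."""
    per_endpoint = Counter()
    per_burst = defaultdict(list)
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match is None:
            continue  # not a Supervisor timeout line (e.g. the WARNING lines)
        per_endpoint[match["endpoint"]] += 1
        per_burst[match["ts"]].append(match["endpoint"])
    return per_endpoint, dict(per_burst)


# Short excerpt copied from the log above, used only to exercise the parser.
SAMPLE = """\
2022-03-31 04:53:54 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-03-31 04:53:54 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-31 04:59:09 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
""".splitlines()

if __name__ == "__main__":
    per_endpoint, per_burst = summarize_timeouts(SAMPLE)
    for endpoint, count in per_endpoint.most_common():
        print(count, endpoint)
    for ts, endpoints in per_burst.items():
        print(ts, len(endpoints), "timeouts")
```

Feeding it the full log instead of `SAMPLE` would show whether every add-on endpoint times out in each burst or only a subset.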

Additional information

The :8123/hassio page shows the troubleshooting tips:

Troubleshooting

  1. If you just started, make sure you have given the Supervisor enough time to start.
  2. Check the Observer
  3. Try a reboot of the host
  4. Check System Health
  5. Ask for help

The link for the observer points me to http://homeassistant.local:4357/. That is not the correct hostname, though; with the right name it shows:

Home Assistant observer

Supervisor: | Connected
Supported: | Supported
Healthy: | Healthy

:8123/config/info doesn't load the 'Health' segment.

Because of this I can't load, e.g., deCONZ via the side panel: "Unable to load the panel source: /api/hassio/app/entrypoint.js."

I created a notification for when the deCONZ CPU usage goes stale (for 30 minutes), which is indicative of the error occurring. My Zigbee lights also start failing, but weirdly enough not right away. Today I received the notification at 5:18 AM, and around 20 minutes later the lights could still be turned on. Now, an hour or maybe two later, nothing responds.

deCONZ is just an example and the most annoying case; the VS Code Server add-on, for instance, has the same issue. I thought for a while it was related to the nightly purge, but disabling auto_purge and rescheduling the purge to another time of day had no effect.

I had this problem on a Pi3, and it's still present after I migrated to a NUC i3 using Proxmox. I suspected a flaky SD card to be the issue, but that's not the case.

woopsicle commented 2 years ago

I have the same issue every few days. Running 'ha core rebuild' fixes it quickly, but it is still annoying that it happens in the first place. All add-ons stop working, as do some other things such as my aircon (izone), WebRTC (HACS installed) streams, and turning lights on/off (which are not controlled by an add-on). Only sensors really keep working. I get the same result on the Observer status page. There is nothing obvious in the logs; the first thing that happens is that my aircon stops working (even though it's still on the network etc.).

A few extracts:

2022-04-06 15:44:19 WARNING (MainThread) [pizone.discovery] Connection to controller lost: id=000025800 ip=192.168.1.106
2022-04-06 15:44:19 INFO (MainThread) [homeassistant.components.izone.climate] Controller 000025800 disconnected due to exception: 
2022-04-06 15:44:19 INFO (MainThread) [pizone.controller] Attempting to reconnect to server uid=000025800 ip=192.168.1.106
2022-04-06 15:44:21 WARNING (MainThread) [pizone.discovery] Controller reconnected: id=000025800 ip=192.168.1.106
2022-04-06 15:44:21 INFO (MainThread) [homeassistant.components.izone.climate] Reconnected controller 000025800 
2022-04-06 15:50:24 WARNING (MainThread) [pizone.discovery] Connection to controller lost: id=000025800 ip=192.168.1.106
2022-04-06 15:50:24 WARNING (MainThread) [pizone.discovery] Unable to complete <coroutine object Controller._refresh_system at 0x7f76e38732c0> due to connection error
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pizone/controller.py", line 472, in _get_resource
    async with self._sending_lock, session.get(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 634, in _request
    break
  File "/usr/local/lib/python3.9/site-packages/aiohttp/helpers.py", line 721, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pizone/discovery.py", line 364, in _wrap_update
    await coro
  File "/usr/local/lib/python3.9/site-packages/pizone/controller.py", line 369, in _refresh_system
    values = await self._get_resource(
  File "/usr/local/lib/python3.9/site-packages/pizone/controller.py", line 486, in _get_resource
    raise ConnectionError("Unable to connect to the controller") from ex
ConnectionError: Unable to connect to the controller
2022-04-06 15:55:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /store request
2022-04-06 15:55:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /core/info request
2022-04-06 15:55:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /host/info request
2022-04-06 15:55:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /os/info request
2022-04-06 15:55:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /info request
2022-04-06 15:55:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /supervisor/info request
2022-04-06 15:55:37 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
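
The two tracebacks in the extract above are joined by "The above exception was the direct cause of the following exception", which is what Python prints when an exception is raised with `raise ... from ex`. Here the `ConnectionError` is a wrapper around an underlying timeout, not a separate failure. A minimal sketch of that pattern follows; `slow_controller_call` and `get_resource` are hypothetical stand-ins and are not pizone's actual code.

```python
import asyncio


async def slow_controller_call():
    # Hypothetical stand-in for a controller that does not answer in time.
    await asyncio.sleep(1)


async def get_resource(timeout=0.01):
    try:
        await asyncio.wait_for(slow_controller_call(), timeout)
    except asyncio.TimeoutError as ex:
        # "from ex" records the timeout as the direct cause (__cause__),
        # which produces the two-part traceback seen in the log.
        raise ConnectionError("Unable to connect to the controller") from ex


def demo():
    """Run the failing call and return the exception that was raised."""
    try:
        asyncio.run(get_resource())
    except ConnectionError as err:
        return err
    return None


if __name__ == "__main__":
    err = demo()
    print(repr(err), "caused by", repr(err.__cause__))
```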

RudolfRendier commented 2 years ago

It happened again last night, and this automation detected it and recovered by restarting Home Assistant (even though the restart request seems to have timed out).

Core is now at 2022.3.8.

- id: '1645950327261'
  alias: Nofity deCONZ not running
  description: ''
  trigger:
  - platform: state
    entity_id: sensor.deconz_cpu_percent
    for:
      hours: 0
      minutes: 30
      seconds: 0
  condition: []
  action:
  - service: notify.mobile_app
    data:
      message: No change in CPU activity for deCONZ detected, restarting Home Assistant
        ...
      title: deCONZ stale
  - delay:
      hours: 0
      minutes: 1
      seconds: 0
      milliseconds: 0
  - service: homeassistant.restart
    data: {}
  mode: restart

2022-04-07 00:22:04 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-04-07 00:22:04 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-04-07 00:22:04 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-04-07 00:22:04 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-04-07 00:22:04 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_spotify/stats request
2022-04-07 00:22:04 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-04-07 00:22:04 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-04-07 00:27:15 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-04-07 00:27:15 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_spotify/stats request
2022-04-07 00:27:15 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-04-07 00:27:15 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-04-07 00:27:15 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-04-07 00:27:15 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-04-07 00:27:15 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-04-07 00:32:26 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-04-07 00:32:26 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-04-07 00:32:26 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_spotify/stats request
2022-04-07 00:32:26 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-04-07 00:32:26 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-04-07 00:32:26 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-04-07 00:32:26 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_spotify/stats request
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-04-07 00:37:37 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_deconz/stats request
2022-04-07 00:37:37 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-04-07 00:42:48 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-04-07 00:42:48 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-04-07 00:42:48 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-04-07 00:42:48 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-04-07 00:42:48 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-04-07 00:42:48 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-04-07 00:47:56 WARNING (MainThread) [homeassistant.components.mqtt] The 'birth_message' option near /config/configuration.yaml:278 is deprecated, please remove it from your configuration
2022-04-07 00:47:56 WARNING (MainThread) [homeassistant.components.mqtt] The 'broker' option near /config/configuration.yaml:278 is deprecated, please remove it from your configuration
2022-04-07 00:47:56 WARNING (MainThread) [homeassistant.components.mqtt] The 'discovery' option near /config/configuration.yaml:278 is deprecated, please remove it from your configuration
2022-04-07 00:47:56 WARNING (MainThread) [homeassistant.components.mqtt] The 'password' option near /config/configuration.yaml:278 is deprecated, please remove it from your configuration
2022-04-07 00:47:56 WARNING (MainThread) [homeassistant.components.mqtt] The 'username' option near /config/configuration.yaml:278 is deprecated, please remove it from your configuration
2022-04-07 00:47:56 WARNING (MainThread) [homeassistant.components.mqtt] The 'will_message' option near /config/configuration.yaml:278 is deprecated, please remove it from your configuration
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_spotify/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_vscode/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/5c53de3b_esphome/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_deconz/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-04-07 00:47:59 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-04-07 00:47:59 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-04-07 00:48:07 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /homeassistant/restart request

shbatm commented 2 years ago

I'm stumped on this one too. I'm having a similar issue with the same type of timeout and lost connection error floods. Running OS on a Proxmox VM and it looks to be an issue with memory (although the VM has 8GB and usually sits at about 55% usage).

[42179.657788] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=5e36dc7fde00ca61e18983cdb94bab0fa6ee2cdab6b46a7a0da681f57660b36c,mems_allowed=0,global_oom,task_memcg=/docker/4f0dbf2c20f0945f5f991e00455b5ba6a4e251b94206fd0196adcc1ca90f7cd6,task=python3,pid=10388,uid=0
[42179.657882] Out of memory: Killed process 10388 (python3) total-vm:5489864kB, anon-rss:3326120kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:9952kB oom_score_adj:0

Supervisor watchdog usually restarts HA Core when it happens. Can't pinpoint to a specific time, event, or integration causing it.

Also looks like it might be related to these:

Home Assistant Core 2022.5.5
Home Assistant Supervisor 2022.05.3
Home Assistant OS 8.1
Kernel version 5.15.41
Agent version 1.2.1
Proxmox VM

RudolfRendier commented 2 years ago

Although the symptoms are similar, your issue could be unrelated (the same goes for the issues you linked).

My HA Core was not restarted and continued to function (albeit somewhat limited), so I doubt there were any OOM-killer entries in my logs. Python3 consuming 5489864kB of memory is quite something. Perhaps there's a leak?
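
To put the kernel figures in perspective: `total-vm` is the virtual address space of the killed process, while `anon-rss` is the anonymous memory actually resident in RAM, so the second number is the better indicator of real usage. A minimal sketch that parses the "Killed process" line quoted earlier in the thread follows; the helper name `parse_oom_kill` is illustrative, and the conversion assumes the kernel's `kB` means 1024 bytes.

```python
import re

# Matches the "Out of memory: Killed process ..." line quoted in this thread.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Extract pid, process name, and memory figures (in GiB) from an OOM-kill line."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    kib_per_gib = 1024 * 1024  # assumption: kernel "kB" is 1024 bytes
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_gib": int(match["total_vm"]) / kib_per_gib,
        "anon_rss_gib": int(match["anon_rss"]) / kib_per_gib,
    }


# Line copied from the kernel log quoted earlier in the thread.
LINE = (
    "[42179.657882] Out of memory: Killed process 10388 (python3) "
    "total-vm:5489864kB, anon-rss:3326120kB, file-rss:0kB, shmem-rss:0kB, "
    "UID:0 pgtables:9952kB oom_score_adj:0"
)

if __name__ == "__main__":
    info = parse_oom_kill(LINE)
    print(f"{info['name']} (pid {info['pid']}): "
          f"{info['total_vm_gib']:.2f} GiB virtual, "
          f"{info['anon_rss_gib']:.2f} GiB resident")
```

For the quoted line this works out to roughly 5.2 GiB of virtual memory but about 3.2 GiB resident, which is still a large share of an 8GB VM.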

I'm currently still on 2022.4.7 and the issue disappeared with that major release. I was thinking of closing this issue.

RudolfRendier commented 2 years ago

Last night it happened again (after running steadily for weeks). I have been gradually updating my add-ons over the past days, deCONZ and UniFi most recently.

The deCONZ CPU percentage sensor became unavailable at 1:55:32 AM (GMT+2). The Health page did load on my phone this morning but showed 'TIMEOUT' for several items, e.g. HACS GitHub.

Ingress add-on pages for deCONZ and VSCode refuse to load now: "Unable to load the panel source: /api/hassio/app/entrypoint.js."

From the HA CLI http://hassio.local:4357/

Home Assistant observer

Supervisor: | Connected
-- | --
Supported: | Supported
Healthy: | Healthy

I'm pretty sure the web-interface gave me a link to the observer with a docker IP that didn't work.

After restart:

System Health

version core-2022.5.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.9.9
os_name Linux
os_version 5.15.41
arch x86_64
timezone Europe/Amsterdam
Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.25.5
Stage | running
Available Repositories | 1071
Downloaded Repositories | 21

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 8.1
-- | --
update_channel | stable
supervisor_version | supervisor-2022.05.3
docker_version | 20.10.14
disk_total | 30.8 GB
disk_used | 16.8 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Samba share (9.7.0), Terminal & SSH (9.4.0), Vaultwarden (Bitwarden) (0.17.0), AdGuard Home (4.6.0), MariaDB (2.4.0), UniFi Network Application (2.3.0), WireGuard (0.7.0), ESPHome (2022.6.0), deCONZ (6.14.1), Spotify Connect (0.12.1), Duck DNS (1.15.0), Nginx Proxy Manager (0.11.0), Mosquitto broker (6.1.2), Studio Code Server (5.1.0), Check Home Assistant configuration (3.10.2), Grocy (0.18.2)

Dashboards

dashboards | 5
-- | --
resources | 8
views | 16
mode | storage

Spotify

api_endpoint_reachable | ok
-- | --

github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.