home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
72.65k stars 30.41k forks source link

Zigbee ZHA Error: ApplicationController is not running since update to 2023.11 #103253

Closed Ragetti13 closed 7 months ago

Ragetti13 commented 11 months ago

The problem

I am receiving an error with Zigbee devices since updating to 2023.11. Devices that are part of automations are now flaky and sometimes do not response and when making service calls via the dashboard or device card i receive this error message: "Failed to call service light/turn_on. Failed to send request: ApplicationController is not running"

Restarting and/or rebooting my Home Assistant Yellow does not resolve the issues, nor does disabling/re-enabling or reloading ZHA

No new devices introduced or any changes to my Home Assistant instance other than updating to 2023.11

What version of Home Assistant Core has the issue?

core-2023.11.0

What was the last working version of Home Assistant Core?

core-2023.10.4

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

home-assistant_zha_2023-11-02T15-18-07.706Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Logger: homeassistant.components.websocket_api.http.connection
Source: components/websocket_api/commands.py:230
Integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 15:26:04 (7 occurrences)
Last logged: 15:31:44

[547974951488] Failed to send request: ApplicationController is not running
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrap_zigpy_exceptions
    yield
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 84, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 138, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 297, in request
    await self._application.request(
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 768, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 752, in send_packet
    raise ControllerError("ApplicationController is not running")
bellows.exception.ControllerError: ApplicationController is not running

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 230, in handle_call_service
    await hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 2035, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2072, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 235, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 876, in entity_service_call
    response_data = await _handle_entity_call(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 591, in async_handle_light_off_service
    await light.async_turn_off(**filter_turn_off_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/group/light.py", line 189, in async_turn_off
    await self.hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 2035, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2072, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 235, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 904, in entity_service_call
    task.result()  # pop exception if have
    ^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 1219, in async_request_call
    return await coro
           ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 591, in async_handle_light_off_service
    await light.async_turn_off(**filter_turn_off_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 471, in async_turn_off
    result = await self._on_off_cluster_handler.off()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running

Additional information

No response

home-assistant[bot] commented 11 months ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `zha` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

puddly commented 11 months ago

You seem to be using multi-PAN. Do you have any Thread devices? Restart the Multiprotocol addon, it should bring your network back up.

Ragetti13 commented 11 months ago

I am and do have some thread devices connected. All worked well prior to the latest release but everything has lost reliability and this error still occurs even after restarting the addon.Any other suggestions?On 2 Nov 2023, at 17:18, puddly @.***> wrote: You seem to be using multi-PAN. Do you have any Thread devices? Restart the Multiprotocol addon, it should bring your network back up.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

DominiqueGEORGES-SBDG commented 11 months ago

Hi.

since that upgrade, I am also in the problems. I don't know how to restore the situation from before that upgrade :'(

Nothing is working anymore on my HA.

Is there a "for dummies" explanation of what and how to do that ?

Thanks, Dom

Ragetti13 commented 11 months ago

I have since tried restarting the add ons for multi-PAN as well as the Matter Server; rebooting the Yellow but it is still consistently broken.

Logs from multi-pan reboot attached:

`[07:34:02] INFO: The otbr-web is disabled. s6-rc: info: service mdns: starting s6-rc: info: service s6rc-oneshot-runner: starting s6-rc: info: service mdns successfully started s6-rc: info: service s6rc-oneshot-runner successfully started s6-rc: info: service fix-attrs: starting s6-rc: info: service fix-attrs successfully started s6-rc: info: service legacy-cont-init: starting cont-init: info: running /etc/cont-init.d/check-cpcd-shm.sh [07:34:03] INFO: Starting mDNS Responder... Default: mDNSResponder (Engineering Build) (Aug 23 2023 18:31:24) starting cont-init: info: /etc/cont-init.d/check-cpcd-shm.sh exited 0 s6-rc: info: service legacy-cont-init successfully started s6-rc: info: service banner: starting


Add-on: Silicon Labs Multiprotocol Zigbee and OpenThread multiprotocol add-on

Add-on version: 2.3.2 You are running the latest version of this add-on. System: Home Assistant OS 11.1 (aarch64 / yellow) Home Assistant Core: 2023.11.0 Home Assistant Supervisor: 2023.10.1

Please, share the above information when looking for help or support in, e.g., GitHub, forums or the Discord chat.

s6-rc: info: service banner successfully started s6-rc: info: service universal-silabs-flasher: starting [07:34:05] INFO: Detected Home Assistant Yellow [07:34:06] INFO: Starting universal-silabs-flasher with /dev/ttyAMA1 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flash[181] INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.3.1', ezsp_version=None, ot_rcp_version=None, fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>, baudrate=460800) 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flasher[181] INFO Triggering Yellow bootloader 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flasher[181] INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flasher[181] INFO Launched application from bootloader 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flasher[181] INFO Detected bootloader version '2.0.1' 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flasher[181] INFO Probing ApplicationType.CPC at 460800 baud 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flasher[181] INFO Detected ApplicationType.CPC, version '4.3.1' at 460800 baudrate (bootloader baudrate 115200) 2023-11-03 07:34:07 homeassistant universal_silabs_flasher.flash[181] INFO Firmware version '4.3.1' is flashed, not re-installing s6-rc: info: service universal-silabs-flasher successfully started s6-rc: info: service cpcd-config: starting [07:34:08] INFO: Using known baudrate of 460800 for cpcd! [07:34:09] INFO: Generating cpcd configuration. s6-rc: info: service cpcd-config successfully started s6-rc: info: service cpcd: starting [07:34:10] INFO: Starting cpcd... WARNING in function 'main' in file /usr/src/cpc-daemon/main.c at line #186 : Running CPCd as 'root' is not recommended. Proceed at your own risk. s6-rc: info: service cpcd successfully started s6-rc: info: service zigbeed: starting s6-rc: info: service otbr-agent: starting s6-rc: info: service zigbeed successfully started [07:34:11] INFO: Starting zigbeed... [07:34:10:170444] Info : [CPCd v4.3.1.0] [Library API v3] [RCP Protocol v4] [07:34:10:170595] Info : Git commit: 133b29678b3d0bc7578e098d2f46b4d5bcd2ebb4 / branch: [07:34:10:170599] Info : Sources hash: ff8300587e7e4ab1def7a89a272c0baef32f9eb3bff9b0ba06b94e655d652367 [07:34:10:170606] WARNING : In function 'main' in file /usr/src/cpc-daemon/main.c at line #186 : Running CPCd as 'root' is not recommended. Proceed at your own risk. [07:34:10:170648] Info : Reading cli arguments [07:34:10:170656] Info : /usr/local/bin/cpcd [07:34:10:171291] Info : Reading configuration [07:34:10:171308] Info : file_path = /usr/local/etc/cpcd.conf [07:34:10:171310] Info : instance_name = cpcd_0 [07:34:10:171313] Info : socket_folder = /dev/shm [07:34:10:171316] Info : operation_mode = MODE_NORMAL [07:34:10:171318] Info : use_encryption = false [07:34:10:171320] Info : binding_key_file = /etc/binding-key.key [07:34:10:171323] Info : stdout_tracing = false [07:34:10:171325] Info : file_tracing = false [07:34:10:171327] Info : lttng_tracing = false [07:34:10:171329] Info : enable_frame_trace = false [07:34:10:171331] Info : traces_folder = /dev/shm/cpcd-traces [07:34:10:171333] Info : bus = UART [07:34:10:171335] Info : uart_baudrate = 460800 [07:34:10:171338] Info : uart_hardflow = true [07:34:10:171340] Info : uart_file = /dev/ttyAMA1 [07:34:10:171343] Info : fu_recovery_pins_enabled = false [07:34:10:171346] Info : fu_connect_to_bootloader = false [07:34:10:171348] Info : fu_enter_bootloader = false [07:34:10:171350] Info : restart_cpcd = false [07:34:10:171352] Info : application_version_validation = false [07:34:10:171354] Info : print_secondary_versions_and_exit = false [07:34:10:171356] Info : use_noop_keep_alive = false [07:34:10:171358] Info : reset_sequence = true [07:34:10:171361] Info : stats_interval = 0 [07:34:10:171363] Info : rlimit_nofile = 2000 [07:34:10:171366] Info : ENCRYPTION IS DISABLED [07:34:10:171368] Info : Starting daemon in normal mode [07:34:10:182861] Info : Connecting to Secondary... [07:34:10:249694] Info : RX capability is 256 bytes [07:34:10:249742] Info : Connected to Secondary [07:34:10:250990] Info : Secondary Protocol v4 [07:34:10:253811] Info : Secondary CPC v4.3.1 [07:34:10:255160] Info : Secondary bus bitrate is 460800 [07:34:10:258047] Info : Secondary APP vUNDEFINED [07:34:10:258405] Info : Daemon startup was successful. Waiting for client connections [07:34:11] INFO: Setup OTBR firewall... [07:34:11] INFO: Starting otbr-agent... otbr-agent[297]: [NOTE]-AGENT---: Running 0.3.0 otbr-agent[297]: [NOTE]-AGENT---: Thread version: 1.3.0 otbr-agent[297]: [NOTE]-AGENT---: Thread interface: wpan0 otbr-agent[297]: [NOTE]-AGENT---: Radio URL: spinel+cpc://cpcd_0?iid=2&iid-list=0 otbr-agent[297]: [NOTE]-ILS-----: Infra link selected: end0 otbr-agent[297]: 50d.01:45:38.997 [C] Platform------: mCpcBusSpeed = 115200 [07:34:11:900650] Info : New client connection using library v4.3.1.0 [07:34:11:902827] Info : Opened connection socket for ep#12 [07:34:11:903083] Info : Endpoint socket #12: Client connected. 1 connections [07:34:12:073141] Info : New client connection using library v4.3.1.0 [07:34:12:075326] Info : Endpoint socket #12: Client connected. 2 connections otbr-agent[297]: 00:00:00.041 [N] RoutingManager: BR ULA prefix: fd62:fa42:eef2::/48 (loaded) otbr-agent[297]: 00:00:00.042 [N] RoutingManager: Local on-link prefix: fdf6:2410:6021:ac2d::/64 otbr-agent[297]: 00:00:00.111 [N] Mle-----------: Role disabled -> detached otbr-agent[297]: 00:00:00.116 [N] Platform------: [netif] Changing interface state to up. s6-rc: info: service otbr-agent successfully started s6-rc: info: service otbr-agent-rest-discovery: starting [07:34:14] INFO: Successfully sent discovery information to Home Assistant. s6-rc: info: service otbr-agent-rest-discovery successfully started s6-rc: info: service legacy-services: starting s6-rc: info: service legacy-services successfully started Listening on port 9999 for connection... Accepting connection. Accepted connection 7. otbr-agent[297]: 00:00:26.300 [N] Mle-----------: RLOC16 7800 -> fffe otbr-agent[297]: 00:00:26.303 [W] Platform------: [netif] Failed to process request#5: Unknown error -95 otbr-agent[297]: 00:00:26.442 [N] Mle-----------: Attach attempt 1, AnyPartition reattaching with Active Dataset otbr-agent[297]: 00:00:32.943 [N] RouterTable---: Allocate router id 30 otbr-agent[297]: 00:00:32.943 [N] Mle-----------: RLOC16 fffe -> 7800 otbr-agent[297]: 00:00:32.946 [N] Mle-----------: Role detached -> leader otbr-agent[297]: 00:00:32.946 [N] Mle-----------: Partition ID 0x6dec65ea otbr-agent[297]: 00:00:32.986 [W] Platform------: [netif] Failed to process request#6: Unknown error -17 otbr-agent[297]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary! [07:35:41] INFO: zigbeed ended with exit code 256 (signal 13)... [07:35:41:086209] Info : Endpoint socket #12: Client disconnected. 1 connections [07:35:41:086306] Info : Client disconnected [07:35:42] INFO: Starting zigbeed... [07:35:43:962363] Info : New client connection using library v4.3.1.0 [07:35:43:964856] Info : Endpoint socket #12: Client connected. 2 connections Listening on port 9999 for connection... Accepting connection. Accepted connection 7. Listening on port 9999 for connection... Accepting connection. Accepted connection 7. [07:36:21] INFO: zigbeed ended with exit code 256 (signal 13)... [07:36:21:087668] Info : Endpoint socket #12: Client disconnected. 1 connections [07:36:21:087744] Info : Client disconnected [07:36:22] INFO: Starting zigbeed... [07:36:23:299396] Info : New client connection using library v4.3.1.0 [07:36:23:301350] Info : Endpoint socket #12: Client connected. 2 connections Listening on port 9999 for connection... Accepting connection. Accepted connection 7. Socket connection has been closed, restarting... Listening on port 9999 for connection... Accepting connection. Accepted connection 7. Restarting [07:37:22:604248] Info : Endpoint socket #12: Client disconnected. 1 connections [07:37:22:604456] Info : Client disconnected [07:37:23:609536] Info : New client connection using library v4.3.1.0 [07:37:23:611489] Info : Endpoint socket #12: Client connected. 2 connections Reusing socket from previous instance. [07:37:41:139214] Info : Endpoint socket #12: Client disconnected. 1 connections [07:37:41:139299] Info : Client disconnected [07:37:41] INFO: zigbeed ended with exit code 256 (signal 13)... [07:37:42] INFO: Starting zigbeed... [07:37:43:341584] Info : New client connection using library v4.3.1.0 [07:37:43:343364] Info : Endpoint socket #12: Client connected. 2 connections Listening on port 9999 for connection... Accepting connection. Accepted connection 7.`

optronomega commented 11 months ago

+1 to Zigbee ZHA issues after upgrade. Devices started failing about a day after upgrade, when i reloaded ZHA it said my configuration was corrupted and i had to load a backup. Doing that caused all devices to go offline. Tried a restore of HA to before i updated to 2023.11.0, Zigbee still didn't work. Had to re-pair every single zigbee device. After running all over the house to find all my devices, back to working and on 2023.10.5. Hopefully this can be fixed and a fix to 2023.11.0 can be pushed out.

DominiqueGEORGES-SBDG commented 11 months ago

It took me more than 3 hours to find this bluddy human unreadbla identityIds and deviceIds of all my Zigbee environment.

I can't understand HA push such a beta version as a stable version.

Modifying all my yaml files have certainly introduced other corruptions and issues, in addition of these ones.

Just a real nightmare.
All of this without any HA team alerts or warnings.

Dom

DominiqueGEORGES-SBDG commented 11 months ago

Groups are not working neither. No way to find their "human unreadable id" :'(

Ragetti13 commented 11 months ago

I have since discovered that a full reboot of my HA Yellow does resolve the issue for a few hours but the state of the Zigbee/Matter network does regress and the issues resurfaces.

The restart of the Multi-Pan Add-on does not have the same effect of temporarily stabilising the system.

Also - the issues is still present in the latest core update to 2023.11.1 and Matter server update to 4.10.2

DominiqueGEORGES-SBDG commented 11 months ago

Why doesn't HA team communicate about this real issue ?

Everyone tries to solve the problem by lot of manipulations with, at long term, a total loss of their installation and customizations. Dom

ralphbk commented 11 months ago

Seeing the same problems here., Devices becoming useless all over the house. Unable to re-register them. Is there no workaround available?

puddly commented 11 months ago

Please upload a ZHA debug log and the diagnostics JSON for the ZHA integration. I feel there are multiple unrelated issues being discussed here with the only common element being "devices don't work".

DominiqueGEORGES-SBDG commented 11 months ago

Maybe one solution/option would be to add a user interface that allows to eaily and clearly associate and manage human readable ids and their UUID.

The scripts using the Human Readable ids would not be impacted anymore by such an issue.

ralphbk commented 11 months ago

Please upload a ZHA debug log and the diagnostics JSON for the ZHA integration. I feel there are multiple unrelated issues being discussed here with the only common element being "devices don't work".

Here's the logs you requested. It should contain the successful re-registration of a couple of Tradfri shortcut buttons, and then in a new search, the failed attempt to re-register a Tradfri remote control.

HA_Debug_Log.txt config_entry-zha-5d52aeaf90799457191420a4e85560a3.json.txt

16-9 commented 11 months ago

Same here and adding my 2 cents. Same config, situation as the others, Everything works until it suddenly stops, ie no device is reachable any longer. I end up stopping the add-on, turning off start on boot and watchdog, shutdown my HASSIO Raspberry, go downstairs unplug SkyConnect, reconnect, start the add-on, and then reload ZHA. That tends to fix it the mess, yet it is a timebomb and I don't know whether it is functional for an hour or a day.

...forgot to add a step: restore a prior backup. I think next time it happens I will restore back to 2023.11.0

ralphbk commented 11 months ago

Similar story here. Switched to Zigbee channel 25 (to avoid wifi interference.) Removed my 3 Tradfri Signal Repeaters (where I was never able to add devices.) Powered the HA off/on (just in case.) And things seem to have settled down again now. Until next time.

Not sure which of those actions might have had an effect, for how long. Not sure why I was asked to post logs above, since no-one has commented on them.

Some googling has suggested replacing the SkyConnect dongle with a Sonoff Zigbee 3.0 Dongle P might be the way to ensure better stability. Any comments on this?

puddly commented 11 months ago

Not sure why I was asked to post logs above, since no-one has commented on them.

Sorry, I missed your comment.

From the logs, you can see that your Zigbee channel at the time (15) was about 86% congested. Your new channel (25) is 12%. That explains why you are seeing a difference. Channel 20 is only 5% congested so I would suggest switching to that one instead.

ralphbk commented 11 months ago

Strangely, since moving from Zigbee channel 15 to 25, channel 15 has become even more congested (due to wifi?), and 25 less so.

Before and after screenshot: image

Channel 25 now also looks better than 20, so I think I'll leave it be, for now.

martisys commented 11 months ago

I had the same problem. It helped me reconfigure the radio connection. All the devices originally connected remain and everything is working properly so far.

issue-triage-workflows[bot] commented 8 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.