home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.38k stars 30.64k forks source link

ZHA very unreliable: AssertionError and later crash of application controller #100657

Closed aebgit closed 9 months ago

aebgit commented 1 year ago

The problem

Since 2023.9.0 I observe plenty of Zigbee issues. I was on Raspi with Skyconnect (Multiprotocol mode) before, and have now migrated to Yellow and the problem persists (restored my old backup).

Zigbee was very reliable until core-2023.8.4 for me and in fact I rolled back to it when I first tried core-2023.9.0 and .1. and had serious problems. On the raspi with 2023.9.0 ZHA crashed already minutes to 2h after system start. Now on Yellow the system ran for 3 days but ZHA crashed just now.

I get plenty of Assertion errors (see 1. log extract below). This ist then followed by a non recoverable crash of the application controller (see 2. and 3. log extract below). At this stage only a complete reboot helps.

I do not understand well enough if the logged issues are indeed connected. Please let me know if I should provide further logs.

Thanks! Andreas

What version of Home Assistant Core has the issue?

core-2023.9.2

What was the last working version of Home Assistant Core?

core-2023.8.4

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Logger: zigpy.zcl
Source: runner.py:186
First occurred: September 17, 2023 at 17:40:28 (418 occurrences)
Last logged: 20:53:05

[0x5288:1:0x0020] AssertionError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request return await self._endpoint.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 292, in request with self._pending.new(sequence) as req: ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 291, in new raise ControllerException(f"duplicate {sequence} TSN") from AssertionError zigpy.exceptions.ControllerException: duplicate 90 TSN The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 519, in check_in_response await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: duplicate 90 TSN
[0x5288:1:0x0020] AssertionError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request return await self._endpoint.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 292, in request with self._pending.new(sequence) as req: ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 291, in new raise ControllerException(f"duplicate {sequence} TSN") from AssertionError zigpy.exceptions.ControllerException: duplicate 82 TSN The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 519, in check_in_response await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: duplicate 82 TSN
[0x5288:1:0x0020] AssertionError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request return await self._endpoint.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 292, in request with self._pending.new(sequence) as req: ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 291, in new raise ControllerException(f"duplicate {sequence} TSN") from AssertionError zigpy.exceptions.ControllerException: duplicate 95 TSN The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 519, in check_in_response await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: duplicate 95 TSN
[0x5288:1:0x0020] AssertionError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request return await self._endpoint.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 292, in request with self._pending.new(sequence) as req: ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 291, in new raise ControllerException(f"duplicate {sequence} TSN") from AssertionError zigpy.exceptions.ControllerException: duplicate 3 TSN The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 519, in check_in_response await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: duplicate 3 TSN
[0x5288:1:0x0020] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 68, in command return await future ^^^^^^^^^^^^ asyncio.exceptions.CancelledError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 828, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 798, in send_packet await self._ezsp.setExtendedTimeout(device.ieee, True) File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 67, in command async with asyncio_timeout(EZSP_CMD_TIMEOUT): File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in __aexit__ raise TimeoutError from exc_val TimeoutError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 519, in check_in_response await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 66, in wrap_zigpy_exceptions raise HomeAssistantError( homeassistant.exceptions.HomeAssistantError: Failed to send request: device did not respond

Logger: homeassistant.core
Source: components/zha/core/cluster_handlers/__init__.py:75
First occurred: 21:00:20 (5 occurrences)
Last logged: 21:00:20

Error executing service: <ServiceCall light.turn_off (c:01HASZDWZG7VJWCD7XHH0HN2SG): entity_id=['light.arbeitszimmer_dude_light'], params=>
Error executing service: <ServiceCall light.turn_off (c:01HASZDWZHKXPBHNW04S0V3516): entity_id=['light.wohnzimmer_esstisch_light'], params=>
Error executing service: <ServiceCall light.turn_off (c:01HASZDWZJ9H01C317EQ2E1VRR): entity_id=['light.arbeitszimmer_decke_light'], params=>
Error executing service: <ServiceCall light.turn_off (c:01HASZDWZK7VVN2D5DAY3T2WQG): entity_id=['light.schlafzimmer_fenster_light'], params=>
Error executing service: <ServiceCall light.turn_off (c:01HASZDWZKF14XNRP5QBXQCHM9): entity_id=['light.wohnzimmer_stecker_balkontur_switch'], params=>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrap_zigpy_exceptions
    yield
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 84, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 132, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 293, in request
    await self._application.request(
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 828, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 751, in send_packet
    raise ControllerError("ApplicationController is not running")
bellows.exception.ControllerError: ApplicationController is not running

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/core.py", line 1985, in _run_service_call_catch_exceptions
    await coro_or_task
  File "/usr/src/homeassistant/homeassistant/core.py", line 2006, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 235, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 870, in entity_service_call
    response_data = await _handle_entity_call(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 942, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 591, in async_handle_light_off_service
    await light.async_turn_off(**filter_turn_off_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 471, in async_turn_off
    result = await self._on_off_cluster_handler.off()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running

Logger: bellows.zigbee.application
Source: /usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py:685
First occurred: September 17, 2023 at 17:40:16 (191 occurrences)
Last logged: 21:11:17

ControllerApplication reset unsuccessful: ConnectionRefusedError(111, "Connect call failed ('172.30.32.1', 9999)")
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 682, in _reset_controller_loop
    await self._reset_controller()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 703, in _reset_controller
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 138, in connect
    await ezsp.connect(use_thread=self.config[CONF_USE_THREAD])
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 179, in connect
    self._gw = await bellows.uart.connect(self._config, self, use_thread=use_thread)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 414, in connect
    protocol, _ = await _connect(config, application)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 385, in _connect
    transport, protocol = await zigpy.serial.create_serial_connection(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/serial.py", line 36, in create_serial_connection
    transport, protocol = await loop.create_connection(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1085, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 628, in sock_connect
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 668, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)

Additional information

No response

home-assistant[bot] commented 1 year ago

Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `zha` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

puddly commented 1 year ago

2023.9.0 and 2023.9.1 had some startup issues with multi-PAN but 2023.9.2 should be as stable as 2023.8.x.

Could you collect a complete ZHA debug log of both problems? https://www.home-assistant.io/integrations/zha/#debug-logging.

If you find that downgrading to 2023.8.x helps, can you record a debug log with that version as well?

You may have to ZIP the logs, as they get somewhat large.

aebgit commented 1 year ago

in progress, currently waiting for the crash to re-appear

Slurpgeit commented 1 year ago

I have this problem as well, indeed started after the update to 2023.9. I have problems with zigbee devices simply not responding any more, and the UI giving me "NETWORK BUSY" errors. I will also try to collect the requested logs.

Here is one of those NETWORK BUSY errors:

Logger: zigpy.topology
Source: /usr/local/lib/python3.11/site-packages/zigpy/topology.py:84
First occurred: 4:08:20 PM (2 occurrences)
Last logged: 8:22:06 PM

Topology scan failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 78, in _scan_loop
    await self.scan()
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 96, in scan
    await self._scan_task
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 221, in _scan
    await self._find_unknown_devices(neighbors=self.neighbors, routes=self.routes)
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 253, in _find_unknown_devices
    await self._app._discover_unknown_device(nwk)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 940, in _discover_unknown_device
    return await zigpy.zdo.broadcast(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 519, in broadcast
    return await app.broadcast(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 916, in broadcast
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 854, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to enqueue message after 3 attempts: <EmberStatus.NETWORK_BUSY: 161>
aebgit commented 1 year ago

@puddly please find my debug-log (from boot to crash) for core-2023.9.2 here: https://www.icloud.com/iclouddrive/0b2qAbJr-aMq_EFAblKsC5F1Q#100657_core-2023.9.2_home-assistant_sanitized (251MB zipped)

Californian commented 1 year ago

I believe this may be related to my issue: https://github.com/home-assistant/core/issues/101432

aebgit commented 1 year ago

I agree. I have disabled Multiprotocol and am using since a couple of days two separate radios (one for Zigbee, one for thread) and everything seems to be way more reliable. Multiprotocol seems to be tagged as experimental for a reason.

issue-triage-workflows[bot] commented 9 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.