home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ZHA - Individual Philips Hue Lights Unresponsive/Inconsistent, except within a Zigbee Group #116104

Open nerdyninny opened 2 months ago

nerdyninny commented 2 months ago

The problem

ZHA - Individual Lights Unresponsive/Inconsistent, except within a Zigbee Group

I recently started having a lot of unresponsive/inconsistent Philips Hue lights. It all started when I migrated off the Philips Hue bridge and added around 25 devices (mostly Bulbs) to ZHA. I have a total of 95 nodes currently.

Something I noticed is that when the same unresponsive/inconsistent Philips Hue lights are controlled via a Zigbee group, they respond quickly. But when I control them individually, I see error messages like this one (see screenshot): Failed to call service light/turn_off. Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

Things I've tried:

  1. Replaced and migrated Zigbee coordinator from HUSBZB-1 to Skyconnect (multi-protocol not enabled). HUSBZB-1 was working fine for many many months until I added a lot of devices.
  2. Skyconnect is connected to a 5m USB extension cord, via a USB 2.0 Hub, which is connected to a USB 2.0 port.
  3. Re-paired many devices that did not migrate properly. Added an additional 4 routers in an attempt to beef up mesh stability.


What version of Home Assistant Core has the issue?

core-2024.4.3

What was the last working version of Home Assistant Core?

Unsure

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

home-assistant_zha_2024-04-24T12-53-22.693Z.log

Debug during on/off failures of individual Hue lights, and on/off success when Group containing same lights on/off is sent.

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-04-24 08:53:16.261 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received messageSentHandler: [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 21065, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=5), 210, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.261 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 21065, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=5), 210, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.262 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140178771012160] Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrap_zigpy_exceptions
    yield
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 84, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/util.py", line 131, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 339, in request
    await send_request()
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 841, in request
    await self.send_packet(
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 931, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 239, in handle_call_service
    response = await hass.services.async_call(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2543, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2580, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 971, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1043, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 642, in async_handle_light_off_service
    await light.async_turn_off(**filter_turn_off_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 472, in async_turn_off
    result = await self._on_off_cluster_handler.off()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
2024-04-24 08:53:16.469 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'75c9b1a96b2a15c1d9904b23aa5e99099c4e27a23ba867cdd37e'
2024-04-24 08:53:16.469 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8070787e'
2024-04-24 08:53:16.470 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received messageSentHandler: [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 32883, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=9), 214, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.471 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 32883, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=9), 214, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.472 DEBUG (MainThread) [zigpy.zcl] [0x8073:11:0x0008] Sending request header: ZCLHeader(frame_control=FrameControl<0x00>(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Client_to_Server: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True), tsn=252, command_id=<GeneralCommand.Read_Attributes: 0>, *direction=<Direction.Client_to_Server: 0>)
2024-04-24 08:53:16.472 DEBUG (MainThread) [zigpy.zcl] [0x8073:11:0x0008] Sending request: Read_Attributes(attribute_ids=[0])
2024-04-24 08:53:16.473 DEBUG (MainThread) [bellows.zigbee.application] Sending packet ZigbeePacket(timestamp=datetime.datetime(2024, 4, 24, 12, 53, 16, 472991, tzinfo=datetime.timezone.utc), src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), src_ep=11, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x8073), dst_ep=11, source_route=[0x6851, 0x9c03], extended_timeout=False, tsn=252, profile_id=260, cluster_id=8, data=Serialized[b'\x00\xfc\x00\x00\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=None, rssi=None)
2024-04-24 08:53:16.473 DEBUG (MainThread) [bellows.ezsp.protocol] Send command setSourceRoute: (0x8073, [0x6851, 0x9c03])
2024-04-24 08:53:16.474 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'50ce21a9fa2a66325bc5222636ca527e'
2024-04-24 08:53:16.482 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'06cea1a9fa2a1578667e'
2024-04-24 08:53:16.482 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8160597e'
2024-04-24 08:53:16.483 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received setSourceRoute: [<EmberStatus.SUCCESS: 0>]
2024-04-24 08:53:16.484 DEBUG (MainThread) [bellows.ezsp.protocol] Send command sendUnicast: (<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 0x8073, EmberApsFrame(profileId=260, clusterId=8, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=252), 238, b'\x00\xfc\x00\x00\x00')
2024-04-24 08:53:16.485 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'61cf21a9602a15c1d9904b2daa5e99099c4e275703cb6777fdc663d3617e'
2024-04-24 08:53:16.494 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'17cfa1a9602a159346777e'
2024-04-24 08:53:16.495 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'82503a7e'
2024-04-24 08:53:16.495 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received sendUnicast: [<EmberStatus.SUCCESS: 0>, 33]

Additional information

Diagnostic Logs config_entry-zha-65910acf658adfe741b482ea10beeb3f.json

home-assistant[bot] commented 2 months ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `zha` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)



TheJulianJES commented 2 months ago

What are the model names of the affected Hue lights?

nerdyninny commented 2 months ago

What are the model names of the affected Hue lights?

  1. Philips Hue (HA name: Hue Kitchen Counter Corner Lamp): IEEE: 00:17:88:01:00:1d:a7:eb, Nwk: 0x5249, LLC011 by Signify Netherlands B.V. | Firmware: 0x43005d0b

  2. Philips Hue (HA name: Hue Kitchen Counter Strip Light): IEEE: 00:17:88:01:00:cc:0c:13, Nwk: 0x4256, LST001 by Signify Netherlands B.V. | Firmware: 0x43005d0b

nerdyninny commented 2 months ago

Update: I just moved my coordinator to a central part of my home (1st floor, instead of basement). I have a lot of routers/repeaters (mains) already on the network (27 Hue Bulbs, 2 Hue Strips Lights, 1 Hue Bloom Lamp, 1 Tuya Air Monitoring Sensor, 17 ThirdReality Outlets). 13 ThirdReality Outlets were already present before I migrated the Hue devices, but I added another 4 afterwards to see if it would improve anything. Anyway, after moving my coordinator, I still have several Hue devices that intermittently work when controlled directly (EmberStatus.DELIVERY_FAILED: 102), but work very fast/consistently when I use the Zigbee group they happen to be part of. I've also noticed the same 'delivery failed' issue on some of my ThirdReality outlets as well, but I haven't bothered to add them to a Zigbee group.

dmulcahey commented 2 months ago

How many devices total are on the network? Source routing could possibly help if the network is a decent size.

nerdyninny commented 2 months ago

95 total. Source routing is already enabled.

    # enable zha quirks
    zha:
      enable_quirks: true
      custom_quirks_path: /config/zha_quirks/
      zigpy_config:
        source_routing: true
        ezsp_config:
          CONFIG_MAX_END_DEVICE_CHILDREN: 0
          CONFIG_TX_POWER_MODE: 3
          CONFIG_NEIGHBOR_TABLE_SIZE: 16
          CONFIG_SOURCE_ROUTE_TABLE_SIZE: 110

—-

https://github.com/home-assistant/core/assets/59848054/27fe7fd6-dfee-4928-aeb5-7e7921eda67c

I figured a video of the issue might help.

dmulcahey commented 2 months ago

Ok, let's try something. Set zigpy to debug logging (you can do this with the `logger.set_level` service), then try to control a couple of the individual devices that give you trouble. Then get the logs and look at the source routes that are logged for the bad transactions. Maybe a router on the network is misbehaving. We can try power cycling and/or pulling the devices whose NWK addresses appear in the logged route.
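For anyone unsure how to read those logs, here is a minimal Python sketch of the check described above. It assumes the debug lines look like the bellows output quoted earlier in this issue ("Sending packet ZigbeePacket(...)" and "Received messageSentHandler frame with [...]"); other bellows/zigpy versions may format them differently. The function name `suspect_routers` is made up for illustration and is not part of zigpy or ZHA.

```python
import re
from collections import Counter

# Patterns modelled on the bellows debug lines quoted in this issue (an assumption:
# the exact wording may differ between bellows/zigpy versions).
SEND_RE = re.compile(
    r"Sending packet ZigbeePacket\(.*?"
    r"dst=AddrModeAddress\(addr_mode=<AddrMode\.\w+: \d+>, address=(0x[0-9a-fA-F]+)\)"
    r".*?source_route=(\[[^\]]*\]|None)"
)
FAIL_RE = re.compile(
    r"Received messageSentHandler frame with "
    r"\[<EmberOutgoingMessageType\.\w+: \d+>, (\d+), .*<EmberStatus\.DELIVERY_FAILED: 102>"
)


def suspect_routers(log_lines):
    """Count delivery failures per destination and how often each relay was on the route.

    Returns (failures, relays): Counters keyed by NWK address (as int).
    A failure is attributed to the most recent source route logged for that destination.
    """
    last_route = {}      # destination NWK -> list of relay NWKs from the latest send
    failures = Counter()
    relays = Counter()
    for line in log_lines:
        sent = SEND_RE.search(line)
        if sent:
            dst = int(sent.group(1), 16)
            route = sent.group(2)
            last_route[dst] = (
                [] if route == "None"
                else [int(h, 16) for h in re.findall(r"0x[0-9a-fA-F]+", route)]
            )
            continue
        failed = FAIL_RE.search(line)
        if failed:
            dst = int(failed.group(1))  # logged in decimal, e.g. 21065 == 0x5249
            failures[dst] += 1
            for relay in last_route.get(dst, []):
                relays[relay] += 1
    return failures, relays


# Usage (the file name is a placeholder for your own downloaded log):
# with open("home-assistant.log") as f:
#     failures, relays = suspect_routers(f)
# for nwk, count in relays.most_common():
#     print(f"relay 0x{nwk:04x} was on the route of {count} failed deliveries")
```

A relay that shows up on the routes of many failed deliveries is a candidate for power cycling or temporary removal. This is only a heuristic: it cannot tell which hop on a route actually dropped the message.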

puddly commented 2 months ago

Also, get rid of CONFIG_TX_POWER_MODE.

nerdyninny commented 2 months ago

So I went ahead and just removed all the ZHA entries under ezsp_config in configuration.yaml, and disabled source routing for now.

Moving the coordinator made somewhat of a positive difference in terms of the total number of unresponsive Hue lights.

I haven't done the zigpy debug, mostly because I don't know how, and I'm crossing my fingers that things will stabilize.

I do have a question though:

Why are the lights unresponsive via the GUI as a Device, but consistently responsive when the same Device (or even Devices, plural) is added to a Zigbee Group? I get that a single command can be sent to a set of Zigbee grouped lights, which decreases network traffic, but even when I power on a single bulb (one command), it is also unresponsive? And even in the case of source routing enabled, wouldn’t the path be the same regardless?

dmulcahey commented 2 months ago

Why are the lights unresponsive via the GUI as a Device, but consistently responsive when the same Device (or even Devices, plural) is added to a Zigbee Group? I get that a single command can be sent to a set of Zigbee grouped lights, which decreases network traffic, but even when I power on a single bulb (one command), it is also unresponsive? And even in the case of source routing enabled, wouldn’t the path be the same regardless?

No

  1. Source-Routed Packets:

    • Definition: In source routing, the entire path through which the packet is to travel through the network is determined at the source. This route is specified in the packet header, which means the packet carries the addresses of all the intermediate nodes it must pass through en route to the destination.
    • Purpose: Source routing is typically used in mesh networks, like those formed by Zigbee, to enhance routing efficiency and reliability. It helps in scenarios where network topology is stable, and the best routes are known and can be pre-determined, often based on previous interactions.
    • Advantages: Reduces the routing overhead on intermediate nodes, as they do not need to make routing decisions, just forward the packet based on the pre-determined path. It also can help avoid routing loops and decrease latency.
    • Disadvantages: Requires knowledge of the network topology, which may not always be current. It also increases the packet header size due to the inclusion of the list of node addresses.
  2. Broadcasts:

    • Definition: Broadcasting in Zigbee sends a message to all nodes within the network or within a certain radius. The packet does not specify any particular route or destination node addresses; instead, it is simply propagated by each node to all of its neighbors.
    • Purpose: Broadcasts are used for tasks like network-wide announcements, searching for a specific node (device discovery), or configuration commands that need to reach all nodes.
    • Advantages: Simple to implement as it does not require the source to know the network topology or the route to specific nodes. It ensures that all nodes in the area (or the entire network) will receive the message.
    • Disadvantages: Can lead to high network traffic and increased collisions, a phenomenon known as the "broadcast storm problem," especially in dense networks. It is less efficient in terms of network resource usage compared to directed routing methods.
  3. Group addressing: Group addressing in Zigbee is a method for efficiently managing communication among multiple devices within a Zigbee network. It allows a single message to be sent to multiple devices that are configured to listen to the same group address. This is particularly useful in home automation and IoT applications where multiple devices, like lights or sensors, need to receive the same command simultaneously. Here's how it works in detail:

Communication Using Group Addresses

So the TL;DR is that they are very different. Group messaging is technically multicast (a broadcast that only some devices will act on), essentially spray and pray, whereas source routing sends a message along a predetermined path to a particular device. This is why I wanted you to enable debug logging: so we could see which devices were in the path and determine whether a particular device is swallowing messages or your routes aren't updating for some reason.
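To make that difference concrete, here is a toy Python model. It is not how the Zigbee stack is implemented. The five-node topology and the node names are invented for illustration, and a "bad" relay is assumed to simply drop whatever it is asked to forward.

```python
from collections import deque


def unicast_delivered(route, bad_nodes):
    """A source-routed unicast visits exactly the listed relays, so one bad relay drops it."""
    return all(relay not in bad_nodes for relay in route)


def flood_reached(links, start, bad_nodes):
    """Return every node that hears a flooded (broadcast/multicast-style) frame.

    Each healthy node rebroadcasts to its neighbours; a bad node hears the frame
    but does not pass it on.
    """
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour in reached:
                continue
            reached.add(neighbour)           # the neighbour hears the frame
            if neighbour not in bad_nodes:
                queue.append(neighbour)      # only healthy nodes rebroadcast
    return reached


# Hypothetical mesh: two independent paths from the coordinator to the bulb.
links = {
    "coordinator": ["A", "C"],
    "A": ["coordinator", "B"],
    "B": ["A", "bulb"],
    "C": ["coordinator", "bulb"],
    "bulb": ["B", "C"],
}
bad = {"B"}  # assumed misbehaving router

print(unicast_delivered(["A", "B"], bad))                  # route goes through B
print("bulb" in flood_reached(links, "coordinator", bad))  # flood can go around B
```

In this model the unicast along the route A, B is lost because B is on the path, while the flooded frame still reaches the bulb through C. That matches the symptom of a light that works in a Zigbee group but not on its own, and it is why the relays in the logged source route are worth checking.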

MirekDusinojc commented 2 months ago

I can confirm that I may be seeing the same issue. I have a few Philips Hue lights. After the 2024.4 update, one of them, a light strip, started misbehaving and becoming unresponsive from time to time. It is part of a scene, and when the automation fires, this light sometimes does not turn on; other times it works without any issue. There weren't any issues before the update; it was working quite reliably. My other Hue lights seem to be working well, so I am not sure why only this one misbehaves. When I try to fix the light by reconfiguring it, the reconfiguration very often fails. I factory reset it once. This should not be a signal issue either, as the device is a few meters from the hub and there are multiple other routers in the room.

nerdyninny commented 1 month ago

My Zigbee network seems to have mostly stabilized, except for one Hue Spotlight Color bulb.

I can control it consistently only via a Zigbee Group (it’s in a room group with 3 total). If I change the problem bulb’s color, or on/off, I get the following error:

Failed to call service light/turn_on. Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

Kinda wonky and I don’t understand why a Zigbee group command works on it, but not a direct command by itself. It’s sporadic too as I’ve reset the bulb several times now. It’ll work for a day, or a few days, and then stop working (except when in a zigbee group).

Bulb stats: LCT002 by Signify Netherlands B.V. Firmware: 0x43006502

Device Type: Router LQI: 140 RSSI: -65 Last seen: 2024-05-28T21:58:25

TheJulianJES commented 1 month ago

The LLC011 is apparently TI/CC2530-based and is known to cause issues (especially with source routing) on anything but very old firmware. You might want to remove these devices from your network for now if you experience stability issues. The Hue firmware development team is informed of the issue.

Related thread:

nerdyninny commented 2 weeks ago

especially with source routing

Does going back to broadcast routing (instead of source routing) fix it?

I read the thread you linked to. I downloaded the oldest OTA firmware, but I can't seem to downgrade; according to the debug logs, I already have the latest firmware. Any tips on how to downgrade using ZHA?