home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ZHA onboarding fails — stuck at “Starting Interview” #99497

Closed: AndySymons closed this issue 11 months ago.

AndySymons commented 1 year ago

The problem

For a while now, it has not been possible to onboard any new Zigbee devices on my system. They are recognised, but the onboarding process halts at “Starting Interview”.

One community forum commenter says the fault has been there since 2023.5, but I tested by reverting to 2023.4.4 and that did not work either.

See also the community threads:

Onboarding has certainly worked before: I already have several devices operating, and those that were onboarded earlier are still working OK.

I made a log with debug logging enabled while onboarding two example devices of types that I have previously onboarded successfully: an IKEA Tradfri Zigbee repeater and an unbranded Chinese motion sensor. I have the same problem with other devices that previously worked correctly, including Aqara and other closure sensors, a SONOFF Zigbee smart socket, and another type of unbranded motion sensor.

What version of Home Assistant Core has the issue?

2023.8.4

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Core

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

230902c Diagnostic info.txt

Example YAML snippet

Cannot be reproduced with YAML. Just run the usual manual onboarding process.

Anything in the logs that might be useful for us?

I have a log file, but it is too big to upload here.

Additional information

Please fix this bug, or advise how to circumvent it, with some urgency because my Home Automation project is stalled until I can use ZHA again.

home-assistant[bot] commented 1 year ago

Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands
Code owners of `zha` can trigger bot actions by commenting:
  • `@home-assistant close` Closes the issue.
  • `@home-assistant rename Awesome new title` Renames the issue.
  • `@home-assistant reopen` Reopens the issue.
  • `@home-assistant unassign zha` Removes the current integration label and assignees on the issue; add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

AndySymons commented 1 year ago

Any thoughts?
If you want the log file, let me know how to get it to you.

puddly commented 1 year ago

You can usually compress the log file to a reasonable size by just zipping it up. If you don't want to share it in public, you can email it to me privately at puddly3@gmail.com.

AndySymons commented 1 year ago

230902d Log file.log.zip

(Why did I not think of that?)

AndySymons commented 1 year ago

So, did that help at all?

JelmerT commented 1 year ago

I'm having the exact same issue.

I'm trying to re-add some devices, they're discovered but then never go past "Starting Interview". I have a network with over 20 devices, and the one I'm trying to add definitely was added successfully in the past.

From the logs it seems like the device (FYRTUR roller blind with address 38:5b:44:ff:fe:1a:7f:28) is just joining over and over in a loop, but never completes. Not sure what should come next.

home-assistant_zha_2023-09-16T21-55-51.558Z.log

config_entry-zha-154bd5c7e412cad30efd15b9af331f5c.json.txt

cvocvo commented 1 year ago

I also noticed the same thing happening when I tried to remove and re-add a device today for this issue: https://github.com/zigpy/zha-device-handlers/issues/2588

IMG_4781

jorhett commented 1 year ago

I'm seeing the same behavior when trying to re-add an RGBgenie to a Sonoff 700 USB stick on the branded blue HASS. This was never a problem before; I just haven't added any new Zigbee devices in the last 8 or so months.

The interview fails and no entities are shown, but if you open the diagnostics, every class that the device answers to is listed there. If all of these problems stem from the same issue, that would suggest the problem lies in processing the response from the device.

MattWestb commented 1 year ago

If the device has already been added in the ZHA GUI, deleting it, restarting HA, and then adding it again once HA is stable normally works OK.

AndySymons commented 1 year ago

I am sorry that there is no resolution to this important issue. It's a dealbreaker for me. I am going to have to abandon ZHA and possibly Home Assistant in favour of a Zigbee system that works reliably.

puddly commented 1 year ago

Looking at your log, MAC_CHANNEL_ACCESS_FAILURE appears over 1000 times. Channel access failures are the radio firmware refusing to send data because of interference in your environment.
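To check your own log for the same symptom before changing anything, a minimal sketch like the following counts how many lines mention each status name. The two status names are the ones quoted in this thread; the `count_statuses` helper and the first sample line are hypothetical, and log formats may differ between bellows/zigpy versions.

```python
from typing import Dict, Iterable, Tuple

# Status names quoted in this thread. Other radios or firmware versions may
# log different names; extend the tuple as needed (assumption).
STATUS_NAMES: Tuple[str, ...] = ("MAC_CHANNEL_ACCESS_FAILURE", "DELIVERY_FAILED")


def count_statuses(lines: Iterable[str],
                   names: Tuple[str, ...] = STATUS_NAMES) -> Dict[str, int]:
    """Count how many log lines mention each status name (hypothetical helper)."""
    counts = {name: 0 for name in names}
    for line in lines:
        for name in names:
            if name in line:
                counts[name] += 1
    return counts


# Usage with two sample lines; for a real log, pass an open file object
# instead, e.g. open("home-assistant.log", errors="replace").
sample = [
    "hypothetical line containing MAC_CHANNEL_ACCESS_FAILURE",
    "failed to set reporting: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>",
]
print(count_statuses(sample))  # {'MAC_CHANNEL_ACCESS_FAILURE': 1, 'DELIVERY_FAILED': 1}
```

A count in the hundreds or thousands, as in the log discussed here, is the signal worth acting on; a handful of failures can occur on any network.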

If you cannot join devices but get the "starting interview" prompt, this is probably because your Zigbee radio is told that a device is joining but cannot actually send commands to the device due to RF interference. Looking at your startup energy scan, you can see that your network channel (15) is fairly congested:

 11 : ###############################################################
 12 : ############
 13 : ############################################
 14 : ###############################################################
[15]: #############################################################
 16 : #################################################################
 17 : #######################
 18 : ########################################
 19 : ###########################################
 20 : ######################################
 21 : ###############################################
 22 : ############################################################
 23 : #########################################################
 24 : ########################################
 25 : ##########
 26 : ##############################

I suggest eliminating nearby RF noise sources from your environment (USB 3.0 ports, SSDs, 2.4GHz routers, etc.), changing your WiFi network's channel to move it away from your Zigbee network, and if all else fails, use the ZHA channel migration feature (leave it on auto) to pick a better channel.
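To see which Zigbee channels a given 2.4 GHz WiFi channel sits on top of, the centre frequencies can be compared directly. This is a simplified sketch: it uses the standard centre frequencies (IEEE 802.15.4 channels 11 to 26 at 2405 + 5(k - 11) MHz; WiFi channels 1 to 13 at 2407 + 5n MHz, channel 14 at 2484 MHz) and assumes a flat 22 MHz WiFi channel and a 2 MHz Zigbee channel. Real spectral masks have skirts, and non-WiFi noise sources like those listed above are not modelled at all.

```python
from typing import List


def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of an IEEE 802.15.4 channel in the 2.4 GHz band (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz WiFi channel (1-14)."""
    if channel == 14:
        return 2484
    if not 1 <= channel <= 13:
        raise ValueError("2.4 GHz WiFi channels are 1-14")
    return 2407 + 5 * channel


def overlapping_zigbee_channels(wifi_channel: int,
                                wifi_width_mhz: float = 22.0,
                                zigbee_width_mhz: float = 2.0) -> List[int]:
    """Zigbee channels whose flat, idealised band overlaps the WiFi channel's band."""
    wifi_f = wifi_center_mhz(wifi_channel)
    max_separation = (wifi_width_mhz + zigbee_width_mhz) / 2
    return [z for z in range(11, 27)
            if abs(zigbee_center_mhz(z) - wifi_f) < max_separation]


for wifi in (1, 6, 11):
    print(f"WiFi {wifi:2d} overlaps Zigbee {overlapping_zigbee_channels(wifi)}")
```

Under these assumptions, WiFi channels 1, 6 and 11 overlap Zigbee channels 11 to 14, 16 to 19 and 21 to 24 respectively, which is why Zigbee channels 15, 20 and 25 are often suggested as the gaps between the common WiFi channels.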

IronZDev commented 1 year ago

In my case I do not get the MAC_ACK response.

JelmerT commented 1 year ago

In my case, the issue doesn't seem to be congestion: I'm not seeing any MAC_CHANNEL_ACCESS_FAILURE.

I've since switched over to zigbee2mqtt. The switchover was annoying, but there's a lot more overview and feedback, and things seem to be running well. Pairing was no issue on zigbee2mqtt with the exact same hardware.

AndySymons commented 1 year ago

> Channel access failures are the radio firmware refusing to send data because of interference in your environment.

That is very interesting but I am not sure how to respond. I have already ensured that the WiFi is on a different band from Zigbee. I do not know how to check for RF interference; I do not have a signal analyser. I tried several iPhone-based apps, but they do not seem to tell me anything I do not already know.

I have not moved the hub since the time when it was working, but I did introduce a Deco M5 mesh WiFi system. That is not on the same band as Zigbee, but could it be causing interference anyway? I have also had persistent but unpredictable WiFi disconnection problems, which the mesh WiFi has so far not fully resolved. Maybe there is some Zigbee/mesh WiFi interference issue here that is independent of band?

It seems I might need to move the hub with the Zigbee server, but it does not have its own WiFi, so I would need to plug it into a WiFi mesh node anyway...

Any suggestions how I can isolate and circumvent this problem?

AndySymons commented 1 year ago

> I suggest eliminating nearby RF noise sources from your environment (USB 3.0 ports, SSDs, 2.4GHz routers, etc.), changing your WiFi network's channel to move it away from your Zigbee network, and if all else fails, use the ZHA channel migration feature (leave it on auto) to pick a better channel.

puddly commented 1 year ago

> I do not know how to check for RF interference; I do not have a signal analyser.

If you download diagnostics for the ZHA integration, there will be an energy_scan section with percentage utilization for each channel.
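A minimal sketch of reading that section and printing it as a bar chart similar to the startup energy scan shown earlier in this thread. It assumes the diagnostics download is JSON containing an `energy_scan` object that maps channel numbers (as strings) to utilization percentages; because the exact nesting may vary between versions, it searches the whole tree for the first `energy_scan` key. The file name and sample values are made up.

```python
import json
from typing import Any, Dict, List, Optional


def find_energy_scan(node: Any) -> Optional[Dict[int, float]]:
    """Return the first 'energy_scan' mapping found anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        scan = node.get("energy_scan")
        if isinstance(scan, dict):
            return {int(ch): float(pct) for ch, pct in scan.items()}
        children: List[Any] = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_energy_scan(child)
        if found is not None:
            return found
    return None


def render(scan: Dict[int, float], current: Optional[int] = None, width: int = 50) -> str:
    """Draw one bar per channel; the current network channel is shown in brackets."""
    rows = []
    for ch in sorted(scan):
        label = f"[{ch}]" if ch == current else f" {ch} "
        bar = "#" * round(scan[ch] / 100 * width)
        rows.append(f"{label}: {bar} {scan[ch]:.1f}%")
    return "\n".join(rows)


# Usage with made-up numbers. For a real download (hypothetical file name):
#   with open("zha-diagnostics.json", encoding="utf-8") as fh:
#       diag = json.load(fh)
diag = json.loads('{"data": {"energy_scan": {"11": 80.0, "15": 75.5, "25": 12.0}}}')
scan = find_energy_scan(diag)
assert scan is not None
print(render(scan, current=15, width=20))
```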

> I suppose I could move it away a couple of metres using a USB extension cord, but this was not a problem before.

You always want to use a USB extension cable to prevent this from becoming a problem. Sometimes the noise threshold is just below the current signal so even a small increase from new devices is enough to push it over the edge.
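One way to see why a small rise in noise can tip things over is the IEEE 802.15.4 CSMA-CA rule behind MAC_CHANNEL_ACCESS_FAILURE: before transmitting, the radio performs a clear channel assessment (CCA), backs off and tries again while the channel reads busy, and gives up with a channel access failure once the count of busy readings exceeds macMaxCSMABackoffs (4 by default in the standard, so 5 busy readings in a row). The sketch below assumes every CCA is independently busy with probability p, which real interference does not strictly obey, and ignores the MAC and network-layer retries layered on top; firmware may also configure a different backoff limit. Under those assumptions, the failure probability per attempt is p^(macMaxCSMABackoffs + 1).

```python
import random

# IEEE 802.15.4 default for macMaxCSMABackoffs; real firmware may differ.
MAC_MAX_CSMA_BACKOFFS = 4


def p_access_failure(p_busy: float, max_backoffs: int = MAC_MAX_CSMA_BACKOFFS) -> float:
    """Closed form: all (max_backoffs + 1) CCAs find the channel busy."""
    return p_busy ** (max_backoffs + 1)


def simulate_access_failure(p_busy: float, trials: int = 100_000,
                            max_backoffs: int = MAC_MAX_CSMA_BACKOFFS,
                            seed: int = 0) -> float:
    """Monte Carlo version of the same rule; backoff delays don't affect the outcome."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        nb = 0  # busy CCAs seen so far (NB in the standard)
        while rng.random() < p_busy:  # this CCA reports the channel busy
            nb += 1
            if nb > max_backoffs:  # limit exceeded: channel access failure
                failures += 1
                break
    return failures / trials


for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p_busy={p:.1f}  closed form={p_access_failure(p):.4f}  "
          f"simulated={simulate_access_failure(p):.4f}")
```

With these assumptions, raising p_busy from 0.7 to 0.9 raises the per-attempt failure probability from about 0.17 to about 0.59, a steep jump for a modest change in noise.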

> I already have the Zigbee hub on 'auto'.

Channel selection happens only when you form a new network or explicitly ask the integration to move to a new channel. "auto" here isn't a setting; it just tells ZHA to scan for a new channel after you submit the form, instead of picking an explicit channel from the list.

Move the coordinator as far away from noise sources as possible and see if that helps. You may even need two extension cables. After that's done and if things still don't work, you can migrate your network to a new channel as a last resort.
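For a rough idea of what "scan, then pick a channel" can involve, here is a hypothetical heuristic (not zigpy's or ZHA's actual algorithm): score each candidate channel by its own utilization plus a weighted average of its immediate neighbours' utilization, on the assumption that a wideband noise source hitting a neighbour may also spill into the candidate, and pick the lowest score. The input is an energy-scan mapping of channel to percentage, like the one in the ZHA diagnostics; the numbers below are made up.

```python
from typing import Dict, Iterable, Optional


def pick_channel(scan: Dict[int, float],
                 candidates: Optional[Iterable[int]] = None,
                 neighbour_weight: float = 0.5) -> int:
    """Return the candidate channel with the lowest heuristic score (hypothetical)."""
    pool = sorted(c for c in (candidates if candidates is not None else scan) if c in scan)
    if not pool:
        raise ValueError("no candidate channel has energy-scan data")

    def score(ch: int) -> float:
        # Average utilization of the channels directly either side, if scanned.
        neighbours = [scan[n] for n in (ch - 1, ch + 1) if n in scan]
        spill = sum(neighbours) / len(neighbours) if neighbours else 0.0
        return scan[ch] + neighbour_weight * spill

    return min(pool, key=score)


# Made-up scan: channel 11 is quietest on its own, but its neighbour is very busy.
scan = {11: 10.0, 12: 95.0, 13: 15.0, 14: 20.0, 15: 18.0}
print(pick_channel(scan))                        # 15
print(pick_channel(scan, neighbour_weight=0.0))  # 11
```

Setting `neighbour_weight` to 0 reduces this to "pick the least-utilized channel"; the weighting is a design choice to illustrate that a quiet channel next to a noisy one may be a worse bet than it looks.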

dtretyakov commented 1 year ago

It definitely looks like a regression in a recent Home Assistant version. Pairing was working in the HA version from mid-July.

Hardware: Raspberry Pi and Sonoff USB dongle. Zigbee devices: any, e.g. Aqara H1 switches, IKEA bulbs.

cvocvo commented 1 year ago

> I also noticed the same thing happening when I tried to remove and re-add a device today for this issue: zigpy/zha-device-handlers#2588

In case anyone else has a SONOFF ZBDongle-E coordinator: I fixed my issue where it was perpetually stuck at "Starting Interview" by upgrading the firmware from 6.10.3 to 7.3.1. I used this to update the firmware from the web; it only took a few minutes: https://darkxst.github.io/silabs-firmware-builder/ This resolved my issue for several device types:

ahmadnassri commented 1 year ago

I'm seeing this as well, logs below:

[0x37CF](RWL020): Device seen - marking the device available and resetting counter
[0x37CF](RWL020): Update device availability -  device available: True - new availability: True - changed: False
New device 0x2c2d (00:17:88:01:04:ae:38:4a) joined the network
[0x2c2d] Scheduling initialization
Tries remaining: 5
[0x2c2d] Requesting 'Node Descriptor'
[0x2c2d] Extending timeout for 0x07 request
Tries remaining: 4
[0x2c2d] Requesting 'Node Descriptor'
[0x2c2d] Extending timeout for 0x09 request
Tries remaining: 3
[0x2c2d] Requesting 'Node Descriptor'
[0x2c2d] Extending timeout for 0x0b request
Tries remaining: 2
[0x2c2d] Requesting 'Node Descriptor'
[0x2c2d] Extending timeout for 0x0d request
Tries remaining: 1
[0x2c2d] Requesting 'Node Descriptor'
[0x2c2d] Extending timeout for 0x0f request
[0x37CF](RWL020): Device seen - marking the device available and resetting counter
[0x37CF](RWL020): Update device availability -  device available: True - new availability: True - changed: False
Device 0xa715 (00:17:88:01:02:c1:2c:df) joined the network
Device 00:17:88:01:02:c1:2c:df changed id (0x37cf => 0xa715)
[0xa715] Skipping initialization, device is fully initialized
Device is initialized <PhilipsRWLFirstGen model='RWL020' manuf='Philips' nwk=0xA715 ieee=00:17:88:01:02:c1:2c:df is_initialized=True>
device - 0xa715:00:17:88:01:02:c1:2c:df entering async_device_initialized - is_new_join: False
device - 0xa715:00:17:88:01:02:c1:2c:df has been reset and re-added or its nwk address changed
skipping discovery for previously discovered device - 0xa715:00:17:88:01:02:c1:2c:df
[0xa715](RWL020): started configuration
[0xa715:ZDO](RWL020): 'async_configure' stage succeeded
[0xa715:1:0x0000]: Configuring cluster attribute reporting
[0xa715:1:0x0000]: finished cluster handler configuration
[0xa715:1:0x0008]: Performing cluster binding
[0xa715] Extending timeout for 0x11 request
[0xa715:1:0x0006]: Performing cluster binding
[0xa715] Extending timeout for 0x13 request
[0xa715:1:0x0005]: Performing cluster binding
[0xa715] Extending timeout for 0x15 request
[0xa715:2:0x0003]: Configuring cluster attribute reporting
[0xa715:2:0x0003]: finished cluster handler configuration
[0xa715:2:0x0001]: Performing cluster binding
[0xa715] Extending timeout for 0x17 request
[0xa715:2:0x0000]: Configuring cluster attribute reporting
[0xa715:2:0x0000]: finished cluster handler configuration
[0xa715:2:0x000f]: Performing cluster binding
[0xa715] Extending timeout for 0x19 request
[0xa715:2:0xfc00]: Performing cluster binding
[0xa715] Extending timeout for 0x1b request
[0xa715:2:0x0019]: finished cluster handler configuration
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=0, dst=AddrModeAddress(addr_mode=<AddrMode.Broadcast: 15>, address=<BroadcastAddress.ALL_ROUTERS_AND_COORDINATOR: 65532>), dst_ep=0, source_route=None, extended_timeout=False, tsn=162, profile_id=0, cluster_id=19, data=Serialized[b'\x00\x15\xa7\xdf,\xc1\x02\x01\x88\x17\x00\x80'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=208, rssi=-48)
Device 0xa715 (00:17:88:01:02:c1:2c:df) joined the network
[0xa715:zdo] ZDO request ZDOCmd.Device_annce: [0xA715, 00:17:88:01:02:c1:2c:df, 128]
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=0, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=0, source_route=None, extended_timeout=False, tsn=163, profile_id=0, cluster_id=32801, data=Serialized[b'\x13\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=196, rssi=-51)
[0xa715:1:0x0006]: bound 'on_off' cluster: Status.SUCCESS
[0xa715:1:0x0006]: finished cluster handler configuration
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=0, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=0, source_route=None, extended_timeout=False, tsn=164, profile_id=0, cluster_id=32801, data=Serialized[b'\x11\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=200, rssi=-50)
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=0, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=0, source_route=None, extended_timeout=False, tsn=165, profile_id=0, cluster_id=32801, data=Serialized[b'\x15\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=200, rssi=-50)
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=0, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=0, source_route=None, extended_timeout=False, tsn=167, profile_id=0, cluster_id=32801, data=Serialized[b'\x19\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=200, rssi=-50)
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=0, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=0, source_route=None, extended_timeout=False, tsn=168, profile_id=0, cluster_id=32801, data=Serialized[b'\x1b\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=200, rssi=-50)
[0xa715:2:0x000f]: bound 'binary_input' cluster: Status.SUCCESS
[0xa715:2:0x000f]: Configuring cluster attribute reporting
[0xA715:2:0x000f] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True), tsn=29, command_id=<GeneralCommand.Configure_Reporting: 6>, *direction=<Direction.Server_to_Client: 0>)
[0xA715:2:0x000f] Sending request: Configure_Reporting(config_records=[AttributeReportingConfig(direction=0, attrid=0x0055, datatype=16, min_interval=30, max_interval=900, reportable_change=1)])
[0xa715] Extending timeout for 0x1d request
[0xa715:2:0xfc00]: bound 'philips_remote_cluster' cluster: Status.SUCCESS
[0xa715:2:0xfc00]: Configuring cluster attribute reporting
[0xa715:2:0xfc00]: finished cluster handler configuration
[0xa715:1:0x0008]: bound 'level' cluster: Status.SUCCESS
[0xa715:1:0x0008]: finished cluster handler configuration
[0xa715:1:0x0005]: bound 'scenes' cluster: Status.SUCCESS
[0xa715:1:0x0005]: finished cluster handler configuration
[0xa715:1:0x0000]: 'async_configure' stage succeeded
[0xa715:1:0x0008]: 'async_configure' stage succeeded
[0xa715:1:0x0006]: 'async_configure' stage succeeded
[0xa715:1:0x0005]: 'async_configure' stage succeeded
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=2, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=2, source_route=None, extended_timeout=False, tsn=169, profile_id=260, cluster_id=15, data=Serialized[b'\x18\x1d\x07\x86\x00U\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=216, rssi=-46)
[0xA715:2:0x000f] Received ZCL frame: b'\x18\x1d\x07\x86\x00U\x00'
[0xA715:2:0x000f] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True), tsn=29, command_id=7, *direction=<Direction.Client_to_Server: 1>)
[0xA715:2:0x000f] Decoded ZCL frame: BinaryInput:Configure_Reporting_rsp(status_records=[ConfigureReportingResponseRecord(status=<Status.UNSUPPORTED_ATTRIBUTE: 134>, direction=<ReportingDirection.SendReports: 0>, attrid=85)])
[0xa715:2:0x000f]: failed to set reporting on 'binary_input' cluster for: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
[0xa715:2:0x000f]: finished cluster handler configuration
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=2, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=1, source_route=None, extended_timeout=False, tsn=170, profile_id=260, cluster_id=64512, data=Serialized[b'\x1d\x0b\x10\x00\x00\x04\x00\x000\x00!\x00\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=208, rssi=-48)
[0xA715:2:0xfc00] Received ZCL frame: b'\x1d\x0b\x10\x00\x00\x04\x00\x000\x00!\x00\x00'
[0xA715:2:0xfc00] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=True, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=True, *is_general=False), manufacturer=4107, tsn=0, command_id=0, *direction=<Direction.Client_to_Server: 1>)
[0xA715:2:0xfc00] Decoded ZCL frame: PhilipsRemoteCluster:notification(button=4, param2=3145728, press_type=0, param4=33, param5=0, param6=0)
[0xA715:2:0xfc00] Received command 0x00 (TSN 0): notification(button=4, param2=3145728, press_type=0, param4=33, param5=0, param6=0)
Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xA715), src_ep=1, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=1, source_route=None, extended_timeout=False, tsn=171, profile_id=260, cluster_id=6, data=Serialized[b'\x01\x01@\x00\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=208, rssi=-48)
[0xA715:1:0x0006] Received ZCL frame: b'\x01\x01@\x00\x00'
[0xA715:1:0x0006] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=0, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False), tsn=1, command_id=64, *direction=<Direction.Server_to_Client: 0>)
[0xA715:1:0x0006] Decoded ZCL frame: OnOff:off_with_effect(effect_id=<OffEffectIdentifier.Delayed_All_Off: 0>, effect_variant=0)
[0xA715:1:0x0006] Received command 0x40 (TSN 1): off_with_effect(effect_id=<OffEffectIdentifier.Delayed_All_Off: 0>, effect_variant=0)
[0xA715:1:0x0006] No explicit handler for cluster command 0x40: off_with_effect(effect_id=<OffEffectIdentifier.Delayed_All_Off: 0>, effect_variant=0)
Device 0xa715 (00:17:88:01:02:c1:2c:df) left the network
[0xa715](RWL020): Update device availability -  device available: True - new availability: False - changed: True
[0xa715](RWL020): Device availability changed and device became unavailable
Device 0xa715 (00:17:88:01:02:c1:2c:df) left the network
[0xa715](RWL020): Update device availability -  device available: False - new availability: False - changed: False
[0xa715:2:0x0001]: Failed to bind 'power' cluster: 
[0xa715:2:0x0001]: Configuring cluster attribute reporting
[0xA715:2:0x0001] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True), tsn=31, command_id=<GeneralCommand.Configure_Reporting: 6>, *direction=<Direction.Server_to_Client: 0>)
[0xA715:2:0x0001] Sending request: Configure_Reporting(config_records=[AttributeReportingConfig(direction=0, attrid=0x0020, datatype=32, min_interval=3600, max_interval=10800, reportable_change=1), AttributeReportingConfig(direction=0, attrid=0x0021, datatype=32, min_interval=3600, max_interval=10800, reportable_change=1)])
[0xa715] Extending timeout for 0x1f request
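The first part of this log shows the pattern behind a stuck "Starting Interview": device 0x2c2d joins, and the coordinator asks it for its Node Descriptor five times (Tries remaining: 5 down to 1) without any reply being logged. A small sketch that flags such devices in a debug log follows. The regular expressions are written against the message formats in this excerpt and are an assumption for other zigpy versions; the `stuck_interviews` helper is hypothetical, and a device that finally answers on the last try would also be flagged.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Patterns copied from the message formats in the log excerpt above.
JOIN_RE = re.compile(
    r"New device (0x[0-9a-fA-F]{4}) \(([0-9a-fA-F:]{23})\) joined the network")
NODE_DESC_RE = re.compile(r"\[(0x[0-9a-fA-F]{4})\] Requesting 'Node Descriptor'")


def stuck_interviews(lines: Iterable[str],
                     min_attempts: int = 5) -> Dict[str, Dict[str, object]]:
    """Map NWK address -> IEEE and request count for devices asked repeatedly
    for their Node Descriptor (hypothetical helper)."""
    ieee_by_nwk: Dict[str, str] = {}
    attempts: Counter = Counter()
    for line in lines:
        if (m := JOIN_RE.search(line)):
            ieee_by_nwk[m.group(1).lower()] = m.group(2)
        elif (m := NODE_DESC_RE.search(line)):
            attempts[m.group(1).lower()] += 1
    return {
        nwk: {"ieee": ieee_by_nwk.get(nwk), "node_descriptor_requests": count}
        for nwk, count in attempts.items()
        if count >= min_attempts
    }


excerpt = ["New device 0x2c2d (00:17:88:01:04:ae:38:4a) joined the network"]
excerpt += ["[0x2c2d] Requesting 'Node Descriptor'"] * 5
print(stuck_interviews(excerpt))
# {'0x2c2d': {'ieee': '00:17:88:01:04:ae:38:4a', 'node_descriptor_requests': 5}}
```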

I also see errors in the logs:

Logger: homeassistant
Source: components/zha/core/cluster_handlers/__init__.py:75
First occurred: 2:55:15 PM (4 occurrences)
Last logged: 4:32:11 PM

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrap_zigpy_exceptions
    yield
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 84, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 132, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 293, in request
    await self._application.request(
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 828, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 870, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/device.py", line 578, in async_configure
    await self.identify_ch.trigger_effect(
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

After many attempts I was able to add the devices, but they became unavailable IMMEDIATELY.
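The traceback above has two parts because the ZHA cluster-handler code catches the low-level zigpy `DeliveryError` and re-raises it as a `HomeAssistantError` with `raise ... from exc` inside the `wrap_zigpy_exceptions` context manager (visible at the bottom of the second traceback). The simplified sketch below uses local stand-in exception classes rather than the real zigpy and Home Assistant ones to show where the "direct cause" line comes from; it is not ZHA's actual code, which also wraps the request in a retry decorator.

```python
from contextlib import contextmanager


class DeliveryError(Exception):
    """Local stand-in for zigpy.exceptions.DeliveryError."""


class HomeAssistantError(Exception):
    """Local stand-in for homeassistant.exceptions.HomeAssistantError."""


@contextmanager
def wrap_radio_errors():
    """Translate a radio-level delivery error into a higher-level error."""
    try:
        yield
    except DeliveryError as exc:
        # "from exc" stores the original error in __cause__; the traceback
        # printer then shows both, joined by "The above exception was the
        # direct cause of the following exception".
        raise HomeAssistantError(f"Failed to send request: {exc}") from exc


def send_identify() -> None:
    """Hypothetical request that always fails at the radio layer."""
    with wrap_radio_errors():
        raise DeliveryError("Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>")


try:
    send_identify()
except HomeAssistantError as err:
    print(err)
    print(type(err.__cause__).__name__)  # DeliveryError
```

The useful takeaway is that the outer `HomeAssistantError` carries no new information: the root cause is always the innermost radio error, here `DELIVERY_FAILED`.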

Geo-Ron commented 1 year ago

I too am having this issue, but I'm unsure what is causing it.

Last time I needed to re-add a device was about a month ago.

Will check tonight when I get home.

ahmadnassri commented 1 year ago

I tried switching channels to see if it was a signal noise problem, but that didn't help.

I switched to Zigbee2MQTT and everything worked, so it's not a hardware / signal issue.

Geo-Ron commented 1 year ago

> I too am having this issue, but I'm unsure what is causing it.
>
>   • I have restored a backup of my instance today (SD card broke)
>   • I needed to re-pair one IKEA light bulb and it never got past the interview message.
>   • I do have a Sonoff ZB dongle, but am unsure if I have the -E variant
>
> Last time I needed to re-add a device was about a month ago.
>
> Will check tonight when I get home.

Yesterday I put some effort into this issue. I have an ITead Sonoff Zigbee 3.0 USB Dongle Plus.

I have updated the firmware of this device with the aid of the script at https://community.home-assistant.io/t/sonoffs-zigbee-3-0-usb-dongle-plus-firmware/420558/5

The firmware I used is the one at https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_3.x.0/bin

Now my issue is gone and I am happy!

MattWestb commented 1 year ago

The TI Zigbee 3 firmware has never worked with real Zigbee 3 end devices as children, because that was disabled by some creative people to get Xiaomi/Aqara devices working (they can't do Zigbee 3 things). It was enabled in the 2023.X release after some years of fighting with the Z2M devs. They are still breaking Zigbee groups and router discovery in the firmware.

AndySymons commented 11 months ago

In my case, I tracked the issue to RF interference from a TV. I invested in a Tiny SA Ultra signal analyser (£150 in the UK), which I can heartily recommend. Looking at the 2.4 to 2.5 GHz range, one can clearly see the interference across the band (see figures). I have no idea why the TV is doing that, and turning its own WiFi off makes no difference. The remedy was to move the WiFi mesh hub and my home automation computer, with its Zigbee and Bluetooth radios, well away from the TV.

I only wish the developers would program ZHA to issue human-readable error messages to the log, something like "cannot complete pairing due to RF interference", to save me having to post here for a translation.

IMG_4730 IMG_4731
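As an illustration of the kind of translation asked for above, a log line could be matched against a small table of status names with plain-language hints. This is purely a hypothetical sketch: neither ZHA nor zigpy is claimed to ship such a table, and the hint wording paraphrases the explanations given earlier in this thread.

```python
from typing import Dict, List

# Hypothetical hint table; the wording paraphrases explanations in this thread.
HINTS: Dict[str, str] = {
    "MAC_CHANNEL_ACCESS_FAILURE": (
        "The radio refused to transmit because the channel was too busy; "
        "this usually points to RF interference near the coordinator."
    ),
    "DELIVERY_FAILED": (
        "A message to the device was not delivered; check for interference "
        "and the distance between the device and the nearest router."
    ),
}


def explain(log_line: str) -> List[str]:
    """Return the plain-language hints for every known status in a log line."""
    return [hint for status, hint in HINTS.items() if status in log_line]


line = ("[0xa715:2:0x000f]: failed to set reporting on 'binary_input' cluster for: "
        "Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>")
for hint in explain(line):
    print(hint)
```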

AndySymons commented 11 months ago

Thanks everyone, especially @puddly, for your help.

MattWestb commented 11 months ago

@AndySymons The TV's WiFi is not really active (or, more correctly, shouldn't be), but it is some screen mirroring feature like Miracast that is "scanning" for clients. Normally it should only send beacons and not transmit all the time, so I think a bug in the TV's firmware is causing it. My old Samsung TV is connected via Ethernet, and the media hub in it has over 50 active connections all the time, which is more than I have when working on the laptop, so that's also crazy. Very interesting findings!

AndySymons commented 11 months ago

This was an LG TV, and it is indeed not a result I expected to find. I can only issue these general warnings to all Home Assistant users:

  1. If onboarding new Zigbee devices sticks at "starting interview", or you have intermittent Zigbee disconnections, the cause could well be radio interference.
  2. It may be your TV that is interfering, so keep your Zigbee and Bluetooth radios well away from the TV!