home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ZHA - Logs flooded with 'EmberStatus.DELIVERY_FAILED: 102' since 2023.7 #97662

Closed · nicknol closed this issue 1 month ago

nicknol commented 1 year ago

The problem

Since core 2023.7 my logs are filled with messages about DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>').

I'm using SkyConnect and Zigbee devices from several manufacturers.

Unfortunately, I can't figure out in which circumstances the messages are written to the log.

What version of Home Assistant Core has the issue?

core-2023.8.0

What was the last working version of Home Assistant Core?

2023.6

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

config_entry-zha-9d22930197d58cdaf44504af8eaf139e.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:508
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 06:11:36 (13 occurrences)
Last logged: 06:54:42

[0xD5CB:2:0x0402]: async_initialize: all attempts have failed: [TimeoutError(), TimeoutError(), TimeoutError(), TimeoutError()]
[0xD5CB:1:0x0408]: async_initialize: all attempts have failed: [TimeoutError(), TimeoutError(), TimeoutError(), TimeoutError()]
[0xB1B5:1:0x0008]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0xB1B5:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0xB1B5:1:0x0300]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]

Additional information

No response

home-assistant[bot] commented 1 year ago

Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands
Code owners of `zha` can trigger bot actions by commenting:
- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

phil-lipp commented 1 year ago

Same here. Don't know if it's related to the errors, but I constantly lose connection to some of my sockets (TS011F by _TZ3000_cehuw1lw)

Logger: homeassistant
Source: components/zha/core/cluster_handlers/__init__.py:75
First occurred: 9:57:14 AM (1 occurrences)
Last logged: 9:57:14 AM
Error doing job: Task exception was never retrieved

Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.11/site-packages/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/homeassistant/lib/python3.11/site-packages/zigpy/util.py", line 132, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/srv/homeassistant/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 375, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/homeassistant/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/homeassistant/lib/python3.11/site-packages/zigpy/device.py", line 293, in request
    await self._application.request(
  File "/srv/homeassistant/lib/python3.11/site-packages/zigpy/application.py", line 824, in request
    await self.send_packet(
  File "/srv/homeassistant/lib/python3.11/site-packages/bellows/zigbee/application.py", line 831, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.11/site-packages/homeassistant/components/zha/core/device.py", line 574, in async_configure
    await self.identify_ch.trigger_effect(
  File "/srv/homeassistant/lib/python3.11/site-packages/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrapper
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:508
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 5:01:39 AM (6 occurrences)
Last logged: 5:01:41 AM

[0x0150:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0xF79E:1:0x0702]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x0150:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x0150:1:0x0702]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0xF79E:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
puddly commented 1 year ago

Does downgrading to 2023.6.x cause these errors to go away? If so, please upload ZHA debug logs from both versions.

They're logged when a device (0xD5CB and 0xB1B5) is unreachable. This isn't really something ZHA controls beyond notifying you of device availability.
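For reference, a minimal sketch of one way to capture ZHA debug logs via the logger integration in configuration.yaml; the logger names below are the packages visible in the tracebacks in this thread, so adjust them to your own radio library if it differs:

    # configuration.yaml (sketch): raise log levels for ZHA and its radio stack
    logger:
      default: info
      logs:
        homeassistant.components.zha: debug
        zigpy: debug
        bellows: debug  # EZSP radios such as SkyConnect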

nicknol commented 1 year ago

I didn't try downgrading to 2023.6 yet. It would be helpful to get easy instructions for this.

As said, and confirmed by others, it started with 2023.7. The upgrade of HA introduced the issue; the hardware has not changed (neither my Pi4, nor my SkyConnect, nor my Zigbee devices).

The issue affects automations as well, since after 3 tries the command is no longer sent to the device. I've attached related log entries.

It seems to most often affect Zigbee groups. EmberStatus_NETWORK_BUSY_161.txt

puddly commented 1 year ago

NETWORK_BUSY is very different from DELIVERY_FAILED. NETWORK_BUSY is logged because you're sending too many group requests at once, and they start being limited by the coordinator firmware: each request acts as a network-wide broadcast and will at some point start congesting the network to the point of inoperability. The firmware stops this from happening; this behavior isn't controlled by ZHA.

The upgrade of HA introduced the issue

RF environments can change even if nothing else appears to have. WiFi networks automatically change channels, your neighbors get new WiFi mesh systems, Zigbee devices decide to make poor connections, etc. The only real change between 2023.6.0 and 2023.7.0 is increasing the number of command retries, which would not cause this problem unless your devices were already unreachable.

To that end, your debug information contains the following:

    "energy_scan": {
      "11": 3.2311094587038967,
      "12": 2.84844209578687,
      "13": 15.32285793082191,
      "14": 4.69985354430736,
      "15": 80.38447947821754,  # Your Zigbee channel
      "16": 59.15797905332195,
      "17": 80.38447947821754,
      "18": 80.38447947821754,
      "19": 43.057636198227904,
      "20": 12.244260188723507,
      "21": 92.0598007161209,
      "22": 94.48255331375627,
      "23": 92.0598007161209,
      "24": 78.25348754651363,
      "25": 65.26028270288712,
      "26": 70.89933442360993
    },

Your current channel (15) is very congested and you should be receiving the following warning every time you start up Home Assistant:

Zigbee channel 15 utilization is 80.38%! If you are having problems joining new devices, are missing sensor updates, or have issues keeping devices joined, ensure your coordinator is away from interference sources such as USB 3.0 devices, SSDs, WiFi routers, etc.

I would first try following the above advice. If all else fails, change your network's channel from the ZHA configuration page (pick auto to pick the best one automatically):

(screenshot of the ZHA configuration page, not shown)
nicknol commented 1 year ago

@puddly thanks a lot for the insights and advice.

I put a 2m cable between my Pi4 and the SkyConnect, and the SkyConnect is pretty close (2m) to the group of lights. Overall I have only 3 Zigbee-Groups of lights, hence I don't think the commands are filling the network.

If I change the Zigbee channel ... do I need to re-pair all devices?

puddly commented 1 year ago

Overall I have only 3 Zigbee-Groups of lights, hence I don't think the commands are filling the network.

The specific number of groups doesn't matter; it's just that the pattern of sending commands is triggering the BUSY firmware error. You could run into this with a single group by sending about one lighting command per second consistently. Controlling three groups at once would potentially send out six or more broadcasts, very close to the limit. Do that twice in a few seconds and you hit it.
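As a rough illustration of that pacing limit, a hypothetical automation (entity names are made up) could stagger the group calls instead of firing them in one burst:

    # Sketch only: space out group-wide commands so the resulting broadcasts
    # are not all sent within the same second.
    - alias: Evening lights (staggered group calls)
      trigger:
        - platform: sun
          event: sunset
      action:
        - service: light.turn_on
          target:
            entity_id: light.living_room_group
        - delay: "00:00:02"
        - service: light.turn_on
          target:
            entity_id: light.kitchen_group
        - delay: "00:00:02"
        - service: light.turn_on
          target:
            entity_id: light.hallway_group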

If I change the Zigbee channel ... do I need to re-pair all devices?

No, many people have migrated channels without having to rejoin a single device afterwards. Most routing devices (like bulbs) should migrate instantly, sensors may take 15 minutes.

nicknol commented 1 year ago

Overall I have only 3 Zigbee-Groups of lights, hence I don't think the commands are filling the network.

The specific number of groups doesn't matter; it's just that the pattern of sending commands is triggering the BUSY firmware error. You could run into this with a single group by sending about one lighting command per second consistently. Controlling three groups at once would potentially send out six or more broadcasts, very close to the limit. Do that twice in a few seconds and you hit it.

Well, I'm using the 'Adaptive Lighting' integration for the three Zigbee groups. The AL integration adjusts the light temperature and brightness regularly. Do you see a potential relation?

If I change the Zigbee channel ... do I need to re-pair all devices?

No, many people have migrated channels without having to rejoin a single device afterwards. Most routing devices (like bulbs) should migrate instantly, sensors may take 15 minutes.

ok, I will give it a try :)

lux4rd0 commented 1 year ago

I am experiencing the same issues and have attached this message to provide more information. I have been working with version 2023.06.* and have noticed that it is broken in versions 2023.07 and 2023.08. I have been sorting through various messages to keep track of what others are seeing for potential solutions. The issue occurs right away for me, so I am willing to send logs if necessary. I just need guidance on how to send specific zigbee/z-wave logs.

rwarner commented 1 year ago

Chiming in here as well: I had an issue this morning executing one of my automations that runs every day. It did not continue turning off the lights after turning them on. I hadn't had this happen prior to 2023.8. I did have other issues with ZHA on 2023.7, but it was not this.

- alias: MORNING - Reset all lights to default brightness
  trigger:
    - platform: time
      at: '08:00:00'
  action:
    - service: light.turn_on
      target:
        entity_id:
          - light.family_room_lights
          - light.living_room_fireplace
          - light.kitchen_lights
          - light.kitchen_ceiling_lights
      data:
        brightness_pct: 100
    - service: light.turn_off
      target:
        entity_id:
          - light.family_room_lights
          - light.living_room_fireplace
          - light.kitchen_lights
          - light.kitchen_ceiling_lights
Logger: homeassistant.components.automation.morning_reset_all_lights_to_default_brightness
Source: helpers/script.py:420
Integration: Automation (documentation, issues)
First occurred: 8:00:05 AM (1 occurrences)
Last logged: 8:00:05 AM

MORNING - Reset all lights to default brightness: Error executing script. Error for call_service at pos 1: Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
Logger: homeassistant.components.automation.morning_reset_all_lights_to_default_brightness
Source: components/automation/__init__.py:680
Integration: Automation (documentation, issues)
First occurred: 8:00:05 AM (1 occurrences)
Last logged: 8:00:05 AM

Error while executing automation automation.morning_reset_all_lights_to_default_brightness: Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

Not sure if related:
- (Delivery failed present) https://github.com/home-assistant/core/issues/99305
- (Delivery failed present) https://github.com/home-assistant/core/issues/90424
- (Questionable, but also ZHA errors) https://github.com/home-assistant/core/issues/98735

Will try updating my zwave docker instance

puddly commented 1 year ago

@rwarner Try enabling continue_on_error: https://www.home-assistant.io/docs/scripts/#continuing-on-error

ZHA was changed in 2023.8 to throw a few more types of errors when commands failed (after a few attempts). Previously, it silently allowed failures.
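As a rough sketch (not a snippet from the thread), the option sits on an individual action; applied to one of the service calls from the automation above it would look something like this:

    - service: light.turn_off
      # If this call fails (e.g. DELIVERY_FAILED after retries), keep running
      # the rest of the script instead of aborting the automation.
      continue_on_error: true
      target:
        entity_id:
          - light.family_room_lights
          - light.kitchen_lights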

rwarner commented 1 year ago

Thanks for the suggestion. I would hate to have to add this to every automation I turn lights on/off with. Should I see if it pops up again before trying?

Or add it to avoid automations not working and see if the logs continue to show the errors?

I just noticed continue_on_error would be for each action and not each automation, so I would hate to have to add that to every single action in my repo, let alone every automation.

orbelico commented 1 year ago

ZHA was changed in 2023.8 to throw a few more types of errors when commands failed (after a few attempts). Previously, it silently allowed failures.

@puddly if that's the case, it also explains https://github.com/home-assistant/core/issues/98735

While I understand having errors if things go wrong, it can create a lot of log spam in cases that are not so uncommon, like a Zigbee device being unreachable because it is powered off while an automation tries to access it. Since a Zigbee disconnect is not detectable without delay, errors cannot be avoided in such a case, and I would prefer the old behaviour of ZHA being tolerant of such cases.

Also, I do not see the point in printing 10s of lines containing the full trace to the log. Other HA errors print a single line, which has a much better signal-to-noise ratio.
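For what it's worth, one stopgap (a sketch, not an official recommendation) is to raise the log level for the noisiest logger shown above so the repeated tracebacks are filtered out, at the cost of also hiding legitimate errors from that logger:

    # configuration.yaml (sketch): silence the repetitive async_initialize errors
    logger:
      logs:
        homeassistant.components.zha.core.cluster_handlers: critical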

puddly commented 1 year ago

Also, I do not see the point in printing 10s of lines containing the full trace to the log. Other HA errors print a single line, which has a much better signal-to-noise ratio.

I'm just bringing ZHA's behavior in line with other integrations. This is the way Hue and Z-Wave JS do it, and is the way other integrations are expected to behave. Home Assistant decides to print the full traceback so if you think the logging is too verbose, this would be something that can be fixed with HA Core globally and probably can be suggested as a feature request in the community forums.

orbelico commented 1 year ago

I am not sure I am understanding this correctly. Here are some examples that throw errors, but do not print traces:

  1. Xiaomi BLE connection error:

     2023-09-05 03:40:32.724 ERROR (MainThread) [homeassistant.components.xiaomi_ble] 5C:85:7E:B0:1F:E7: Bluetooth error whilst polling: 5C:85:7E:B0:1F:E7 - 5C:85:7E:B0:1F:E7: Failed to connect after 4 attempt(s): TimeoutError

     Happens quite frequently, but it is a single line, so very tolerable. This is the way I would expect ZHA to behave, too, in case a command can't be sent.

  2. Pyscript (integration from HACS to implement automations in Python) reports an error in my own script:

     2023-09-05 05:31:47.062 ERROR (MainThread) [custom_components.pyscript.modules.smartnotify._notify] Exception in <modules.smartnotify._notify> line 270: log.debug(debug_msg.format(**kwargs)) ^ KeyError: 'name'

     That's three lines, but still rather short (and helpful enough, without any trace).

Are you saying that HA Core decides for these errors (one of them coming from an unofficial, random integration from HACS) whether to print a trace or not? And that you have no control from ZHA to provide a cleaner log entry for a simple connection error? I find that hard to believe.

As I said before, such connection errors are basically to be expected. ZHA provides an API to develop automations. This API provides no means at all to reliably detect whether a device is available or not. Still, it expects me to only command devices which are actually available, or my log file is flooded with repetitive, very long messages. I do not find that a very good way for an API to behave.

Please don't take this the wrong way, I am a big fan of HA & ZHA and very much appreciate all the hard work you're putting into it. I'm a developer myself and I know how hard it can be to make the right design decisions in such cases and how fringe cases can mess up initial assumptions. But I think it is fair to have a discussion about what kind of implementation provides a good developer experience, and the most recent one, at least for me (and other people in this discussion), does not.

rwarner commented 1 year ago

@puddly - I added continue_on_error for every light.turn_on / light.turn_off call that I have in my automations. Is this advisable if most of my lights are Zigbee? It just seems weird that I'll have to do this now for automations, no?

Wanted to follow up on this: are these errors going to cancel automations mid-execution? Or is there perhaps something I can request for HA Core to always continue on error?

jclendineng commented 1 year ago

Just want to add that I have devices drop off all the time, bulbs exclusively. If I have 2 or 3 on a lamp, 1 will drop off every couple of days. Auto channel didn't work; it managed to find my most congested channel (25), so I guess I don't know what else to try. Maybe I need to move the Yellow further away from my rack, I'm at a loss. Moved from Hubitat, and that didn't have any issues with dropped devices, so I'm thinking it's something in the way they handle ZHA commands.

den-mac commented 1 year ago

I'm still getting a lot of these spammy messages as well as of 2023.11.0. I don't know if a debug log will help but let me know.

2023-11-04 17:40:56.416 WARNING (MainThread) [zigpy.topology] Topology scan failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 78, in _scan_loop
    await self.scan()
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 96, in scan
    await self._scan_task
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 221, in _scan
    await self._find_unknown_devices(neighbors=self.neighbors, routes=self.routes)
  File "/usr/local/lib/python3.11/site-packages/zigpy/topology.py", line 253, in _find_unknown_devices
    await self._app._discover_unknown_device(nwk)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 880, in _discover_unknown_device
    return await zigpy.zdo.broadcast(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 609, in broadcast
    return await app.broadcast(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 856, in broadcast
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 853, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to enqueue message after 3 attempts: <EmberStatus.NETWORK_BUSY: 161>
2023-11-04 21:17:10.017 ERROR (MainThread) [zigpy.zcl] [0xC667:1:0x0020] AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 296, in request
    with self._pending.new(sequence) as req:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 297, in new
    raise ControllerException(f"duplicate {sequence} TSN") from AssertionError
zigpy.exceptions.ControllerException: duplicate 34 TSN

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 519, in check_in_response
    await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: duplicate 34 TSN
codyc1515 commented 1 year ago

I changed Zigbee channel and don’t face this issue anymore.

den-mac commented 1 year ago

Ugh, I wish I could go back in time. I tried to change the zigbee channel and it didn't seem to work. Then some time later in the day it did change after taking another peek. But most of my devices went offline. So I changed it back to 15 and those devices are still offline, even after waiting a day.

So I'm having to go back and manually re-add them back in (just getting to some of them is a lot of work) and that's slow going.

I'm still getting those errors though in the logs, and it's quite spammy.

I have plans to move to a new Sonoff USB zigbee dongle with an antenna, moving away from the Nortek stick, so I'm hoping that helps clear things up. Crossing my fingers!

quettih commented 1 year ago

I have exactly your problem starting from the 2023.11 + OS 11.1 release, while until the day before (2023.10.5 and OS 10.5) there were no problems! Now I'm going to burn an image on another M2 device with 11.0 or 10.5, because I'm pretty sure that nothing changed in my network or my neighborhood's routers. I used a network scanner to monitor the WiFi, and my SkyConnect is 1 meter away from the router, connected by a USB cable.

So, if it's not the OS, what else? :-D

rwarner commented 12 months ago

Just chiming in here, still getting:

Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

On 2023.11.3, all light automations also include continue_on_error where applicable. It seems to be happening less, but still occurring. I did swap out two porch lights that were finicky and far away with a Z-Wave light switch, which may have reduced this in my experience.

Dlanor80 commented 12 months ago

I have limited the number of end devices to zero. The number of 'EmberStatus.DELIVERY_FAILED: 102' messages has dropped, though unfortunately not completely.

sompjang commented 11 months ago

Just chiming in here, still getting:

Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

On 2023.11.3, all light automations also include continue_on_error where applicable. It seems to be happening less, but still occurring. I did swap out two porch lights that were finicky and far away with a Z-Wave light switch, which may have reduced this in my experience.

I also did not notice any problems until 2023.11.3. After that, the battery switches disconnect regularly, and once every 2 days I need to restart Hassio so ZHA works correctly. I have tried changing the channel, adding a longer extension cable for my POPP ZB-STICK, adding additional IKEA outlets, removing Adaptive Lighting and re-adding it, and changing params. I have noticed that the network is more stable when all devices are on. I am still trying different "solutions" but unfortunately nothing helps.

TomHejret commented 11 months ago

I am having the same issue in my Zigbee network. After reading comments here and in related issues, I have found out that there is just a single problematic device having this issue. See my fairly simple topology (screenshot attached). The other end device, connected to the problematic one (router type), also became unavailable recently, so it might be related as well.

I spotted this double-connected end device for the first time (I didn't know to look for it before), so I will try to check if the issue goes away once the double connection disappears.

I see the DELIVERY_FAILED issue appearing on and off. I didn't use ZHA before the mentioned version 2023.6, so I can't tell if the HA version matters. I am now running the latest version in a Docker container. Some info from ZHA debug logs:

    "energy_scan": {
      .
      .
      "18": 8.631361812931262,
      "19": 2.84844209578687,
      "20": 3.2311094587038967,  # my channel
      "21": 3.2311094587038967,
      "22": 3.6632469452765037,
      .
      .
    }

EDIT:

rwarner commented 11 months ago

This is becoming more of a problem recently. This past weekend a lot of Zigbee devices started being unreliable and this message kept appearing when trying to switch some lights on/off in the Home Assistant dashboard which I haven't seen before.

Still on 2023.11.3 at the moment

MattWestb commented 11 months ago

I am having the same issue in my Zigbee network. After reading comments here and in related issues, I have found out that there is just a single problematic device having this issue. See my fairly simple topology (screenshot attached). The other end device, connected to the problematic one (router type), also became unavailable recently, so it might be related as well.

Your posted image is private so I can't see anything. What is the problematic router brand / model? There are some badly behaving ones out there and it is good to know.

jclendineng commented 11 months ago

For someone stumbling on this: this is a bad end device almost 100% of the time. I'm considering a workaround, which is setting up a second Zigbee network for battery devices and having one exclusively for bulbs. Bulbs and outlets make great repeaters since they are hardwired, and you won't see any drops with those. If you do have badly behaving battery devices, I'd look into setting them up on a second network somehow OR replacing them with better devices. My issues in particular with bulbs dropping are coming from Sonoff leak sensors and climate sensors. I pitched the climate sensors since I didn't need them anymore, and that cleared up most of the drops. The leak sensors are still problematic, but the issue is vastly improved. The only surefire way to fix it is to ditch all no-name battery devices that drop routes, or have 2 networks for hardwired and battery devices.

Edit: I have a Hubitat hub and there's a HACS integration for it, so I'm going to try that, or look into having 2 separate networks in Home Assistant.

MattWestb commented 11 months ago

For all Zigbee networks (also for Hubitat), having many end devices and very few or no routers makes the network work poorly, but you can try. A mains-powered router device uses maximum TX power, while the same chip used in an end device only uses 1/10 of that power, so you are going to get a lot of range problems and also asymmetrical links that don't work well and are more sensitive to radio interference.

rwarner commented 11 months ago

Upgraded to 2023.12.4 and thought more about @TomHejret's topology insight. I ended up removing a no-longer-needed Ikea Tradfri repeater/router that I originally hoped would extend access to some previously connected Zigbee bulbs that are now on a Z-Wave switch. I haven't been seeing this message quite as much, but it does still pop up in the log from one nightly automation.

issue-triage-workflows[bot] commented 8 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

sfnemis commented 8 months ago

The issue still continues. I have the same problems :(

evelant commented 7 months ago

I'm having similar problems in my Zigbee network (SkyConnect on the latest firmware, Zigbee only, no Thread). I get a lot of delivery-failed and network-busy messages despite having repeaters all over the place. There's pretty much no place that isn't within 10 ft of a repeater, but the errors still happen frequently.

nerdyninny commented 7 months ago

Same here. But found a workaround that is really annoying.

https://github.com/home-assistant/core/issues/116104

rwarner commented 7 months ago

Fwiw, I am on 2024.4.3 now. I haven't been seeing this as much as I was late last year. Will continue to keep an eye out since there are others seeing this still.

issue-triage-workflows[bot] commented 4 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

evelant commented 4 months ago

This is still an issue. Some progress has been made at https://github.com/home-assistant/core/issues/86411 with a new firmware build, but it introduced some new issues.

issue-triage-workflows[bot] commented 1 month ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

evelant commented 1 month ago

I'm not sure this is stale

puddly commented 1 month ago

It has been nearly a year since the last comment, with many, many changes to ZHA and coordinator firmwares happening in the meantime.

I'm going to close this issue. Please open a separate issue and attach debug logs and diagnostics information for the ZHA integration if you're still having issues.