Open stegitto opened 2 months ago
Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
Please attach both pieces of diagnostics information, I showed in the screenshot where you can download it: https://github.com/home-assistant/core/issues/124516#issuecomment-2364398602
home-assistant_zha_2024-09-20T21-47-32.100Z.log config_entry-zha-aa2696fe0a36266b98b7767201af0f90.json
New debug log and new diagnostics attached. Thank you.
Same issue being experienced across multiple devices. Nothing has been added to or changed in the ZHA config. Same error with an error log of:
Logger: homeassistant
Source: components/zha/helpers.py:1291
First occurred: 07:14:45 (1 occurrences)
Last logged: 07:14:45
Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 67, in wrap_zigpy_exceptions
    yield
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 85, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/util.py", line 136, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/quirks/__init__.py", line 254, in command
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 378, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 265, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 339, in request
    await send_request()
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 834, in request
    await self.send_packet(
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 827, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/helpers.py", line 1289, in handler
    return await func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 181, in async_turn_on
    await self.entity_data.entity.async_turn_on(
  File "/usr/local/lib/python3.12/site-packages/zha/application/platforms/light/__init__.py", line 413, in async_turn_on
    result = await self._on_off_cluster_handler.on()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 84, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 76, in wrap_zigpy_exceptions
    raise ZHAException(message) from exc
zha.exceptions.ZHAException: Failed to send request: Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/script/__init__.py", line 707, in _async_run
    return await self.script.async_run(script_vars, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1795, in async_run
    return await asyncio.shield(create_eager_task(run.async_run()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 463, in async_run
    await self._async_step(log_exceptions=False)
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 527, in _async_step
    self._handle_exception(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 557, in _handle_exception
    raise exception
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 525, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1074, in _async_if_step
    await self._async_run_script(if_data["if_then"])
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1268, in _async_run_script
    result = await self._async_run_long_action(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
    return await long_task
           ^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1795, in async_run
    return await asyncio.shield(create_eager_task(run.async_run()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 463, in async_run
    await self._async_step(log_exceptions=False)
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 527, in _async_step
    self._handle_exception(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 557, in _handle_exception
    raise exception
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 525, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
    response_data = await self._async_run_long_action(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
    return await long_task
           ^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2761, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2804, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 996, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1068, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 626, in async_handle_light_on_service
    await light.async_turn_on(**filter_turn_on_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/helpers.py", line 1291, in handler
    raise HomeAssistantError(err) from err
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>
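For readers new to chained tracebacks: the three "direct cause" blocks above are one failure re-raised at each layer with `raise ... from ...`. The following is a minimal sketch of that pattern, not the actual zigpy, zha, or Home Assistant code; every class and function name in it is a hypothetical stand-in chosen to mirror the names seen in the log.

```python
from contextlib import contextmanager


class DeliveryError(Exception):
    """Hypothetical stand-in for the radio-level error."""


class ZHAException(Exception):
    """Hypothetical stand-in for the library-level error."""


class HomeAssistantError(Exception):
    """Hypothetical stand-in for the error surfaced to the script."""


@contextmanager
def wrap_radio_errors():
    # Re-raise a low-level failure as a library error, keeping the cause.
    try:
        yield
    except DeliveryError as exc:
        raise ZHAException(f"Failed to send request: {exc}") from exc


def send_on_command():
    # Simulate a command the coordinator could not deliver.
    with wrap_radio_errors():
        raise DeliveryError("Failed to deliver message: ZIGBEE_DELIVERY_FAILED")


def turn_on():
    # The outermost layer wraps the library error once more.
    try:
        send_on_command()
    except ZHAException as err:
        raise HomeAssistantError(err) from err


def cause_chain(exc):
    """Return exception class names from the outermost error to the root cause."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return names
```

Walking `__cause__` from the last exception back to the first shows that the root cause is the delivery failure reported by the radio layer; the two outer exceptions only relabel it.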
@i8nemo Please attach both pieces of diagnostics information, I showed in the screenshot where you can download it: https://github.com/home-assistant/core/issues/124516#issuecomment-2364398602
home-assistant_zha_2024-09-21T23-49-26.017Z.zip
Both files in Zip
This error came back for me today as well. Worked two weeks straight no problem and then received it. No changes to Core or OS.
I encountered this error on a fresh HA OS installation, with a Sonoff USB Zigbee adapter. Devices automatically discovered and added to the network would display in HA, but give the above error when issuing commands.
Removed devices, and re-added, issue no longer appears. Not sure if it's related to that but I figured I'd throw it out there.
Hi there, is any troubleshooting or analysis under way?
@stegitto I had issues with IKEA outlets and the USB repeaters where they lock up and refuse to route traffic. If you truly want a resolution, please humor me and remove them from your network for a day or two and let's see if your issues go away. I also noticed in your config entry diagnostics that the EZSP counters seem to show various failures without many successes, but that is just from a quick peek on mobile. Personally, I'd start with removing the IKEA devices and see if the issues clear up. I know this isn't something folks like to hear but it is worth trying. It's also good to understand that the only control we have is asking the coordinator to send a message for us. After that it's basically all out of our control. There isn't much we can do to influence things...
Hi dmulcahey. Thank you for your message. I will try. I’ve been using the same devices for two years, I cannot figure out what has changed.
Sometimes the software stack on the routers can just lock up... you can try a less drastic approach first: pull the power from all the ikea devices for like 45s then put it back. Give it a bit and see if things improve.
I tried almost everything. Downgrades, resets and reconnects, I’m not able to identify the problem. Many people started struggling during the past month, several of them with ZHA and Zigbee USB dongles. I tried to downgrade HA and to upgrade the coordinator firmware. Unplugging some devices for days and then back on, let the zigbee network run only for some hours. The ZHA visualization map is always green during the “network crash”. Nearest WiFi channel used is 7. No USB3 ports, I use a long extension cable. Moved from Pi4 to Proxmox. No way.
Dude, if you want help... try what I am suggesting. If you wanna rehash your frustration continuously I can't help you. I get that issues stink. I have a 200 device network working just fine and there are MANY other users without issues. If you look at the analytics there are more than 70k ZHA users: https://analytics.home-assistant.io/integrations/. There are not 70k folks reporting issues. Unfortunately, Zigbee is not a perfect technology and there are individual circumstances that cause individual issues. We do our best to help folks.
I thought doing a recap could help. Now I understand I’m too old to keep playing with these toys. Goodbye.
I have a 200 device network working just fine and there are MANY other users without issues.
Perhaps you could give some guidance on what type of Zigbee devices you're utilizing without error and using what kind of Zigbee Coordinator.
Host is a HA Blue
Coordinator is a TubeZb SI MGM24 connected via USB
Devices are a mix of inovelli blue switches, Philips Hue bulbs, centralite sensors and Aqara sensors and ~24 Third Reality energy monitoring outlets.
IMO router devices are the most important consideration after the coordinator. Some devices just don't route well.
I'll also never add a Tuya device to my network but that is my personal preference.
I also limit custom components to ones that I have proven don't cause issues with the HA event loop. Easy way to rule them out is to run HA in safe mode for a while.
I thought doing a recap could help. Now I understand I’m too old to keep playing with these toys. Goodbye.
Sorry if I misunderstood the "No way" comment you ended your reply with. I'm not trying to make you quit... just trying to explain that even though it feels like you have done everything doesn't mean there aren't more things to explore. In the thread you shared from Reddit there is even a user who pointed out the IKEA issue...
Device mix on the network (especially routing devices) is one of the most important factors for stability. There are a lot of devices on the market and some just don't perform well.
I'm not guaranteeing that this is the cause of your issue but it's an avenue worth exploring. With the time investment you have already committed I'd think you'd be willing to try this... and by your own account your devices are one of the few constants across your attempts to remediate your issue.
IMO router devices are the most important consideration after the coordinator.
Pardon the ignorance, but what do you mean by router devices? I also have never heard of that coordinator. Better than my Sonoff?
Router devices are generally mains powered devices that in addition to their user facing functionality also help route traffic for other devices on the network and they can also act as parents to end devices. Lots of bulbs, smart sockets, etc are routers. The device type can be seen on the device page in HA if you expand the zigbee details.
Coordinator is a TubeZb SI MGM24 connected via USB
So you have one of the rarest out of stock devices out there and have no issues. Yeah, that's not going to help. BTW, another 3074 error tonight on a powered switch that never gave me errors before. Makes sense.
Honestly, I don't know how to help at this point. I get folks are frustrated but the commentary isn't helpful to anyone.
Just to give you another data point: I also have had this network running since ZHA was introduced and I ran it on a HUSBZB-1 for years... I know that won't make folks feel better but it is what it is.
In the visualization, what is that switch connected to? For router devices it shows the connections. To find the parent, go to the device page, launch the manage device dialog, select the neighbors tab, then click the title of the dialog so that the relationship column becomes available. See if you can determine what its parent is.
Thank you. Okay, so I now know the parent for the router device (a plug-in switch). It has an LQI of 57. What's next?
And what's the model of the router (plug-in switch)? Also, please post your ZHA integration diagnostics.
It's a S31 Lite zb by SONOFF. One of these - https://www.amazon.com/gp/product/B08Y87WD1X/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
I never (and I repeat never) had a 3074 error with this particular switch and it just popped up out of nowhere for an automation that was set to kick on at 1:00 a.m. early this morning.
My diagnostics are attached.
Nothing?
I also have the same issue :-(. The error appears without reason. I didn’t add any device or change anything.
Here I tried to switch off « escaliers » and got the error. Sometimes there is no problem :-/ and sometimes it’s other devices.
Hi, I am experiencing the same kind of problems. Important note: Logic and reading of this and other threads suggests to me that it is actually NOT a zigpy / zha problem. So, actually off-topic. But since everything started to have problems since 2024.9.x (currently on 2024.10.0), I'm still posting it here. Just in case there still might be some change in recent zigpy/zha that could facilitate the issues.
I started to experience many 3074 problems throughout the network from 2024.9.x. Before, it was a stable network, but coincidentally with the September updates and the addition of 2 new Zigbee power plugs (bringing the total of my ZHA devices to 97), I started to get a lot of disconnected devices.
I can't really tell if the problems are in ZHA or if this is just coincidental.
Right now, removing a few devices, things look a little bit more stable (the coordinator is a nabu casa, the one from HA), but particularly the TRVs (thermostats) would not respond to commands. Still, they do report their state correctly, also when turning their wheels. Following @dmulcahey 's tips, I went "after" my IKEA repeaters (TRADFRI Signal Repeater) and (instead of removing them or re-pairing them), went to the Device page in HA and clicked "reconfigure".
This did bring back (although delayed) control functionality on all my tuya TRVs, but did not so on my 2 Aqara. I do believe that the problem is more likely to be one in my radio/zigbee/device choice, but the strange thing is, that it all used to work very nicely before and suddenly became a debug nightmare (and partly, still is; some motion sensors for example will disconnect from time to time and need to be power cycled or re-paired to get "back in").
I'm attaching one of the log entries with stack trace, unsure whether it's just the expected output when devices do not "react".
I'm open to any suggestions / questions...
home-assistant_zha_2024-09-21T23-49-26.017Z.zip
Both files in Zip
It's hard to accept that this is all attributed to RF interference, given all the different reports where everything was stable over a long period of time.
Exactly. The problem clearly concerns end devices of all kinds (motion sensors, TRVs, switches, knobs..). Mains powered devices seem to be unaffected from what I can see. I also find it strange that the Aqara/Brennstuhl TRVs (with or without custom quirk - lumi.airrtc.agl001) take a long time to pair (sometimes they do not get past the “Interview” phase, and when they do, the “Configuration” phase takes up to 3 minutes), and afterwards they don’t accept commands via Zigbee (at least 99% of the time they produce a 3074), but still report their sensors and state correctly to HA. Pairing and commands used to be almost instant.
I’m seeing also many log entries concerning units of sensor values.
I wonder whether zigbee2mqtt should actually be favored over ZHA in terms of stability and scalability. Not that I’m looking forward to re-pairing 97 devices… Also, the ZHA network graph is almost completely useless for debugging with around 100 devices.
You bring up some good points. I've watched a lot of videos on the difference between ZHA and zigbee2mqtt, but no one really has come out and said that I stopped getting a particular error when I made the switch (i.e., it might be a complete waste of time). I've also seen some videos where switching to ZHA from zigbee2mqtt was actually better because of its "built-in" integration.
Lastly, I agree with you about the ZHA graph. Completely worthless. A bunch of colored lines that provide me with little to no directional value unless perhaps one device is sitting off screen by itself.
Alright all. Just updating here my status. Spoiler: It looks more like a router device congestion (if that's possible...: too many devices talking to the coordinator), than an update/bug issue.
1) Reduced my devices down to 90 from 97 (removed a few of the Nous power plugs, replaced by a multi-plug), and removed one additional IKEA repeater. Also removed devices that I had previously just switched off. This brought up almost everything again (after re-pairing 5-6 end devices).
2) Bought and switched radio from the Nabu Casa to SLAE.sh (slae.sh CC2652RB stick, Texas Instruments based), since I had some friends report that this chip works better and has more capacity (also has an antenna, which at least psychologically can help to believe it's better). Radio migration was seamless (kudos, HA/ZHA team...!). Can't tell yet whether things will keep on being better mid/long term. But after migration, and a restart, all my TRVs respond quickly now. And most of all, my wife isn't complaining all the time that it's cold or that nothing works and I should remove all the smart stuff and return to a "working home".
So, from my part, I'd say: unrelated to the update - all points to just a coincidence.
Notes
Best Regards to All
One last contribution here, may it help others with the same kind of issues (@stegitto: don't lose faith, it can be solved.. although it can be quite frustrating while getting there..!). My lessons learned:
@dmulcahey: I just realized that you're not only zigpy contributor, but also the author of both the network card and the visualization :D . After this debugging-week, I've come to appreciate a lot of your work. But also to see some limits. May I suggest a few things for the next time You get around updating the graph..
1) Please display device names also at broader zoom levels (possibly hard-wrapping them when too long..)?
2) When clicking on a device, its connections are made thicker, but this doesn't really help with so many devices. It's still line-land. It would be better to hide all other lines altogether (or make them way more transparent);
3) Would it be possible to visually isolate the path of a device to the coordinator (show only the lines of it to its routers and from those, to the coordinator)?
4) An option to see the connections of a device not only as lines, but as an info box (e.g. an "i" on the device box populating an inspector-like box with a list of devices and signal strengths)
5) Why not integrate the zha-network-card (or a successor/sibling) as a panel of the network view (as the "list view" of it..)?
6) (I'm sure You know): Sometimes it gets completely crazy unusable, even with Chrome... (on any other browser performance is too bad anyways...)
Even just 1 and 2 would make it an order of magnitude more useful.. and thanx anyway for everything You've been doing! I love the whole structure and code. Maybe when my family will be less time-intensive I might get some hands-on it too...
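To make suggestion 3 concrete, here is a minimal sketch of isolating one device's path to the coordinator with a breadth-first search. The device names, the link table, and the router set are all made up for illustration; they are not read from ZHA. Real Zigbee routing picks routes by link cost rather than hop count, so this only approximates what the mesh would actually do.

```python
from collections import deque

# Hypothetical neighbor links: device -> devices it can reach directly.
LINKS = {
    "coordinator": ["plug_hall", "bulb_kitchen"],
    "plug_hall": ["coordinator", "repeater_stairs", "trv_bedroom"],
    "bulb_kitchen": ["coordinator", "repeater_stairs"],
    "repeater_stairs": ["plug_hall", "bulb_kitchen", "motion_attic"],
    "trv_bedroom": ["plug_hall"],
    "motion_attic": ["repeater_stairs"],
}

# Hypothetical set of devices that relay traffic for others.
ROUTERS = {"coordinator", "plug_hall", "bulb_kitchen", "repeater_stairs"}


def path_to_coordinator(device, links=LINKS, routers=ROUTERS):
    """Return a fewest-hop path from device to the coordinator, or None."""
    queue = deque([[device]])
    seen = {device}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == "coordinator":
            return path
        # End devices can start a path but cannot relay for others.
        if node != device and node not in routers:
            continue
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

A graph view could then draw only the links along the returned path and fade everything else.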
Hi everybody. Thank you for the contribution to the discussion. Despite my 25 years in IT with focus on networking I was not able to isolate the root cause and fix the issue. I admit, the constant lack of time is a big constraint. I still believe there is more than one problem occurring at the same time, driving me to weird/unsuccessful troubleshooting results. I decommissioned the coordinator, the routers and all the sensors for a total of more than 30 devices, with 5 brand new endpoints still waiting on the shelf to be added to the ZHA network. Then I spent one trillion euros to buy a complete Hue ecosystem, and I rebuilt the automations based on the Hue Bridge integration in HA. Everything has been running fine for weeks now. Unfortunately, this is the only way I‘ve been able to find for a stable environment. I know my workaround is not going to help anybody, but I wanted to share my personal exit strategy after firing up the thread. Regards, Stefano.
Hi everybody. Thank you for the contribution to the discussion.
Hi Stefano, thanks for reporting back anyways. And: I was apparently too quick in calling it a victory. It all just began to get quite unstable again, and all without much information in the logs that could somehow help. Since the coordinator change from the Nabu Casa to the slae.sh it is at least mostly usable (at least after a restart), but just today, after unsuccessfully trying to pair a simple "styrbar" IKEA remote, many other devices started to go crazy again. Some change done in September must have broken something. Not necessarily in ZHA. Could also be something else. For the record, the most recent thing I had was that I had to restart to see /any/ zha_event in the developer tools.. At this point I can just offer my availability for tests or reports of any sort. So: Not solved, even if the lessons learned do apply.
Hi there. I also got this error. The bad part is that this error blocks further actions in my automations. So I grabbed an rpi2 and set up a serial network adapter with my dongle-e far away from my router and without any USB3 port. But the error came back. So I decided to switch to an SLZB-06, without any success.
Sometimes at night I got this error and no zigbee call went out until I'd restarted the ZHA integration in the morning. So this The energy level of my chosen channel 20 was also around 20-40%.
Now I switched to z2m without any issues atm. Also the log is without any error since I'd switched to z2m. I can't believe this is IR related since now everything works flawlessly again and I isolated my hardware as much as I can.
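The blocking effect described in this comment, where one failed command stops the rest of an automation, can be sketched as follows. This is a generic illustration with hypothetical names, not Home Assistant's script engine. Home Assistant's script syntax documents a `continue_on_error` option for individual actions, which may be worth checking as a way to keep later actions running.

```python
class DeliveryFailed(Exception):
    """Hypothetical stand-in for a failed Zigbee command."""


def run_actions(actions, continue_on_error=False):
    """Run (name, callable) pairs in order.

    Returns (completed_names, failed_names). With continue_on_error=False
    the first failure stops the run, so later actions never execute.
    """
    completed, failed = [], []
    for name, action in actions:
        try:
            action()
        except DeliveryFailed:
            failed.append(name)
            if not continue_on_error:
                break
        else:
            completed.append(name)
    return completed, failed


def ok():
    """An action that succeeds."""


def broken():
    """An action whose command cannot be delivered."""
    raise DeliveryFailed("ZIGBEE_DELIVERY_FAILED")


# Hypothetical automation: the second light is unreachable.
ACTIONS = [("hall_light", ok), ("stairs_light", broken), ("porch_light", ok)]
```

With the default, `porch_light` is never attempted once `stairs_light` fails; with `continue_on_error=True` it still runs and the failure is only recorded.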
@panhans What channel is your Z2M network on?
@puddly I left it at channel 20 just to be sure that there isn't any other signal that causes interference. And every energy scan told me to stay on it. ;)
//EDIT: Zigbee Network still stable and responsive without any issues since 4 days.
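Since Zigbee and Wi-Fi channel numbers come up several times in this thread (Zigbee channels 20 and 25, a Wi-Fi network on channel 7), here is a small sketch for comparing them. It uses the standard 2.4 GHz center-frequency formulas; the 22 MHz Wi-Fi width and 2 MHz Zigbee width are nominal assumptions, and the numbers come from different commenters, so the results say nothing about any one setup. A nominal overlap is not proof of interference, and its absence is not proof of a clean channel.

```python
def zigbee_center_mhz(channel):
    """Center frequency of a 2.4 GHz IEEE 802.15.4 channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of a 2.4 GHz Wi-Fi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel, wifi_width_mhz=22, zigbee_width_mhz=2):
    """True if the nominal bands of the two channels intersect."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    # Bands intersect when the centers are closer than the sum of half-widths.
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2
```

Under these assumptions, Zigbee channel 20 (2450 MHz) falls inside the nominal band of Wi-Fi channel 7 (2442 MHz), while Zigbee channel 25 (2475 MHz) does not.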
The problem
Never had an issue and now I'm getting the following error on various Zigbee devices - Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>
Referring to: https://github.com/home-assistant/core/issues/124516
What version of Home Assistant Core has the issue?
core-2024.9.2
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
ZHA Zigbee
Link to integration documentation on our website
No response
Diagnostics information
home-assistant_zha_2024-09-20T17-37-58.067Z.log
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
Hi All
Tried to downgrade to core-2024.8.2 and core-2024.8.1, same issue.
Zigbee coordinator: SONOFF Zigbee 3.0 USB Dongle Plus V2.
Coordinator firmware release 7.4.4.
ZHA Visualization reports green devices while there is no way to control them.
Using channel 25 without RF interference.
Stefano