home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074> (not for RF interference) #126364

Open stegitto opened 2 months ago

stegitto commented 2 months ago

The problem

Never had an issue and now I'm getting the following error on various Zigbee devices - Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>

Referring to: https://github.com/home-assistant/core/issues/124516

What version of Home Assistant Core has the issue?

core-2024.9.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA Zigbee

Link to integration documentation on our website

No response

Diagnostics information

home-assistant_zha_2024-09-20T17-37-58.067Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

Hi All

Tried to downgrade to core-2024.8.2 and core-2024.8.1, same issue. Zigbee coordinator: SONOFF Zigbee 3.0 USB Dongle Plus V2. Coordinator firmware release: 7.4.4. The ZHA visualization reports green devices, yet there is no way to control them. Using channel 25 without RF interference.

Stefano

home-assistant[bot] commented 2 months ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!


puddly commented 2 months ago

Please attach both pieces of diagnostics information, I showed in the screenshot where you can download it: https://github.com/home-assistant/core/issues/124516#issuecomment-2364398602

stegitto commented 2 months ago

home-assistant_zha_2024-09-20T21-47-32.100Z.log config_entry-zha-aa2696fe0a36266b98b7767201af0f90.json

New debug log and new diagnostics attached. Thank you.

i8nemo commented 2 months ago

Same issue being experienced across multiple devices. Nothing has been added to or changed in the ZHA config. Same error with an error log of:

Logger: homeassistant
Source: components/zha/helpers.py:1291
First occurred: 07:14:45 (1 occurrences)
Last logged: 07:14:45

Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 67, in wrap_zigpy_exceptions
    yield
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 85, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/util.py", line 136, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/quirks/__init__.py", line 254, in command
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 378, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 265, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 339, in request
    await send_request()
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 834, in request
    await self.send_packet(
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 827, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/helpers.py", line 1289, in handler
    return await func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 181, in async_turn_on
    await self.entity_data.entity.async_turn_on(
  File "/usr/local/lib/python3.12/site-packages/zha/application/platforms/light/__init__.py", line 413, in async_turn_on
    result = await self._on_off_cluster_handler.on()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 84, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 76, in wrap_zigpy_exceptions
    raise ZHAException(message) from exc
zha.exceptions.ZHAException: Failed to send request: Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/script/__init__.py", line 707, in _async_run
    return await self.script.async_run(script_vars, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1795, in async_run
    return await asyncio.shield(create_eager_task(run.async_run()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 463, in async_run
    await self._async_step(log_exceptions=False)
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 527, in _async_step
    self._handle_exception(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 557, in _handle_exception
    raise exception
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 525, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1074, in _async_if_step
    await self._async_run_script(if_data["if_then"])
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1268, in _async_run_script
    result = await self._async_run_long_action(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
    return await long_task
           ^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1795, in async_run
    return await asyncio.shield(create_eager_task(run.async_run()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 463, in async_run
    await self._async_step(log_exceptions=False)
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 527, in _async_step
    self._handle_exception(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 557, in _handle_exception
    raise exception
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 525, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
    response_data = await self._async_run_long_action(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
    return await long_task
           ^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2761, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2804, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 996, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1068, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 626, in async_handle_light_on_service
    await light.async_turn_on(**filter_turn_on_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/helpers.py", line 1291, in handler
    raise HomeAssistantError(err) from err
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>
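The log above is a single delivery failure reported three times, because each layer wraps the error from the layer below with `raise ... from`. The following is a minimal sketch of that chaining pattern, not the actual zigpy, ZHA, or Home Assistant code: the three exception classes are local stand-ins named after the ones in the traceback, and `wrap_delivery_errors`, `send_packet`, `turn_on`, `service_handler`, and `cause_chain` are hypothetical helpers made up for illustration.

```python
from contextlib import contextmanager


class DeliveryError(Exception):
    """Local stand-in for zigpy.exceptions.DeliveryError."""


class ZHAException(Exception):
    """Local stand-in for zha.exceptions.ZHAException."""


class HomeAssistantError(Exception):
    """Local stand-in for homeassistant.exceptions.HomeAssistantError."""


@contextmanager
def wrap_delivery_errors():
    # Hypothetical analogue of the wrap_zigpy_exceptions frames in the log:
    # catch the radio-level error and re-raise it with added context.
    try:
        yield
    except DeliveryError as exc:
        raise ZHAException(f"Failed to send request: {exc}") from exc


def send_packet():
    # Simulates the coordinator reporting that delivery failed.
    raise DeliveryError(
        "Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>"
    )


def turn_on():
    with wrap_delivery_errors():
        send_packet()


def service_handler():
    # Outermost layer: converts the library error into the user-facing one.
    try:
        turn_on()
    except ZHAException as err:
        raise HomeAssistantError(err) from err


def cause_chain(exc):
    """Return exception class names from the outermost error to the root cause."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return names


try:
    service_handler()
except HomeAssistantError as err:
    print(" <- ".join(cause_chain(err)))
```

Read this way, the root cause is always the innermost `DeliveryError` raised when the coordinator reports a failed send. The script and service frames above it only show which automation happened to trigger the request.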

puddly commented 2 months ago

@i8nemo Please attach both pieces of diagnostics information, I showed in the screenshot where you can download it: https://github.com/home-assistant/core/issues/124516#issuecomment-2364398602

i8nemo commented 2 months ago

home-assistant_zha_2024-09-21T23-49-26.017Z.zip

Both files in Zip

ecchodun commented 2 months ago

This error came back for me today as well. Worked two weeks straight no problem and then received it. No changes to Core or OS.

AndrewUHD commented 2 months ago

I encountered this error on a fresh HA OS installation, with a Sonoff USB Zigbee adapter. Devices automatically discovered and added to the network would display in HA, but give the above error when issuing commands.

Removed devices, and re-added, issue no longer appears. Not sure if it's related to that but I figured I'd throw it out there.

stegitto commented 2 months ago

Hi there, is any troubleshooting / analysis in progress?

stegitto commented 2 months ago

https://www.reddit.com/r/homeassistant/comments/1fcldu5/sl_statuszigbee_delivery_failed_3074/

dmulcahey commented 2 months ago

@stegitto I had issues with IKEA outlets and the USB repeaters where they lock up and refuse to route traffic. If you truly want a resolution please humor me and remove them from your network for a day or two and let's see if your issues go away. I also noticed in your config entry diagnostics that the EZSP counters seem to show various failures without many successes, but that is just from a quick peek on mobile. Personally, I'd start with removing the IKEA devices and see if the issues clear up. I know this isn't something folks like to hear but it is worth trying. It's also good to understand that the only control we have is asking the coordinator to send a message for us. After that it's basically all out of our control. There isn't much we can do to influence things...
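For anyone who wants to make the same "failures versus successes" comparison on their own diagnostics file, here is a rough sketch. It assumes nothing about the real layout of the ZHA diagnostics JSON: it walks whatever nested structure it is given and sums integer values whose key name contains `SUCCESS` or `FAIL`. The `sample` dictionary and its counter names and values are invented for illustration and are not taken from any file in this thread.

```python
def collect_counters(node, path=()):
    """Yield (dotted_path, value) for every integer leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from collect_counters(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from collect_counters(value, path + (str(index),))
    elif isinstance(node, int) and not isinstance(node, bool):
        yield ".".join(path), node


def summarize(diagnostics):
    """Sum counters whose leaf name mentions SUCCESS or FAIL."""
    totals = {"success": 0, "failed": 0}
    for name, value in collect_counters(diagnostics):
        leaf = name.rsplit(".", 1)[-1].upper()
        if "SUCCESS" in leaf:
            totals["success"] += value
        elif "FAIL" in leaf:
            totals["failed"] += value
    return totals


# Invented sample data; real diagnostics will have different keys and values.
sample = {
    "data": {
        "counters": {
            "TX_UNICAST_SUCCESS": 120,
            "TX_UNICAST_FAILED": 340,
            "APS_TX_SUCCESS": 80,
            "APS_TX_FAILED": 95,
            "RX_BROADCAST": 4000,
        }
    }
}

print(summarize(sample))
```

To try it on a real download, the file could be loaded with `json.load` and passed to `summarize`. A failed total that dwarfs the success total would be consistent with the observation above, though it does not by itself say which device is responsible.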

stegitto commented 2 months ago

Hi dmulcahey. Thank you for your message. I will try. I’ve been using the same devices for two years, so I cannot figure out what has changed.

dmulcahey commented 2 months ago

Sometimes the software stack on the routers can just lock up... you can try a less drastic approach first: pull the power from all the IKEA devices for like 45s, then plug them back in. Give it a bit and see if things improve.

stegitto commented 2 months ago

I tried almost everything. Downgrades, resets and reconnects, I’m not able to identify the problem. Many people started struggling during the past month, several of them with ZHA and Zigbee USB dongles. I tried to downgrade HA and to upgrade the coordinator firmware. Unplugging some devices for days and then back on, let the zigbee network run only for some hours. The ZHA visualization map is always green during the “network crash”. Nearest WiFi channel used is 7. No USB3 ports, I use a long extension cable. Moved from Pi4 to Proxmox. No way.
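Since the comment above mentions Zigbee channel 25 and a nearest Wi-Fi channel of 7, a quick frequency check may be useful. This sketch uses the standard 2.4 GHz center-frequency formulas (Zigbee channels 11 to 26, Wi-Fi channels 1 to 13). The 22 MHz Wi-Fi width and 2 MHz Zigbee width are simplifying assumptions, and the function names are made up for this example. It only tests nominal spectrum overlap and says nothing about actual signal strength or other 2.4 GHz sources.

```python
ZIGBEE_WIDTH_MHZ = 2   # assumed nominal width of an IEEE 802.15.4 channel
WIFI_WIDTH_MHZ = 22    # assumed (conservative) width of a 2.4 GHz Wi-Fi channel


def zigbee_center_mhz(channel):
    """Center frequency of a 2.4 GHz Zigbee channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of a 2.4 GHz Wi-Fi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers Wi-Fi channels 1-13 only")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel):
    """True if the two channels' nominal frequency ranges intersect."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


for zigbee_channel in (20, 25):
    print(
        f"Zigbee {zigbee_channel} ({zigbee_center_mhz(zigbee_channel)} MHz) "
        f"vs Wi-Fi 7 ({wifi_center_mhz(7)} MHz): "
        f"overlap={overlaps(zigbee_channel, 7)}"
    )
```

Under these assumptions, Zigbee channel 25 (2475 MHz) sits well clear of Wi-Fi channel 7 (2442 MHz), which is consistent with the report that Wi-Fi is not the obvious culprit here. A Zigbee network on channel 20 (2450 MHz) would fall inside that Wi-Fi channel's range.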

dmulcahey commented 2 months ago

Dude, if you want help... try what I am suggesting. If you wanna rehash your frustration continuously I can't help you. I get that issues stink. I have a 200 device network working just fine and there are MANY other users without issues. If you look at the analytics there are more than 70k ZHA users: https://analytics.home-assistant.io/integrations/. There are not 70k folks reporting issues. Unfortunately, Zigbee is not a perfect technology and there are individual circumstances that cause individual issues. We do our best to help folks.

stegitto commented 2 months ago

I thought doing a recap could help. Now I understand I’m too old to keep playing with these toys. Goodbye.

ecchodun commented 2 months ago

> I have a 200 device network working just fine and there are MANY other users without issues.

Perhaps you could give some guidance on what type of Zigbee devices you're utilizing without error and using what kind of Zigbee Coordinator.

dmulcahey commented 2 months ago

> > I have a 200 device network working just fine and there are MANY other users without issues.
>
> Perhaps you could give some guidance on what type of Zigbee devices you're utilizing without error and using what kind of Zigbee Coordinator.

Host is a HA Blue

Coordinator is a TubeZb SI MGM24 connected via USB

Devices are a mix of inovelli blue switches, Philips Hue bulbs, centralite sensors and Aqara sensors and ~24 Third Reality energy monitoring outlets.

IMO router devices are the most important consideration after the coordinator. Some devices just don't route well.

I'll also never add a Tuya device to my network but that is my personal preference.

I also limit custom components to ones that I have proven don't cause issues with the HA event loop. Easy way to rule them out is to run HA in safe mode for a while.

dmulcahey commented 2 months ago

> I thought doing a recap could help. Now I understand I’m too old to keep playing with these toys. Goodbye.

Sorry if I misunderstood the "No way" comment you ended your reply with. I'm not trying to make you quit... just trying to explain that even though it feels like you have done everything doesn't mean there aren't more things to explore. In the thread you shared from Reddit there is even a user who pointed out the IKEA issue...

Device mix on the network (especially routing devices) is one of the most important factors for stability. There are a lot of devices on the market and some just don't perform well.

I'm not guaranteeing that this is the cause of your issue but it's an avenue worth exploring. With the time investment you have already committed I'd think you'd be willing to try this... and by your own account your devices are one of the few constants across your attempts to remediate your issue.

ecchodun commented 2 months ago

> IMO router devices are the most important consideration after the coordinator.

Pardon the ignorance, but what do you mean by router devices? I also have never heard of that coordinator. Better than my Sonoff?

dmulcahey commented 2 months ago

Router devices are generally mains powered devices that in addition to their user facing functionality also help route traffic for other devices on the network and they can also act as parents to end devices. Lots of bulbs, smart sockets, etc are routers. The device type can be seen on the device page in HA if you expand the zigbee details.

ecchodun commented 2 months ago

> Coordinator is a TubeZb SI MGM24 connected via USB

So you have one of the rarest out of stock devices out there and have no issues. Yeah, that's not going to help. BTW, another 3074 error tonight on a powered switch that never gave me errors before. Makes sense.

dmulcahey commented 2 months ago

> > Coordinator is a TubeZb SI MGM24 connected via USB
>
> So you have one of the rarest out of stock devices out there and have no issues. Yeah, that's not going to help. BTW, another 3074 error tonight on a powered switch that never gave me errors before. Makes sense.

Honestly, I don't know how to help at this point. I get folks are frustrated but the commentary isn't helpful to anyone.

Just to give you another data point: I also have had this network running since ZHA was introduced and I ran it on a HUSBZB-1 for years... I know that won't make folks feel better but it is what it is.

In the visualization, what is that switch connected to? For router devices you can see their connections as follows: go to the device page, launch the manage device dialog, select the neighbors tab, then click the title of the dialog so that the relationship column becomes available. See if you can determine what its parent is.

ecchodun commented 2 months ago

> In the visualization what is that switch connected to? For the router devices it shows connections to go to the device page, launch the manage device dialog, select the neighbors tab then click the title of the dialog so that the relationship column becomes available. See if you can determine what its parent is.

Thank you. Okay, so I now know the parent for the router device (a plug-in switch). It has an LQI of 57. What's next?

TheJulianJES commented 2 months ago

And what's the model of the router (plug-in switch)? Also, please post your ZHA integration diagnostics.

ecchodun commented 2 months ago

> And what's the model of the router (plug-in switch)? Also, please post your ZHA integration diagnostics.

It's a S31 Lite zb by SONOFF. One of these - https://www.amazon.com/gp/product/B08Y87WD1X/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

I never (and I repeat never) had a 3074 error with this particular switch and it just popped up out of nowhere for an animation that was set to kick on at 1:00 a.m. early this morning.

My diagnostics are attached.

config_entry-zha-bee6cccfaff10b502497db0b3db8d06e.json

ecchodun commented 1 month ago

Nothing?

Auka84 commented 1 month ago

I also have the same issue :-(. The error appears for no reason. I didn’t add any device or change anything.

Here I tried to switch off « escaliers » and got the error. Sometimes there is no problem :-/ and sometimes it’s other devices. (screenshot: IMG_7569)

lopezio commented 1 month ago

Hi, I am experiencing the same kind of problems. Important note: Logic and reading of this and other threads suggests to me that it is actually NOT a zigpy / zha problem. So, actually off-topic. But since everything started to have problems since 2024.9.x (currently on 2024.10.0), I'm still posting it here. Just in case there still might be some change in recent zigpy/zha that could facilitate the issues.

I started to experience many 3074 problems throughout the network from 2024.9.x. Before that, it was a stable network, but coincidentally with the September updates and the addition of 2 new Zigbee power plugs (bringing the total of my ZHA devices to 97), I started to get a lot of disconnected devices.

I can't really tell if the problems are in ZHA or if this is just coincidental.

Right now, after removing a few devices, things look a little more stable (the coordinator is a Nabu Casa, the one from HA), but the TRVs (thermostats) in particular would not respond to commands. Still, they do report their state correctly, also when turning their wheels. Following @dmulcahey's tips, I went "after" my IKEA repeaters (TRADFRI Signal Repeater) and, instead of removing or re-pairing them, went to the device page in HA and clicked "reconfigure".

This did bring back (although delayed) control functionality on all my Tuya TRVs, but did not do so on my 2 Aqara ones. I do believe that the problem is more likely to be in my radio/Zigbee/device choice, but the strange thing is that it all used to work very nicely before and suddenly became a debugging nightmare (and partly still is; some motion sensors, for example, will disconnect from time to time and need to be power cycled or re-paired to get "back in").

I'm attaching one of the log entries with stack trace, unsure whether it's just the expected output when devices do not "react".

I'm open to any suggestions / questions...

```
Logger: homeassistant.components.websocket_api.http.connection
Source: components/websocket_api/commands.py:245
integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 1:18:49 AM (7 occurrences)
Last logged: 11:52:05 AM

[546054599392] Unexpected exception
[546126839472] Unexpected exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 67, in wrap_zigpy_exceptions
    yield
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 85, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/util.py", line 136, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zhaquirks/tuya/__init__.py", line 776, in write_attributes
    await self.endpoint.tuya_manufacturer.write_attributes(
  File "/usr/local/lib/python3.12/site-packages/zhaquirks/tuya/__init__.py", line 521, in write_attributes
    await super().command(
  File "/usr/local/lib/python3.12/site-packages/zigpy/quirks/__init__.py", line 254, in command
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 378, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 265, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 334, in request
    await send_request()
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 834, in request
    await self.send_packet(
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 827, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/helpers.py", line 1286, in handler
    return await func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/climate.py", line 230, in async_set_temperature
    await self.entity_data.entity.async_set_temperature(
  File "/usr/local/lib/python3.12/site-packages/zha/application/platforms/climate/__init__.py", line 462, in async_set_temperature
    await self._thermostat_cluster_handler.async_set_heating_setpoint(
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/hvac.py", line 286, in async_set_heating_setpoint
    await self.write_attributes_safe({attr: temperature})
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 614, in write_attributes_safe
    res = await self.write_attributes(attributes, manufacturer=manufacturer)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 84, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 76, in wrap_zigpy_exceptions
    raise ZHAException(message) from exc
zha.exceptions.ZHAException: Failed to send request: Failed to deliver message:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 245, in handle_call_service
    response = await hass.services.async_call(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2761, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2804, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 996, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1068, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/climate/__init__.py", line 1029, in async_service_temperature_set
    await entity.async_set_temperature(**kwargs)
  File "/usr/src/homeassistant/homeassistant/components/zha/helpers.py", line 1288, in handler
    raise HomeAssistantError(err) from err
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message:
```

i8nemo commented 1 month ago

> home-assistant_zha_2024-09-21T23-49-26.017Z.zip
>
> Both files in Zip

It's hard to accept that this is all attributed to RF interference, given all the different reports where everything was stable over a long period of time.

lopezio commented 1 month ago

> It's hard to accept that this is all attributed to RF interference, given all the different reports where everything was stable over a long period of time.

Exactly. The problem clearly concerns end devices of all kinds (motion sensors, TRVs, switches, knobs...). Mains powered devices seem to be unaffected as far as I can see. I also find it strange that the Aqara/Brennstuhl TRVs (with or without custom quirk - lumi.airrtc.agl001) take a long time to pair (sometimes they do not get past the “Interview” phase, and when they do, the “Configuration” phase takes up to 3 minutes), and afterwards they don’t accept commands via Zigbee (at least 99% of the time they produce a 3074), but still report their sensors and state correctly to HA. Pairing and commands used to be almost instant.

I’m seeing also many log entries concerning units of sensor values.

I wonder whether zigbee2mqtt should actually be favored over ZHA in terms of stability and scalability. Not that I’m looking forward to re-pairing 97 devices… Also, the ZHA network graph is almost completely useless for debugging with around 100 devices.

ecchodun commented 1 month ago

> I wonder whether zigbee2mqtt should actually be favored over ZHA in terms of stability and scalability. Not that I’m looking forward to re-pairing 97 devices… Also, the ZHA network graph is almost completely useless for debugging with around 100 devices.

You bring up some good points. I've watched a lot of videos on the difference between ZHA and zigbee2mqtt, but no one has really come out and said "I stopped getting a particular error when I made the switch" (i.e., it might be a complete waste of time). I've also seen some videos where switching to ZHA from zigbee2mqtt was actually better because of its "built-in" integration.

Lastly, I agree with you about the ZHA graph. Completely worthless. A bunch of colored lines that provide me with little to no directional value unless perhaps one device is sitting off screen by itself.

lopezio commented 1 month ago

Alright all. Just updating here my status. Spoiler: It looks more like a router device congestion (if that's possible...: too many devices talking to the coordinator), than an update/bug issue.

1) Reduced my devices from 97 down to 90 (removed a few of the Nous power plugs, replaced by a multi-plug), and removed one additional IKEA repeater. Also removed devices that I had previously just switched off. This brought almost everything back up (after re-pairing 5-6 end devices).

2) Bought a new radio and switched from the Nabu Casa to the slae.sh stick (slae.sh CC2652RB stick), since I had some friends report that this chip works better and has more capacity (it also has an antenna, which at least psychologically can help to believe it's better). Radio migration was seamless (kudos, HA/ZHA team...!). Can't tell yet whether things will keep on being better mid/long term. But after migration and a restart, all my TRVs respond quickly now. And most of all, my wife isn't complaining all the time that it's cold or that nothing works and I should remove all the smart stuff and return to a "working home".

So, from my part, I'd say: unrelated to the update - all points to just a coincidence.

Notes

Best Regards to All

lopezio commented 1 month ago

One last contribution here, may it help others with the same kind of issues (@stegitto: don't lose faith, it can be solved.. although it can be quite frustrating while getting there..!). My lessons learned:

@dmulcahey: I just realized that you're not only a zigpy contributor, but also the author of both the network card and the visualization :D . After this week of debugging, I've come to appreciate a lot of your work. But also to see some limits. May I suggest a few things for the next time you get around to updating the graph:

1) Please display device names also at broader zoom levels (possibly hard-wrapping them when too long).
2) When clicking on a device, its connections are made thicker, but this doesn't really help with so many devices. It's still line-land. It would be better to hide all other lines altogether (or make them way more transparent).
3) Would it be possible to visually isolate the path of a device to the coordinator (show only the lines from it to its routers and from those to the coordinator)?
4) An option to see the connections of a device not only as lines, but as an info box (e.g. an "i" on the device box populating an inspector-like box with a list of devices and signal strengths).
5) Why not integrate the zha-network-card (or a successor/sibling) as a panel of the network view (as the "list view" of it)?
6) (I'm sure you know): Sometimes it gets completely unusable, even with Chrome... (on any other browser performance is too bad anyway...)

Even just 1 and 2 would make it an order of magnitude more useful.. and thanx anyway for everything You've been doing! I love the whole structure and code. Maybe when my family will be less time-intensive I might get some hands-on it too...
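As a rough illustration of suggestion 3, isolating one device's path to the coordinator amounts to a shortest-path search over the neighbor links. The sketch below uses a plain breadth-first search. The device names and links are invented, `path_to_coordinator` is a hypothetical helper, and a real Zigbee mesh picks routes by link cost rather than hop count alone, so treat it as a starting point for a display filter and not as a model of actual routing.

```python
from collections import deque


def path_to_coordinator(links, start, coordinator="coordinator"):
    """Return the fewest-hops path from start to the coordinator, or None."""
    # Build an undirected adjacency map from (device_a, device_b) pairs.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == coordinator:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Invented topology for illustration only.
links = [
    ("coordinator", "hall_plug"),
    ("coordinator", "kitchen_plug"),
    ("hall_plug", "stairs_light"),
    ("kitchen_plug", "stairs_light"),
    ("stairs_light", "trv_bedroom"),
]

print(path_to_coordinator(links, "trv_bedroom"))
```

A graph view could then draw only the edges along the returned path and dim everything else, which would also cover most of suggestion 2.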

stegitto commented 1 month ago

Hi everybody. Thank you for contributing to the discussion. Despite my 25 years in IT with a focus on networking, I was not able to isolate the root cause and fix the issue. I admit, the constant lack of time is a big constraint. I still believe there is more than one problem occurring at the same time, leading me to weird/unsuccessful troubleshooting results. I decommissioned the coordinator, the routers and all the sensors, for a total of more than 30 devices, with 5 brand new endpoints still waiting on the shelf to be added to the ZHA network. Then I spent one trillion euros to buy a complete Hue ecosystem, and I rebuilt the automations based on the Hue Bridge integration in HA. Everything has been running fine for weeks now. Unfortunately, this is the only way I‘ve been able to find to get a stable environment. I know my workaround is not going to help anybody, but I wanted to share my personal exit strategy after starting the thread. Regards, Stefano.

lopezio commented 1 month ago

> Hi everybody. Thank you for the contribution to the discussion.

Hi Stefano, thanks for reporting back anyway. And: I was apparently too quick in calling it a victory. It all just began to get quite unstable again, and all without much information in the logs that could somehow help. Since the coordinator change from the Nabu Casa to the slae.sh it is at least mostly usable (at least after a restart), but just today, after unsuccessfully trying to pair a simple "styrbar" IKEA remote, many other devices started to go crazy again. Some change made in September must have broken something. Not necessarily in ZHA. Could also be something else. For the record, the most recent thing was that I had to restart to see /any/ zha_event in the developer tools. At this point I can only offer my availability for tests or reports of any sort. So: not solved, even if the lessons learned do apply.

panhans commented 1 month ago

Hi there. I also got this error. The bad part is that this error blocks further actions in my automations. So I grabbed an RPi 2 and set up a serial network adapter with my Dongle-E, far away from my router and without any USB3 port. But the error came back. So I decided to switch to an SLZB-06, without any success.

Sometimes at night I got this error and no Zigbee call went out until I restarted the ZHA integration in the morning. The energy level of my chosen channel 20 was also around 20-40%.

Now I have switched to Z2M without any issues at the moment. The log has also been free of errors since I switched to Z2M. I can't believe this is interference related, since now everything works flawlessly again and I isolated my hardware as much as I can.

puddly commented 1 month ago

@panhans What channel is your Z2M network on?

panhans commented 1 month ago

@puddly I left it on channel 20 just to be sure that there isn't any other signal that causes interference. And every energy scan told me to stay on it. ;)

//EDIT: Zigbee Network still stable and responsive without any issues since 4 days.