Closed Californian closed 8 months ago
Atualizei o meu e ocorreu a mesma coisa, voltei o backup parou tudo por aqui, utilizo ZHA + /dev/serial/by-id/usb-ITEAD_SONOFF_Zigbee_3.0_USB_Dongle_Plus_V2_20220718181812-if00
I disabled thread, openthreadborderrouter, and matter and enabled them after boot and now it seems to work more reliably.
This is still an issue. Enabling matter, otbr, and thread reliably causes zha to fail; I cannot use any of my Matter devices with HA.
Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
zha documentation zha source (message by IssueLinks)
Can you describe what Thread and Zigbee devices you have and if there is any procedure you can do to trigger this crash?
I have ~100 zigbee devices, mostly hue bulbs, smart plugs, and battery-powered buttons (I'd estimate half of the devices are routers).
I have an Aqara P2 door sensor to test out Matter-over-thread.
I can reliably trigger this by:
I can reliably resolve it after triggering by:
Thanks.
When you restart the Pi, do you completely power cycle it (i.e. shut off power)? Or just a software reboot?
Thanks.
When you restart the Pi, do you completely power cycle it (i.e. shut off power)? Or just a software reboot?
The last couple times I've just been restarting it from the "dev tools" -> "restart" -> "reboot system" UI, not a full power cycle (though I've power cycled in the past when some issue, maybe this one, prevented startup or shutdown).
I am experiencing the same issue with:
I had successfully configured and used ZHA with the multiprotocol add-on and it was talking to my devices. Then it randomly stopped working. At first I noticed that it was not even recognizing the dongle in the list of available serial devices according to ZHA — a full system reboot fixed that. Then both OTBR and ZHA would fail to start. After trying to reconfigure and restart things a couple of times, OTBR seems happy but the ZHA integration was still showing an error (it unhelpfully told me to check the logs, but did not say where / which logs to check).
Going deeper down the rabbit hole, I tried deleting the ZHA integration and reconfiguring it. Now I see a generic "Unknown error occurred" message:
The raw HTTP response is not helpful either:
500 Internal Server Error
Server got itself in trouble
The silabs multiprotocol add-on itself doesn't look too unhappy in its logs:
...
Listening on port 9999 for connection...
Accepting connection.
Accepted connection 7.
...
However in home-assistant.log I see:
2023-11-08 23:52:10.785 DEBUG (MainThread) [zigpy.serial] Opening a serial connection to 'socket://core-silabs-multiprotocol:9999' (115200 baudrate)
2023-11-08 23:52:10.787 ERROR (MainThread) [aiohttp.server] Error handling request
...long traceback...
ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)
Inside the "Terminal & SSH" terminal I am able to ping 172.30.32.1 but cannot connect on port 9999:
[core-ssh ~]$ nslookup core-silabs-multiprotocol
Server: 127.0.0.11
Address: 127.0.0.11#53
Name: core-silabs-multiprotocol
Address: 172.30.32.1
[core-ssh ~]$ busybox-extras telnet 172.30.32.1 9999
telnet: can't connect to remote host (172.30.32.1): Connection refused
[core-ssh ~]$ ping 172.30.32.1
PING 172.30.32.1 (172.30.32.1): 56 data bytes
64 bytes from 172.30.32.1: seq=0 ttl=64 time=0.041 ms
64 bytes from 172.30.32.1: seq=1 ttl=64 time=0.059 ms
^С
--- 172.30.32.1 ping statistics -
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.041/0.050/0.059 ms
More complete logs from this point in time here
After rebooting again, I got past the "Unknown error occurred" and was able to add the integration again. Sadly I now have to rename my devices that have reset to default names, but at least it is working again.
Strangely I now see a "discovered" integration option alongside the one I already set up (I didn't see this before):
I do see some odd things in the multiprotocol add-on logs, but I'm not sure if any of these are actually problematic:
s6-rc: info: service legacy-services successfully started
Listening on port 9999 for connection...
Accepting connection.
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25
otbr-agent[299]: 00:00:25.497 [N] Mle-----------: RLOC16 3c00 -> fffe
otbr-agent[299]: 00:00:25.499 [W] Platform------: [netif] Failed to process request#5: Unknown error -95
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25
otbr-agent[299]: 00:00:26.080 [N] Mle-----------: Attach attempt 1, AnyPartition reattaching with Active Dataset
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25
otbr-agent[299]: 00:00:32.580 [N] RouterTable---: Allocate router id 15
otbr-agent[299]: 00:00:32.580 [N] Mle-----------: RLOC16 fffe -> 3c00
otbr-agent[299]: 00:00:32.582 [N] Mle-----------: Role detached -> leader
otbr-agent[299]: 00:00:32.583 [N] Mle-----------: Partition ID 0x46464847
otbr-agent[299]: 00:00:32.595 [W] Platform------: [netif] Failed to process request#6: Unknown error -17
otbr-agent[299]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary!
Accepted connection 7.
Socket connection has been closed, restarting...
Listening on port 9999 for connection...
Accepting connection.
Accepted connection 7.
Restarting
[00:09:35:788202] Info : Endpoint socket #12: Client disconnected. 1 connections
[00:09:35:788223] Info : Client disconnected
[00:09:36:788913] Info : New client connection using library v4.3.1.0
[00:09:36:791482] Info : Endpoint socket #12: Client connected. 2 connections
Reusing socket from previous instance.
Socket connection has been closed, restarting...
Listening on port 9999 for connection...
Accepting connection.
Accepted connection 8.
Restarting
[00:09:46:931643] Info : Endpoint socket #12: Client disconnected. 1 connections
[00:09:46:931708] Info : Client disconnected
[00:09:47:933246] Info : New client connection using library v4.3.1.0
[00:09:47:935584] Info : Endpoint socket #12: Client connected. 2 connections
Reusing socket from previous instance.
Thanks for coming to my TED talk. I will report back if it breaks again...
Other things I may try next time:
Based on what I'm reading elsewhere, it sounds like the multiprotocol stuff may just not be working all that well and perhaps I should go back to the single protocol firmware if I want better reliability.
I am experiencing the same issue with:
- Sonoff Zigbee Dongle E flashed with multiprotocol firmware
- 3 Zigbee devices (2 philips hue bulbs + 1 inovelli blue switch)
- OTBR and Matter/Thread are configured, but not used (I previously tried to set up a nanoleaf matter bulb and was unsuccessful, and gave up)
I had successfully configured and used ZHA with the multiprotocol add-on and it was talking to my devices. Then it randomly stopped working. At first I noticed that it was not even recognizing the dongle in the list of available serial devices according to ZHA — a full system reboot fixed that. Then both OTBR and ZHA would fail to start. After trying to reconfigure and restart things a couple of times, OTBR seems happy but the ZHA integration was still showing an error (it unhelpfully told me to check the logs, but did not say where / which logs to check).
Going deeper down the rabbit hole, I tried deleting the ZHA integration and reconfiguring it. Now I see a generic "Unknown error occurred" message:
The raw HTTP response is not helpful either:
500 Internal Server Error Server got itself in trouble
The silabs multiprotocol add-on itself doesn't look too unhappy in its logs:
... Listening on port 9999 for connection... Accepting connection. Accepted connection 7. ...
However in home-assistant.log I see:
2023-11-08 23:52:10.785 DEBUG (MainThread) [zigpy.serial] Opening a serial connection to 'socket://core-silabs-multiprotocol:9999' (115200 baudrate) 2023-11-08 23:52:10.787 ERROR (MainThread) [aiohttp.server] Error handling request ...long traceback... ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)
Inside the "Terminal & SSH" terminal I am able to ping 172.30.32.1 but cannot connect on port 9999:
[core-ssh ~]$ nslookup core-silabs-multiprotocol Server: 127.0.0.11 Address: 127.0.0.11#53 Name: core-silabs-multiprotocol Address: 172.30.32.1 [core-ssh ~]$ busybox-extras telnet 172.30.32.1 9999 telnet: can't connect to remote host (172.30.32.1): Connection refused [core-ssh ~]$ ping 172.30.32.1 PING 172.30.32.1 (172.30.32.1): 56 data bytes 64 bytes from 172.30.32.1: seq=0 ttl=64 time=0.041 ms 64 bytes from 172.30.32.1: seq=1 ttl=64 time=0.059 ms ^С --- 172.30.32.1 ping statistics - 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.041/0.050/0.059 ms
More complete logs from this point in time here
After rebooting again, I got past the "Unknown error occurred" and was able to add the integration again. Sadly I now have to rename my devices that have reset to default names, but at least it is working again.
Strangely I now see a "discovered" integration option alongside the one I already set up (I didn't see this before):
I do see some odd things in the multiprotocol add-on logs, but I'm not sure if any of these are actually problematic:
s6-rc: info: service legacy-services successfully started Listening on port 9999 for connection... Accepting connection. Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25 otbr-agent[299]: 00:00:25.497 [N] Mle-----------: RLOC16 3c00 -> fffe otbr-agent[299]: 00:00:25.499 [W] Platform------: [netif] Failed to process request#5: Unknown error -95 Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25 otbr-agent[299]: 00:00:26.080 [N] Mle-----------: Attach attempt 1, AnyPartition reattaching with Active Dataset Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25 Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25 Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25 Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::804a:bfff:fe2a:cfdc/veth2790237/25 otbr-agent[299]: 00:00:32.580 [N] RouterTable---: Allocate router id 15 otbr-agent[299]: 00:00:32.580 [N] Mle-----------: RLOC16 fffe -> 3c00 otbr-agent[299]: 00:00:32.582 [N] Mle-----------: Role detached -> leader otbr-agent[299]: 00:00:32.583 [N] Mle-----------: Partition ID 0x46464847 otbr-agent[299]: 00:00:32.595 [W] Platform------: [netif] Failed to process request#6: Unknown error -17 otbr-agent[299]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary! Accepted connection 7. Socket connection has been closed, restarting... Listening on port 9999 for connection... Accepting connection. Accepted connection 7. Restarting [00:09:35:788202] Info : Endpoint socket #12: Client disconnected. 1 connections [00:09:35:788223] Info : Client disconnected [00:09:36:788913] Info : New client connection using library v4.3.1.0 [00:09:36:791482] Info : Endpoint socket #12: Client connected. 2 connections Reusing socket from previous instance. Socket connection has been closed, restarting... Listening on port 9999 for connection... Accepting connection. Accepted connection 8. Restarting [00:09:46:931643] Info : Endpoint socket #12: Client disconnected. 1 connections [00:09:46:931708] Info : Client disconnected [00:09:47:933246] Info : New client connection using library v4.3.1.0 [00:09:47:935584] Info : Endpoint socket #12: Client connected. 2 connections Reusing socket from previous instance.
Thanks for coming to my TED talk. I will report back if it breaks again...
Other things I may try next time:
- disabling OTBR since I'm not using it
- if that doesn't work, re-flashing multiprotocol firmware on the dongle
- if that doesn't work, re-flashing back to the single protocol zigbee firmware :(
Based on what I'm reading elsewhere, it sounds like the multiprotocol stuff may just not be working all that well and perhaps I should go back to the single protocol firmware if I want better reliability.
I disabled just OTBR and now it's working for me. Obviously that's not a real solution long-term.
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
The problem
Upon restart, and maybe at other times, ZHA fails to start repeatedly. Sometimes, this causes HA to crash entirely, possibly even with a reboot. This issue started when I started using silabs-multiprotocol and, specifically, started using the border router and a thread/matter device. I'm using the SkyConnect on a 1-2 m USB extension cord, far from any major RF interference.
After restarting a few times, it works sometimes. This used to be a reliable fix, but upon upgrading to 2023.10 today, it seems to fail almost every time, and after a period of time.
What version of Home Assistant Core has the issue?
core-2023.10.0
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
ZHA
Link to integration documentation on our website
No response
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
No response