home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Zigbee Connection Loss with SkyConnect USB Stick on Home Assistant Green Box #109987

Closed wpformation closed 3 weeks ago

wpformation commented 5 months ago

The problem

Hi, I am experiencing an issue with the Zigbee Home Automation (ZHA) integration in Home Assistant, specifically when using the SkyConnect USB Stick. The error message indicates a lost serial connection with the following logs:

2024-02-08 11:03:05.897 ERROR (MainThread) [bellows.uart] Lost serial connection: ConnectionResetError('Remote server closed connection')
2024-02-08 11:03:05.898 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

Hardware: SkyConnect USB Stick
Setup: Home Assistant Green Box with multiprotocol support

What version of Home Assistant Core has the issue?

core-2024.2.0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

System Information

version core-2024.2.0
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.1
os_name Linux
os_version 6.1.74-haos
arch aarch64
timezone Europe/Paris
config_dir /config

Home Assistant Community Store
GitHub API | ok
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.34.0
Stage | running
Available Repositories | 1461
Downloaded Repositories | 19
HACS Data | ok

Home Assistant Cloud
logged_in | true
subscription_expiration | January 16, 2025 at 01:00
relayer_connected | true
relayer_region | eu-central-1
remote_enabled | true
remote_connected | true
alexa_enabled | true
google_enabled | false
remote_server | eu-central-1-2.ui.nabu.casa
certificate_status | ready
instance_id | 9b1354271d894ddab777f6bfe47bce87
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor
host_os | Home Assistant OS 11.5
update_channel | stable
supervisor_version | supervisor-2024.01.1
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 28.0 GB
disk_used | 7.0 GB
healthy | true
supported | true
board | green
supervisor_api | ok
version_api | ok
installed_addons | Terminal & SSH (9.8.1), Home Assistant Google Drive Backup (0.112.1), Studio Code Server (5.15.0), Silicon Labs Multiprotocol (2.4.4), Mosquitto broker (6.4.0), Samba share (12.2.0)

Dashboards
dashboards | 2
resources | 21
views | 9
mode | storage

Recorder
oldest_recorder_run | February 7, 2024 at 14:18
current_recorder_run | February 8, 2024 at 10:37
estimated_db_size | 29.07 MiB
database_engine | sqlite
database_version | 3.44.2

Spotify
api_endpoint_reachable | ok

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-02-08 11:03:05.898 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-02-08 11:03:06.031 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0xffff99584bf0>>
2024-02-08 11:03:41.296 WARNING (MainThread) [zigpy.application] Unknown device AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xF724)

Additional information

These errors occur repeatedly throughout the day. Each time this happens, the Zigbee coordinator (APP controller) enters a failed state, and a restart is automatically requested. The only noticeable consequence is that I lose connectivity with all end devices for approximately 5 to 10 seconds during each occurrence.
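
To put a number on "repeatedly throughout the day", the Home Assistant log can be scanned for the `bellows.uart` entry shown above. The following is a minimal sketch, assuming the log lines keep the `timestamp LEVEL (thread) [logger] message` layout seen in this report; the function names `find_connection_losses` and `gaps_in_seconds` are hypothetical helpers, not part of Home Assistant or bellows.

```python
import re
from datetime import datetime

# Layout taken from the log lines quoted in this issue, e.g.
# "2024-02-08 11:03:05.897 ERROR (MainThread) [bellows.uart] Lost serial connection: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def find_connection_losses(lines):
    """Return the timestamp of every bellows.uart 'Lost serial connection' entry."""
    losses = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if (
            m
            and m["logger"] == "bellows.uart"
            and m["msg"].startswith("Lost serial connection")
        ):
            losses.append(datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"))
    return losses


def gaps_in_seconds(timestamps):
    """Seconds elapsed between consecutive events, in chronological order."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    sample = [
        "2024-02-08 11:03:05.897 ERROR (MainThread) [bellows.uart] Lost serial connection: "
        "ConnectionResetError('Remote server closed connection')",
        "2024-02-08 11:03:05.898 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. "
        "Requesting APP controller restart",
    ]
    events = find_connection_losses(sample)
    print(len(events), "connection loss(es) found")
    print("gaps (s):", gaps_in_seconds(events))
```

Run against a full day's log, the list of gaps would show whether the drops are periodic or irregular, which may help when comparing against the add-on logs.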

home-assistant[bot] commented 5 months ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `zha` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

TheJulianJES commented 5 months ago

Do you need multi-protocol? It's considered experimental. If you want a stable Zigbee network, you should probably avoid using it at the moment and let ZHA directly communicate with the stick. Otherwise, multi-protocol logs might be helpful.

wpformation commented 5 months ago

I've tried, but:

Logger: zigpy.application
Source: components/zha/core/gateway.py:214
First occurred: 17:33:44 (2 occurrences)
Last logged: 17:33:44
Zigbee channel 11 utilization is 99.36%!

Same thing on channels 15, 20, and 25.

That's why I switched to multiprotocol, where everything was fine for a few weeks.

Greminn commented 5 months ago

Hi there, I have the same issue and am also on multiprotocol (I have a couple of Thread devices as well as around 50 Zigbee devices). All Zigbee devices become unavailable and then come back at that point. It looks like the Multiprotocol add-on restarted, causing the error?

Core 2024.2.1
Supervisor 2024.01.1
Operating System 11.5
Frontend 20240207.1

Logs from HA (note this happened 3 times this morning between midnight and 7 am):

2024-02-15 06:14:21.838 ERROR (MainThread) [bellows.uart] Lost serial connection: ConnectionResetError('Remote server closed connection')
2024-02-15 06:14:21.839 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-02-15 06:14:21.944 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f1a58621130>>

Logs from Multiprotocol add on:

[06:14:29] INFO: Generating cpcd configuration.
s6-rc: info: service cpcd-config successfully started
s6-rc: info: service cpcd: starting
[06:14:29] INFO: Starting cpcd...
WARNING in function 'main' in file /usr/src/cpc-daemon/main.c at line #186 : Running CPCd as 'root' is not recommended. Proceed at your own risk.
s6-rc: info: service cpcd successfully started
s6-rc: info: service zigbeed: starting
s6-rc: info: service otbr-agent: starting
s6-rc: info: service zigbeed successfully started
[06:14:29] INFO: Starting zigbeed...
[06:14:29] INFO: Setup OTBR firewall...
[06:14:29] INFO: Starting otbr-agent...
otbr-agent[302]: [NOTE]-AGENT---: Running 0.3.0
otbr-agent[302]: [NOTE]-AGENT---: Thread version: 1.3.0
otbr-agent[302]: [NOTE]-AGENT---: Thread interface: wpan0
otbr-agent[302]: [NOTE]-AGENT---: Radio URL: spinel+cpc://cpcd_0?iid=2&iid-list=0
otbr-agent[302]: [NOTE]-ILS-----: Infra link selected: eno1
otbr-agent[302]: 58d.16:17:22.816 [C] Platform------: mCpcBusSpeed = 115200
[06:14:29:175429] Info : [CPCd v4.3.1.0] [Library API v3] [RCP Protocol v4]
[06:14:29:175460] Info : Git commit: 133b29678b3d0bc7578e098d2f46b4d5bcd2ebb4 / branch:
[06:14:29:175462] Info : Sources hash: ff8300587e7e4ab1def7a89a272c0baef32f9eb3bff9b0ba06b94e655d652367
[06:14:29:175463] WARNING : In function 'main' in file /usr/src/cpc-daemon/main.c at line #186 : Running CPCd as 'root' is not recommended. Proceed at your own risk.
[06:14:29:175471] Info : Reading cli arguments
[06:14:29:175473] Info : /usr/local/bin/cpcd
[06:14:29:176509] Info : Reading configuration
[06:14:29:176511] Info : file_path = /usr/local/etc/cpcd.conf
[06:14:29:176512] Info : instance_name = cpcd_0
[06:14:29:176513] Info : socket_folder = /dev/shm
[06:14:29:176513] Info : operation_mode = MODE_NORMAL
[06:14:29:176514] Info : use_encryption = false
[06:14:29:176515] Info : binding_key_file = /etc/binding-key.key
[06:14:29:176515] Info : stdout_tracing = false
[06:14:29:176516] Info : file_tracing = false
[06:14:29:176516] Info : lttng_tracing = false
[06:14:29:176517] Info : enable_frame_trace = false
[06:14:29:176517] Info : traces_folder = /dev/shm/cpcd-traces
[06:14:29:176518] Info : bus = UART
[06:14:29:176519] Info : uart_baudrate = 460800
[06:14:29:176519] Info : uart_hardflow = true
[06:14:29:176520] Info : uart_file = /dev/ttyUSB0
[06:14:29:176520] Info : fu_recovery_pins_enabled = false
[06:14:29:176521] Info : fu_connect_to_bootloader = false
[06:14:29:176522] Info : fu_enter_bootloader = false
[06:14:29:176522] Info : restart_cpcd = false
[06:14:29:176523] Info : application_version_validation = false
[06:14:29:176523] Info : print_secondary_versions_and_exit = false
[06:14:29:176524] Info : use_noop_keep_alive = false
[06:14:29:176524] Info : reset_sequence = true
[06:14:29:176525] Info : stats_interval = 0
[06:14:29:176526] Info : rlimit_nofile = 2000
[06:14:29:176526] Info : ENCRYPTION IS DISABLED
[06:14:29:176527] Info : Starting daemon in normal mode
[06:14:29:187818] Info : Connecting to Secondary...
[06:14:29:266589] Info : RX capability is 256 bytes
[06:14:29:266604] Info : Connected to Secondary
[06:14:29:269992] Info : Secondary Protocol v4
[06:14:29:277073] Info : Secondary CPC v4.3.1
[06:14:29:280540] Info : Secondary bus bitrate is 460800
[06:14:29:288187] Info : Secondary APP v4.3.1-4f7f9e99-dirty-de58d93e
[06:14:29:288343] Info : Daemon startup was successful. Waiting for client connections
[06:14:29:782774] Info : New client connection using library v4.3.1.0
[06:14:29:786337] Info : Opened connection socket for ep#12
[06:14:29:786452] Info : Endpoint socket #12: Client connected. 1 connections
[06:14:30:585896] Info : New client connection using library v4.3.1.0
[06:14:30:590523] Info : Endpoint socket #12: Client connected. 2 connections
otbr-agent[302]: 00:00:00.114 [N] RoutingManager: BR ULA prefix: fd50:17d9:91ae::/48 (loaded)
otbr-agent[302]: 00:00:00.114 [N] RoutingManager: Local on-link prefix: fd3f:9a2b:936b:8bac::/64
otbr-agent[302]: 00:00:00.151 [N] Mle-----------: Role disabled -> detached
otbr-agent[302]: 00:00:00.154 [N] Platform------: [netif] Changing interface state to up.
s6-rc: info: service otbr-agent successfully started
s6-rc: info: service otbr-agent-rest-discovery: starting
otbr-agent[302]: 00:00:00.415 [N] Mle-----------: Role detached -> leader
otbr-agent[302]: 00:00:00.418 [N] Mle-----------: Partition ID 0x14db4d58
otbr-agent[302]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary!
otbr-agent[302]: 00:00:00.522 [W] Platform------: [netif] Failed to process request#10: Unknown error -95
otbr-agent[302]: 00:00:00.522 [W] Platform------: [netif] Failed to process request#11: Unknown error -95
[06:14:32] INFO: Successfully sent discovery information to Home Assistant.
s6-rc: info: service otbr-agent-rest-discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
Listening on port 9999 for connection...
Accepting connection.
Accepted connection 7.
otbr-agent[302]: 00:00:13.278 [W] Platform------: [netif] Failed to process request#12: Unknown error -17
otbr-agent[302]: 00:00:18.302 [W] Platform------: radio tx timeout
otbr-agent[302]: 00:00:18.302 [W] Platform------: RCP failure detected
otbr-agent[302]: 00:00:18.302 [W] Platform------: Trying to recover (1/100)
otbr-agent[302]: 00:00:18.387 [N] Platform------: RCP recovery is done
otbr-agent[302]: 00:00:18.387 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:171, chksum:cfc9, ecn:no, to:0xffff, sec:yes, error:Abort, prio:net
otbr-agent[302]: 00:00:18.387 [N] MeshForwarder-: src:[fe80:0:0:0:ccb7:f9d2:d131:a5bb]:19788
otbr-agent[302]: 00:00:18.387 [N] MeshForwarder-: dst:[ff02:0:0:0:0:0:0:1]:19788
otbr-agent[302]: 00:00:22.412 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:148, chksum:3443, ecn:no, to:0xa400, sec:yes, error:NoAck, prio:low
otbr-agent[302]: 00:00:22.413 [N] MeshForwarder-: src:[fd50:17d9:91ae:1:f483:8a80:7e95:9649]:52483
otbr-agent[302]: 00:00:22.413 [N] MeshForwarder-: dst:[fd50:17d9:91ae:1:7c74:9745:5e0a:9b9d]:5683
otbr-agent[302]: 00:03:22.375 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:148, chksum:4c90, ecn:no, to:0xa400, sec:yes, error:NoAck, prio:low
otbr-agent[302]: 00:03:22.375 [N] MeshForwarder-: src:[fd50:17d9:91ae:1:f483:8a80:7e95:9649]:52483
otbr-agent[302]: 00:03:22.375 [N] MeshForwarder-: dst:[fd50:17d9:91ae:1:7c74:9745:5e0a:9b9d]:5683
otbr-agent[302]: 00:59:19.432 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:148, chksum:24d6, ecn:no, to:0xdc00, sec:yes, error:NoAck, prio:low
otbr-agent[302]: 00:59:19.432 [N] MeshForwarder-: src:[fd50:17d9:91ae:1:f483:8a80:7e95:9649]:37930
otbr-agent[302]: 00:59:19.432 [N] MeshForwarder-: dst:[fd50:17d9:91ae:1:9dec:1f02:d4e7:f167]:5683
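
One way to check the "add-on restarted" hypothesis is to see whether a `Starting cpcd...` entry in the add-on log follows each connection loss in the HA log within a short window. The sketch below is only illustrative: `cpcd_start_times` and `addon_restarted_after` are hypothetical names, the 30-second window is an arbitrary assumption, and it assumes both logs use the same local time and that the date (which the add-on log omits) is supplied by hand.

```python
import re
from datetime import date, datetime, timedelta

# Add-on supervisor lines look like "[06:14:29] INFO: Starting cpcd...".
# The cpcd daemon's own lines ("[06:14:29:175429] Info : ...") do not match.
START_RE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] INFO: Starting cpcd\.\.\.")


def cpcd_start_times(addon_log_text, day):
    """Return a datetime for every 'Starting cpcd...' entry, using `day` as the date."""
    return [
        datetime(day.year, day.month, day.day, int(h), int(m), int(s))
        for h, m, s in START_RE.findall(addon_log_text)
    ]


def addon_restarted_after(loss_time, start_times, window=timedelta(seconds=30)):
    """True if any cpcd start falls within `window` after the HA-side loss."""
    return any(timedelta(0) <= start - loss_time <= window for start in start_times)


if __name__ == "__main__":
    addon_log = "[06:14:29] INFO: Generating cpcd configuration. [06:14:29] INFO: Starting cpcd..."
    starts = cpcd_start_times(addon_log, date(2024, 2, 15))
    loss = datetime(2024, 2, 15, 6, 14, 21, 838000)  # from the HA log above
    print(addon_restarted_after(loss, starts))
```

A match only shows that the two events coincide in time. It does not say whether the add-on went down first or was restarted in response to something else.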

puddly commented 5 months ago

That's why i switch on multiprotocol, where everything was fine for a few weeks.

You should move your SkyConnect away from interference sources such as USB 3.0 ports, SSDs, 2.4GHz WiFi routers, and so on. The problem is still there, multiprotocol just can't tell you because multiprotocol cannot measure channel energy: it will always report nearly 0% utilization for every channel, no matter how congested it is.

wpformation commented 5 months ago

I understand! But I can assure you that I have moved my SkyConnect away from any interference several times, and that my WiFi network (like my neighbors') does not interfere with the selected Zigbee channel. My HA Green box sits alone in my entrance hall, and my WiFi router is more than 10 meters away.

To be sure of this, I scanned the 2.4 GHz WiFi band with a mobile application: the WiFi networks found were on channels 6 and 11, and my Zigbee network is on channel 20 (as read here: https://haade.fr/fr/blog/interference-zigbee-wifi-2-4ghz-a-savoir).
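
The channel arithmetic behind that check can be sketched as follows. It uses the standard 2.4 GHz channel plans (Zigbee channel k centered at 2405 + 5(k - 11) MHz, WiFi channel n at 2407 + 5n MHz) and treats each signal as a simple rectangle: about 2 MHz wide for Zigbee and 22 MHz wide for WiFi. Those widths are simplifying assumptions, since real transmissions leak energy outside the nominal band, and the function names are hypothetical.

```python
def zigbee_band(channel):
    """(low, high) edge in MHz of a 2.4 GHz Zigbee channel, assuming ~2 MHz width."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11 to 26")
    center = 2405 + 5 * (channel - 11)
    return (center - 1, center + 1)


def wifi_band(channel, width_mhz=22):
    """(low, high) edge in MHz of a 2.4 GHz WiFi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers WiFi channels 1 to 13 only")
    center = 2407 + 5 * channel
    return (center - width_mhz / 2, center + width_mhz / 2)


def overlaps(zigbee_channel, wifi_channel, width_mhz=22):
    """True if the two nominal bands share any spectrum (touching edges do not count)."""
    z_lo, z_hi = zigbee_band(zigbee_channel)
    w_lo, w_hi = wifi_band(wifi_channel, width_mhz)
    return z_lo < w_hi and w_lo < z_hi


def clear_zigbee_channels(wifi_channels, width_mhz=22):
    """Zigbee channels whose nominal band avoids every listed WiFi channel."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w, width_mhz) for w in wifi_channels)
    ]


if __name__ == "__main__":
    print(clear_zigbee_channels([6, 11]))
```

Under these assumptions, Zigbee channel 20 (2450 MHz) falls in the gap between WiFi channels 6 and 11, as do channels 15 and 25. This only covers WiFi overlap; it says nothing about other interference sources such as USB 3.0 ports or SSDs.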

In any case, I will try running without multiprotocol again and let you know.

(*) I should perhaps also mention that I use a Philips Hue bridge which runs on Zigbee channel 11 and which works perfectly throughout the house.

serrnovik commented 3 months ago

Looks like I'm having the same issue: https://github.com/home-assistant/core/issues/115217 Did you solve the problem?

issue-triage-workflows[bot] commented 4 weeks ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.