home-assistant / core


Devices unavailable after Conbee to SkyConnect migration #86231

Closed: prnzngr closed this issue 11 months ago

prnzngr commented 1 year ago

The problem

I switched from Deconz/Conbee to ZHA/SkyConnect, and since then the problems started. Every day some random devices (Aqara sensors, Ikea bulbs) become unavailable. I have to pair them again, and the next day some different device is unavailable. With Deconz/Conbee I never had these problems.

What version of Home Assistant Core has the issue?

core-2023.1.5

What was the last working version of Home Assistant Core?

core-2022.11

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zha

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha

Diagnostics information

zha-f63193a751e4f18abbdbeaba6257af97-LUMI lumi.sensor_magnet.aq2-38e364b71addd13e8f72c896cb83e544.json.txt
zha-f63193a751e4f18abbdbeaba6257af97-Zigbee Coordinator-aabc7e236d33f421590e46eb61ea0400.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-01-19 12:35:59.016 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141280.53930068016 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:03.510 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038776.234536171 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:04.011 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0CBA](SmokeSensor-EF-3.0): last_seen is 167550.2345740795 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:28.082 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x6378](lumi.sensor_wleak.aq1): last_seen is 144684.77171301842 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:30.030 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x3BBB](TRADFRI remote control): last_seen is 144489.35309433937 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:30.057 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xDB20](TRADFRI control outlet): last_seen is 614320.0762073994 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:32.193 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAE69](lumi.sensor_magnet.aq2): last_seen is 169974.5131304264 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:47.953 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xC7D4](TRADFRI bulb E27 WS opal 1000lm): last_seen is 20564.596658945084 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:52.008 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x2A27](lumi.sensor_magnet.aq2): last_seen is 108069.15588212013 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:52.032 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x360D](lumi.sensor_magnet.aq2): last_seen is 170186.2264046669 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:04.510 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038837.235011816 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:12.012 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0CBA](SmokeSensor-EF-3.0): last_seen is 167618.23617124557 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:12.019 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141353.54190301895 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:32.083 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x6378](lumi.sensor_wleak.aq1): last_seen is 144748.77293539047 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:36.059 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xDB20](TRADFRI control outlet): last_seen is 614386.0775065422 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:45.031 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x3BBB](TRADFRI remote control): last_seen is 144564.3546833992 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:51.194 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAE69](lumi.sensor_magnet.aq2): last_seen is 170053.51456308365 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:05.511 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038898.2359614372 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:11.954 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xC7D4](TRADFRI bulb E27 WS opal 1000lm): last_seen is 20648.59741950035 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:20.009 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x2A27](lumi.sensor_magnet.aq2): last_seen is 108157.15664672852 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:20.013 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0CBA](SmokeSensor-EF-3.0): last_seen is 167686.2369081974 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:25.020 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141426.5432062149 seconds ago and ping attempts have been exhausted, marking the device unavailable
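
The log above repeats the same devices every minute or so, which makes it hard to see how long each one has actually been silent. The following is a minimal sketch (not part of ZHA) that groups such lines by device and reports the largest `last_seen` age. The regular expression is an assumption based only on the message format shown in the lines above, and `summarize_last_seen` is a hypothetical helper name.

```python
import re

# Assumed format, taken from the log lines above:
#   [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141280.53 seconds ago ...
LOG_PATTERN = re.compile(
    r"\[(?P<nwk>0x[0-9A-Fa-f]{4})\]\((?P<model>[^)]*)\): "
    r"last_seen is (?P<age>[0-9.]+) seconds ago"
)


def summarize_last_seen(lines):
    """Return {(nwk, model): largest last_seen age in seconds}."""
    ages = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not a "marking the device unavailable" line
        key = (match["nwk"], match["model"])
        ages[key] = max(ages.get(key, 0.0), float(match["age"]))
    return ages


# Two lines copied from the log above.
sample = [
    "2023-01-19 12:35:59.016 DEBUG (MainThread) "
    "[homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): "
    "last_seen is 141280.53930068016 seconds ago and ping attempts have been "
    "exhausted, marking the device unavailable",
    "2023-01-19 12:36:03.510 DEBUG (MainThread) "
    "[homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): "
    "last_seen is 1038776.234536171 seconds ago and ping attempts have been "
    "exhausted, marking the device unavailable",
]

summary = summarize_last_seen(sample)
# Print the devices that have been silent longest first, in days.
for (nwk, model), age in sorted(summary.items(), key=lambda item: -item[1]):
    print(f"{nwk} {model}: {age / 86400:.1f} days since last seen")
```

Read this way, the leak sensor 0xAD16 has been silent for roughly 12 days, while the bulb 0xC7D4 in the log was last seen under 6 hours before it was marked unavailable.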

Additional information

No response

home-assistant[bot] commented 1 year ago

Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `zha` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Change the title of the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

puddly commented 1 year ago

Please download diagnostics for the ZHA integration after letting it run for a few hours:

(image)

prnzngr commented 1 year ago

config_entry-zha-f63193a751e4f18abbdbeaba6257af97.json.txt

puddly commented 1 year ago

Aqara sensors becoming unavailable is not unusual: when joining a network, they pick the very first parent they detect, which is rarely a physically close one. If they picked a good parent (at random) when joining your Conbee network but picked a bad one with the new network, that would be an issue. You can force them to pick a new parent by re-joining them to your network via a specific, physically-close routing device:

(image)

Is your SkyConnect in exactly the same position as the Conbee, plugged into the exact same USB extension cable? Is it away from USB 3.0 devices, SSDs, 2.4GHz routers, etc.?

snike3 commented 1 year ago

I can confirm this bug report.

I have 8 Aqara water sensors and 2 temperature sensors. I've been using ZHA with a Conbee II USB stick (latest firmware) for a couple of years without issues. Using 2023.1.3 all my sensors were working. After upgrading to 2023.1.5, none of the Aqara sensors will stay connected for longer than a few hours.

I have 76 total zigbee devices (Jasco, SmartThings, Centralite, Aqara) with a large number of them being routing devices. Looking at the ZHA device map, none of the Aqara devices are showing connections to any routing device or the coordinator. They're just kind of floating.

I went through and removed and re-paired all the Aqara devices using the "Add via this device" on the closest routing device (typically 5-15 ft total distance with no walls). Unfortunately, that didn't help. All the devices went to unavailable after the ZigBee battery timeout found in the ZHA configuration settings.
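
For readers unfamiliar with the timeout mentioned above, here is a rough sketch of the idea, not ZHA's actual code: a device is treated as overdue once the time since it was last seen exceeds a configurable limit, with separate limits for mains-powered and battery-powered devices. The default values below (2 hours and 6 hours) are assumptions and should be checked against your own ZHA configuration settings; `exceeds_timeout` is a hypothetical helper name.

```python
# Assumed defaults in seconds; both values are user-configurable in ZHA,
# so treat these numbers as placeholders rather than authoritative.
ASSUMED_MAINS_TIMEOUT_S = 2 * 60 * 60
ASSUMED_BATTERY_TIMEOUT_S = 6 * 60 * 60


def exceeds_timeout(
    last_seen_age_s: float,
    battery_powered: bool,
    mains_timeout_s: float = ASSUMED_MAINS_TIMEOUT_S,
    battery_timeout_s: float = ASSUMED_BATTERY_TIMEOUT_S,
) -> bool:
    """Return True if the device has been silent longer than its limit."""
    limit = battery_timeout_s if battery_powered else mains_timeout_s
    return last_seen_age_s > limit


# A mains-powered bulb last seen 20564 s ago (value from the log in the
# original report) is past the assumed 2 hour mains limit.
print(exceeds_timeout(20564.6, battery_powered=False))

# A battery sensor that reported one hour ago is still within its limit.
print(exceeds_timeout(3600, battery_powered=True))
```

The log messages in the original report suggest ZHA also tries to ping a device before marking it unavailable; that step is left out of this sketch.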

For more debugging I swapped in a Sonoff ZBDongle Plus E for the Conbee II, but I get the same result.

Tonight I'll try reverting to an older version of HA Core.

System Details:
- Raspberry Pi 4 4G
- Deconz Conbee II 26780700 / Sonoff ZBDongle Plus E 1.0.1
- Home Assistant OS 9.4

Adminiuga commented 1 year ago

Joining aqara through any router won't work. You have to join it only through an aqara compatible router. There was a list somewhere on the forums

prnzngr commented 1 year ago

> You can force them to pick a new parent by re-joining them to your network via a specific, physically-close routing device:

I already tried this method, it doesn't help

> Is your SkyConnect in exactly the same position as the Conbee, plugged into the exact same USB extension cable? Is it away from USB 3.0 devices, SSDs, 2.4GHz routers, etc.?

Yes, everything is the same, and it is away from routers, SSDs and so on.

prnzngr commented 1 year ago

> Joining aqara through any router won't work. You have to join it only through an aqara compatible router. There was a list somewhere on the forums

It is not only affecting aqara. Ikea Bulbs, Motion Sensors and so on are also affected

prnzngr commented 1 year ago

Today I again had some unavailable devices and re-joined them via nearby bulbs (as you suggested). In the visualization they still have no link to a router, unlike other devices. (image)

Rogue136198 commented 1 year ago

> > Joining aqara through any router won't work. You have to join it only through an aqara compatible router. There was a list somewhere on the forums
>
> It is not only affecting aqara. Ikea Bulbs, Motion Sensors and so on are also affected

I can confirm. I have been having nothing but issues with my hue motion sensors since switching to the skyconnect.

MattWestb commented 1 year ago

Hue motion sensors (I have 2 in my production system) are real jumpers: they hop between parents and in the end settle on the worst parent they can find, but then they stay stable. I think you need some routers around them so they can do their jumping; otherwise they leave the network.

There is a very long Z2M issue about this problem: https://github.com/Koenkk/zigbee2mqtt/issues/2693.

prnzngr commented 1 year ago

> Hue motion sensors (I have 2 in my production system) are real jumpers: they hop between parents and in the end settle on the worst parent they can find, but then they stay stable. I think you need some routers around them so they can do their jumping; otherwise they leave the network.
>
> There is a very long Z2M issue about this problem: Koenkk/zigbee2mqtt#2693.

As I have already written, there have been no problems at all with Deconz.

kenwiens commented 1 year ago

I cannot confirm exactly when my zigbee devices stopped working, but it was sometime since Jan 1 2023. Every day a different set of Aqara motion sensors, zigbee switches and other zigbee devices becomes unavailable. I tried some Ikea plugs as routers but that didn't make any difference. I wish I had seen this thread earlier, as I assumed the problem was that my house just couldn't handle zigbee, and I have been busy returning my zigbee stuff and migrating to yolink.

The only devices that haven't gone offline at some point in the last month were those within 2m of the stick (with no walls). An Aqara motion sensor about 4m away (1 simple wall) goes offline intermittently. Beyond that, my devices come "online" intermittently.

I am using a sonoff zigbee 3 stick. I did try moving it from usb3 to usb2, I tried a 1 m and a 2 m extension cable with multiple different locations for the stick. None of this had changed prior to the disconnect issue arising, but I was going through various trouble shooting scenarios so decided to try this.

The timing of my issues does seem to correlate with the revision history discussed above in this thread. These devices were relatively solid last year (Aug - Dec 2022)

I'm running HA 2022.1.7 on a Raspberry Pi

atmezferix commented 1 year ago

I'm also having multiple sensors dropping off as of late. Usually it's the Samsung Centralite ones; I've been lucky with my Aqara ones by the sound of it. I'm unable to get one of the Samsung sensors working again despite re-pairing it, and it also won't go through a re-configure. I couldn't tell you exactly which version it started with, but I've had Philips Hue bulbs dropping off, which has never happened in the last 2 years of using Home Assistant, so to me something is definitely wrong. Hope it gets fixed soon, as before long I'm likely to fall down the stairs in the dark.

snike3 commented 1 year ago

Alright, I've been doing some testing and researching all weekend... Here are my findings.

TL;DR; The Sonoff ZBDongle Plus E does not appear to be compatible with Aqara devices, but deconz devices are (at least the Conbee II). Ensure Aqara devices are directly connected to compatible routers even if your coordinator is compatible. It seems as though something in the HA Core 2023.1.x update can cause devices to switch to a different routing device which may not be compatible with Aqara. Using a Conbee II with the deCONZ/Phoscon integration & add-on allows for stable Aqara devices.

Full History & Troubleshooting: I've been operating with a Conbee II using ZHA for over 2 years. I believe all my Aqara sensors (10 of them) were directly connected to the coordinator, and have had no issues with anything Zigbee (76 total devices - Centralite, Jasco/GE, Orbit, SmartThings, Innr, PEQ/Centralite, SmartThings/Centralite, Iris/Centralite) in that time. All my routing devices are Jasco/GE or Centralite branded. I was running HA Core 2023.1.3 for several days with my Conbee II stick without any issues.

Upgraded to HA Core 2023.1.5. All of a sudden multiple Aqara sensors went unavailable, but others stayed connected. I was in this state about a week, so I purchased the Sonoff ZBDongle Plus E after reading a few forums (should have read more).

Used the ZHA migration to switch to the Sonoff dongle and everything seemed good, but by the next morning ALL the Aqara devices were unavailable.

I've been trying to force specific routing devices to the Aqara sensors (like mentioned in an earlier post) while using the Sonoff dongle. Turns out that neither the Jasco/GE nor the Centralite routing devices I have are Aqara friendly. Connecting the sensors to those devices or the Sonoff dongle directly will not result in a stable connection.

I tried reverting back to HA Core 2022.12.9, but got the same results. Next I wanted to go back to the last 100% working configuration (HA Core 2022.12.9 using the Conbee II). Unfortunately, migrating from the Sonoff to Conbee II did not work. ZHA said it was successful and all my devices showed up, but nothing worked. It seemed as though I was going to have to completely rebuild my Zigbee network to get back to this state, so I thought a bit about that...

Decided I wanted to use the new Zigbee 3.0 stick instead of the old Conbee II, so I investigated using both sticks at the same time.

First, I completely rebuilt the Zigbee network from scratch using the Sonoff dongle and ZHA (this took a long time) to make sure my ZHA setup was up-to-date/clean. I left the Aqara sensors disconnected. Then, I set up the Conbee II using zigbee2mqtt (Z2M). Unfortunately, when I added an Aqara sensor, Z2M reported that it was "unsupported" and gave no information. So I uninstalled Z2M and installed deCONZ/Phoscon. The Phoscon interface sucks to use on a mobile device, but it's workable by rotating the phone to landscape occasionally. HA didn't auto-discover the service like it's supposed to, but it was easy enough to configure the Integration to point to the Add-on (the information is in the Add-on documentation).

Using deCONZ/Phoscon I was able to add all the Aqara sensors. It correctly identified them and even had a pretty image of the device. In HA I then refreshed the integration and all the sensors showed up.

My HA has been sitting this way for over a day now. I've had no unavailable devices on ZHA or deCONZ/Phoscon. Reviewing the history for the Aqara devices I can see that they're periodically updating (temperature graph shows changes).

So.... there's my solution to the problem... Not pretty or optimal, but functional. I now have 2 Zigbee networks: one on the Sonoff ZBDongle Plus E running ZHA, and one on the Conbee II running Phoscon/deCONZ that I'm only putting Aqara devices on.

After reviewing LOTS of forums, I believe that there are some Aqara devices that will work with non-Aqara Zigbee 3.0 routers/coordinators, but it appears that the temperature/weather and leak sensors are not in that list. It sounds like this is due to an older method of keep alive used by Aqara vs the Zigbee 3.0 protocol.

Essentially, if you have a deconz coordinator and are using ZHA then Aqara devices should work (and have for me for years), but something in the new 2023.1.x versions is causing them to no longer work (I'm guessing route optimization and updated reporting configuration). I'm going to keep watching for updates on the issue, but given I've spent many hours troubleshooting/debugging this issue on my setup, I'm out for now.

MattWestb commented 1 year ago

RaspBee/ConBee is a very dominant coordinator, and normally all end devices have it as their parent, which is both good and bad.

EZSP works a little differently and normally has no problems if the devices behave as Zigbee devices should, but Aqara devices do not. The problem is that when EZSP restarts, it does not know whether any sleepy devices are its children, and they must poll their parent successfully after the restart for things to be OK. If they poll their parent while it is not online, Aqara devices leave the network and must be forced back. It is strongly recommended to have good routers in the network and to connect end devices to them so the mesh works well. With EZSP it is possible to block the coordinator from having direct children, which forces them to use a router as parent. I have lumi weather and magnet sensors of different versions, and only one has problems with battery draining and leaving the network, but it is outside on the balcony at -1°C with snow around it.

I was on holiday for 3 weeks with the laptop. After coming back and connecting it, all devices were online within one hour, including all 10 Aqara sensors, because their routers stayed online and kept working while the coordinator was out of the house.

It is up to you whether you build a star network or a mesh network, but Zigbee should be a mesh network so that it can heal and work OK.

danTHAman152000 commented 1 year ago

I wanted to comment that my HA setup has been unstable since recent updates. Everything seems to work fine after a reboot, with the exception of my Zigbee Aqara sensors. Sometimes none of them connect after the reboot, sometimes some of them do. But eventually all of HA locks up, all devices (Aqara plus everything else) go offline, and I have to manually turn the Pi off and back on. I am investigating how to get logs, in case that can be of help. Is there any private info in these logs that I need to remove first?

hitokiri8x commented 1 year ago

I'm in the same situation as everyone else. I bought a ZBDongle Plus E, but my Aqara devices don't stay connected: the temperature sensors are somewhat stable, but the button and magnetic door switch are not stable at all.

> Alright, I've been doing some testing and researching all weekend... Here are my findings.
>
> TL;DR; The Sonoff ZBDongle Plus E does not appear to be compatible with Aqara devices, but deconz devices are (at least the Conbee II). Ensure Aqara devices are directly connected to compatible routers even if your coordinator is compatible. It seems as though something in the HA Core 2023.1.x update can cause devices to switch to a different routing device which may not be compatible with Aqara. Using a Conbee II with the deCONZ/Phoscon integration & add-on allows for stable Aqara devices.

Could you link, even in private, some resources where you found this information? I have 25 days left to return the device (thanks Amazon), and I refuse to keep two networks and use my old cc2531. I want to keep digging to see if this is solvable, or whether it is better to return it and switch to something else (not sure what is new and stable with Aqara).

HarvsG commented 1 year ago

I switched from Conbee 2 to SkyConnect (Both ZHA). I am having the same issues. Chiefly with IKEA shortcut buttons and 2-button controllers. I also have some issues with aqara contact sensors (but they were quite unreliable before). On at least one of the shortcut buttons even after I re-join it to the network clicks don't produce events - almost as if it has become immediately unavailable (pending the time out). Last Seen doesn't update.

I have my SkyConnect connected to the same USB2 extension cord that I had used for the conbee stick. RPI3B+ on HAOS.

Edit: I have transitioned back to the ConBee II and the devices are working well now

maguiresf commented 1 year ago

Copying this comment from another issue ticket. It seems to be a duplicate of this one, but this one seems the most active, so I'm going to try following here:

"Same problem here, just migrated from Deconz / Conbee II to ZHA / SkyConnect. Hue motion sensors get "stuck" in some state, occupancy may be either true or false but they never change. Require a re-pair to get them working again. Aqara magnet sensors seem to either stop sending updates or just become unavailable. This can happen twice a day on some of these devices. Latest HA version, latest SkyConnect firmware."

Everything was stable on Deconz / Conbee II and had been for about 3 years.

prnzngr commented 1 year ago

I switched back to Deconz / Conbee and everything works fine. Seems like the Devs are not interested in this topic

Hedda commented 1 year ago

Before even starting to troubleshoot any problems with those kinds of symptoms I always highly recommend following this in-depth best practice guide regarding reception optimization and interference avoidance -> https://community.home-assistant.io/t/guide-for-zigbee-interference-avoidance-and-network-range-coverage-optimization/515752/

As well as in addition a switch/change to using a less noisy Zigbee channel (which is part of that guide) -> https://www.home-assistant.io/integrations/zha#defining-zigbee-channel-to-use

Then also follow these other related best practices to at least re-pair your devices again in their final location after trying to take on all the suggested actions -> https://www.home-assistant.io/integrations/zha#best-practices-to-avoid-pairingconnection-difficulties

Note that those are actions that you need to take regardless of which Zigbee Coordinator radio adapter and Zigbee gateway solution you use.

If you still have issues, then you will need to enable debug logging and replicate the issue so that you can provide debug logs that show the exact time when the issues occur.

Again, please understand and remember that all and any problems will be much easier to narrow down and troubleshoot if you have already taken actions to reduce any sources of interference and changed to a Zigbee channel with less noise.

prnzngr commented 1 year ago

> For these kinds of symptoms I always highly recommend following this in-depth best practice guide regarding reception optimization and interference avoidance, regardless of which Zigbee Coordinator radio adapter and Zigbee gateway solution that you use -> https://community.home-assistant.io/t/guide-for-zigbee-interference-avoidance-and-network-range-coverage-optimization/515752/ (and then if still have issues then also follow these other related best practices to at least re-pair your devices again in their final location after tried to take on all the suggested actions -> https://www.home-assistant.io/integrations/zha#best-practices-to-avoid-pairingconnection-difficulties)

If you have read this thread carefully, you will see it is not because of radio issues. With deconz, no problems; with ZHA, chaos.

austwhite commented 1 year ago

I went from Deconz to ZHA and had similar issues at first. What I did was make sure all Aqara devices are paired close to the coordinator so they don't try to pair through a router. I then put the Aqara devices in their place after they were paired and never had a problem again. I know that wasn't necessary with Deconz, so it may be a limitation of ZHA, but none of my Aqara devices have ever fallen off the network after doing that. Might be a band-aid, but it worked for me :)

timiman commented 1 year ago

I'm also having issues with unavailable devices all over since the day I switched from ConBee II to SkyConnect (both through ZHA). It seems that something is very wrong with SkyConnect, because the ConBee II did not give me this issue. The reason to change to SkyConnect was future Matter support and better support in general or of specific devices (like the Aqara Plug). First the migration process went sideways, leaving devices out of the zigbee network, and then I had to re-pair them again and again. I'll move back to ConBee II. I hope the backup will work without having to re-pair 40 devices around the house. I hope SkyConnect will at some point become more stable than ConBee II, so that I can switch to it again.

snike3 commented 1 year ago

It seems like most people are having issues with the new SkyConnect dongle. It also seems that a large number of those people are switching from using deconz and/or the Conbee II. While I do have the same issues, I didn't switch to the SkyConnect, but instead switched to a Sonoff ZBDongle-E (didn't know SkyConnect existed when I ordered).

This made me do a spec comparison of the dongles... The only real difference between the two (aside from the obvious physical differences) is the USB-to-Serial chipset. The SkyConnect uses the Silabs CP2102N (which is used in the ZBDongle-P), and the ZBDongle-E uses the CH9102F.

Both the SkyConnect and ZBDongle-E use the Silabs EFR32MG21 transceiver with the EmberZNet (ezsp) stack. This seems to indicate that the issues stem from either the EFR32MG21/ezsp firmware, the ezsp driver used by ZHA, or ZHA's use of the ezsp driver. Tough to say which without digging deeper. However, given that I was using ZHA with the deconz driver and Conbee II for a few years without a problem, I'd lean toward the ezsp driver and/or EFR32MG21. That being said, something did change in ZHA in the January release that triggered issues in my zigbee network, which is what made me look at new dongles in the first place.

timiman commented 1 year ago

A quick note, as @Hedda mentioned before: another thing that has silently changed is the radio channel of the ZigBee network. Deconz was using channel 25 while SkyConnect uses channel 15 (by default at least). So if a WiFi mesh system exists around the house, check whether the WiFi channel being used is one of the lower ones, like 1 to 6. The following link gives a quick map of overlapping channels between WiFi and ZigBee: https://support.metageek.com/hc/en-us/article_attachments/115017048148/ZigBee_Channels.pdf In my case the WiFi was using 6-11, and after changing it to 7-12 I saw more (placebo?) stability of the ZigBee network, with no dropped devices up to now.
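
For anyone who prefers numbers to a chart, the overlap can be estimated from the standard 2.4 GHz channel plans. This is only a sketch: the center-frequency formulas are the standard ones (Zigbee channels 11-26, WiFi channels 1-13), but the nominal channel widths used here (22 MHz for WiFi, 2 MHz for Zigbee) are assumptions, and real transmitters spill beyond them. The function names are hypothetical.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of an IEEE 802.15.4 2.4 GHz channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz WiFi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("only WiFi channels 1-13 are handled here")
    return 2407 + 5 * channel


def overlapping_wifi_channels(
    zigbee_channel: int,
    wifi_width_mhz: float = 22.0,   # assumed nominal WiFi channel width
    zigbee_width_mhz: float = 2.0,  # assumed nominal Zigbee channel width
) -> list[int]:
    """WiFi channels whose nominal band overlaps the given Zigbee channel."""
    zigbee_center = zigbee_center_mhz(zigbee_channel)
    # Two bands overlap when their centers are closer than the sum of
    # their half-widths.
    max_distance = (wifi_width_mhz + zigbee_width_mhz) / 2
    return [
        wifi
        for wifi in range(1, 14)
        if abs(wifi_center_mhz(wifi) - zigbee_center) < max_distance
    ]


for zigbee in (15, 20, 25):
    print(zigbee, zigbee_center_mhz(zigbee), overlapping_wifi_channels(zigbee))
```

Under these assumptions, Zigbee channel 15 (2425 MHz) falls in the gap between WiFi channels 1 and 6, and channel 25 (2475 MHz) sits above WiFi channel 11. A WiFi network using 40 MHz channels would need a larger `wifi_width_mhz`.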

austwhite commented 1 year ago

@timiman Deconz usually uses channel 15 by default, if it was on 25 then it would have been manually changed to 25. I was also a user of Deconz. The channel being used doesn't really matter, unless you have interference on that channel or run more than one Zigbee network. WiFi on 6 and 11 should not interfere with Zigbee Channel 15 in my previous experience using my WiFi routers on 1, 6 and 11 and having my Conbee II stick on channel 15 in both DeConz and then in ZHA. It just worked. With the channel, did you try the setup, as per the documentation, to change the channel manually to something other than 15 if you need it? ZHA does default to 15 and this is not a bug https://www.home-assistant.io/integrations/zha

zha:
  zigpy_config:
    network:
      channel: 15             # What channel the radio should try to use.
      channels: [15, 20, 25]  # Channel mask

Unfortunately my issues with SkyConnect were not solved by the pairing method I suggested earlier. They all came back, and I have no interference on channel 15. These issues appear to be more deeply rooted than the Zigbee channel. It has reached the point where my 59 Zigbee products are unreliable, and I will be taking time to move back to the Conbee II stick this weekend. I really wanted the SkyConnect to work out and I wanted to support the Nabu Casa team, but this product just seems to have too many bugs in its current form to be usable in production. The problem is, it is hard to get anything in the logs to show an actual error. According to the logs the signal is sent out....... I do think the title of this bug report is not fair though. It is not ZHA as a whole that is making "big problems". It is more ZHA with the SkyConnect, or maybe the particular SiLabs chip.

Has anyone in this thread used the SkyConnect in Multi-PAN mode? Does that make the problems less or worse or the same?

puddly commented 1 year ago

I strongly suggest anyone affected by "devices randomly stop working/go offline" issues please take a look at https://skyconnect.home-assistant.io/connectivity/, especially the "How to counter interference" section. In exceptionally noisy environments, the threshold between "just barely working" and "not working" is low and may have been exacerbated by switching coordinators. Try a different coordinator placement, orientation, a different USB extension cable, a second USB extension cable, etc. RF issues aren't intuitive and won't be revealed with just a WiFi scan.

> another thing that has silently changed is the radio channel of the ZigBee network

If you used the ZHA migration flow, the network channel did not change. However, if you set up a new network from scratch, it will be formed on channel 15.

The only known "issue" with Conbee migration is that a relatively-recent firmware version is required, as otherwise the Conbee doesn't provide a way to read the network key frame counter with older firmwares. If that counter isn't migrated, some brands of devices will refuse to receive commands from the coordinator and will only be able to send updates.

austwhite commented 1 year ago

@puddly A lot of people did not have issues with Conbee sticks, which are very susceptible to interference. For my issue, I can let you know wireless interference unfortunately is not the cause. The stick is on the provided USB extension as far away as the extension will let it currently, but I have also tried with extension cables up to 1 metre long. Appreciate the input and maybe some do have issues due to not reading that article, but at least for me, that is not the primary issue.

timiman commented 1 year ago

Sadly, just a few minutes ago, devices started becoming unavailable again. So moving the WiFi channel a little further away from the zigbee channel did not help in the end. It seems that there is something wrong with the routing of the zigbee network in general, meaning from and to the coordinator. The core log is full of timeout entries.

maguiresf commented 1 year ago

Yeah, I really don't think this is just a normal interference issue either. My Zigbee network was just broken with the SkyConnect, so I purchased a Sonoff Zigbee 3.0 Dongle-P and did a radio migration to that. Absolutely nothing else changed: same channel, same position of dongle and Zigbee devices. I just swapped out the sticks and did a radio migration, and all the issues I was having have stopped. The Hue motion sensors have been happily on the network and working reliably for about 36 hours now, and all the Aqara door sensors stay on the network and work every time (rather than like 50/50 before). I don't know if the issue is the SkyConnect itself or the bellows library, but something isn't right.

Edit: My network is about 100 devices, mostly Hue lights, a couple of Ikea lights, some Innr plugs, plus Hue and Aqara end devices. The network is around 50% routers, 50% end devices. My apartment is only around 140 m², so there's pretty dense coverage.

austwhite commented 1 year ago

I am beginning to think it is the library and not the hardware. I ran the SkyConnect on another hub and it was flawless for 48 hours. Even close to the USB 3.0 connector. Not a single time out. I might try the multi-pan firmware with ZHA and see if that is better, but I will wait until I finally find a CM4, as I am not risking my production system again.

rchiileea commented 1 year ago

So unlike you guys, I have been experiencing problems since 2023.2. It first started happening in zigbee2mqtt: devices dropped off and went offline, and only a reboot would fix it. I even tried moving the whole system to another machine as a test, with the same issue. Then I tried ZHA: still the same issue, and only a reboot would fix it. Then I thought, to hell with it, let me try with the Conbee back in. zigbee2mqtt and ZHA, on both machines: still the same issue.

Going to try to roll back now.

TheFelix93 commented 1 year ago

Hmm, I had the same issue. After reverting to 2023.1.7, my Aqara temperature/humidity sensor has stayed connected, at least since yesterday. Before that change, the sensor disconnected after a few hours every time. I think it started with the 2023.2.5 update, but I wanted to be sure.

Could you also try this out? Maybe this info helps with debugging?

puddly commented 1 year ago

I am beginning to think it is the library and not the hardware. I ran the SkyConnect on another hub and it was flawless for 48 hours. Even close to the USB 3.0 connector. Not a single time out.

I'm afraid there isn't much that I can do to help out unless you can provide some actionable info: what other hub are you referring to? Running what software? Can you provide debug logs and what Zigbee network channel it's using?

depasseg commented 1 year ago

I stumbled across this while searching for reasons why all of my Aqara sensors are going offline. I am on HA Yellow, which I believe uses the same guts as SkyConnect. I set up a new Zigbee network and wiped all my devices when joining to HA. And I'm still experiencing the same issues described above. I don't think this is just an issue for people who migrated an existing Zigbee network.

pschneider87 commented 1 year ago

Having the same issues since following the guide to migrate from zigbee2mqtt with Conbee2 to ZHA with SkyConnect. Some (not all) Hue motion sensors become unavailable after a day or so.

I'm not sure if it helps, but I found this remark in the notes of zigbee2mqtt about the Hue Motion sensors: https://www.zigbee2mqtt.io/devices/9290012607.html#notes

This specific device has been reported to have issues repairing to a Zigbee network after upgrading from a CC2531 to a CC2652 controller (Zigbee 1.2 to 3.0). (Re)pairing may only work after pairing the device to another network and channel first (has been tested with a Philips Hue 2.0 hub in this instance) before pairing it back to the Zigbee2MQTT network again.

I used my old Conbee2, changed the channel, and paired one of my hue motion sensors via my PC. Then I re-paired it with Skyconnect and ZHA, let's see if this helped.

TheJulianJES commented 1 year ago

What Hue motion sensor model is that? SML001, SML002 or one of the newer SML003, SML004 models?

pschneider87 commented 1 year ago

In my case, all Motion sensors are SML001 devices

maguiresf commented 1 year ago

I have 4 of the SML001 that were problematic with SkyConnect. I also have 1 SML002 and that was fine; it didn't disconnect once in the 10 days or so that I was using the SkyConnect.

rchiileea commented 1 year ago

I personally don't think it's the SkyConnect, as I have gone through quite a few swaps and changes over the last few days. zigbee2mqtt has been dropping devices too, on both the Conbee 2 and the SkyConnect, and so has ZHA. Trust me, I have tried on two different machines. I have gone back to the 2023.1 build and it has been stable for longer than it was on the latest core release. I am leaving it for now; it's working fine.

d-0l commented 1 year ago

I have this exact same issue with my Lidl motion sensors. I migrated from a Sonoff 3.0 zigbee stick to a Skyconnect. I re-paired my devices from the off, as well as changing entity names and remaking my automations as we don't have many and I wanted to eliminate any problems.

Everything worked last night, this morning everything motion sensor related was broken. The light blinks red when motion is detected as normal (sometimes green, which is new) but ZHA sees nothing. Interestingly, it can report the LQI but not motion. It just reports unavailable and clear for all 3 of the motion sensors.

Everything is in the same place as it was in my original setup, including the Skyconnect. I even brought the motion sensors off the walls to eliminate any issues. No difference.

Edit: Tried to do a fresh install of ZHA, it picked up all but one device without pairing. Can't do anything with any of them. They're ALL unavailable now, including mains powered extension leads. Cannot delete and manually pair. Nothing gets discovered that way.

austwhite commented 1 year ago

@puddly The SkyConnect is just a ZigBee coordinator out of the box. It will work with other platforms such as OpenHAB and other hubs. What it works on is not as relevant as the fact it has significant issues with ZHA. Issues that Conbee and Texas Instruments based coordinators do not have. This is reflected by other comments in this issue. Tell me what you need/want as far as logs and I will try to get them, though mostly my logs don't actually show errors. Devices just randomly drop off, particularly battery devices that go to sleep.
As an additional note, I didn't migrate my network, as the migration from the Conbee failed, so I set all my devices up new, pairing each one from scratch. I have found occasional slow response from mains devices like bulbs and switches, but they seem to stay connected for me. Battery devices (mostly Aqara, Tuya ZigBee and Hue sensors) will drop off, almost like they go to sleep and then ZHA doesn't detect them right when they wake up periodically. I have tried increasing the delay before a device is reported as unavailable, but all this did was stop the device actually showing as unavailable. The devices still stopped responding.

lougreenwood commented 1 year ago

I'm also seeing the same behaviour, however I can't attribute it to a recent update since I only just migrated from my Hue hub to ZHA with a Sonoff Dongle E.

However, what I am seeing is that specific Hue motion sensors are becoming unavailable periodically and I either need to re-pair or reset them. So far all of the unreliable ones are SML001, but not all SML001 are unreliable. I also have some SML003, but I only just set them up on the new network, so I can't comment on their reliability (although they were part of the old Hue hub setup).

What's interesting is that the motion sensors didn't move positions and the controller barely moved positions. I've had my hue system for at least 5 years and I've never known about or knowingly had issues with Zigbee interference. In that time I've moved house and moved the hue controller around... stuffed it in corners, under furniture etc - no issues.

Before switching to ZHA, my Hue hub was under a sofa, sitting on top of my Pi4 HA server next to a bunch of PSUs, a Z-Wave Ring alarm hub, network switches, firewall & modem devices in a corner made of brick walls - not an ideal position - but the Hue hub based network also worked there for a year.

However when I started having issues with the motion sensors and Sonoff (as a side note - I also have maybe 30 hue bulbs, 4 hue strip lights (old & new versions) and play bars - all are solid on the ZHA/Sonoff network for about a week), I started reading about Zigbee interference etc and moved the Sonoff to a better position and put it on a 2m USB 2 cord.

At the moment I'm having to reset some specific motion sensors every day. It seems that the network gets unreliable and these specific sensors flake out when I start adding new devices or need to re-pair other flakey ones - but that might just be coincidence since I'm actively re-pairing and still in progress adding other sensors back onto my new network.

I'm considering buying a Sonoff P to test if it's the hardware/chipset. But so far, as a first impression it's not looking good for the Sonoff E/ZHA pairing with Hue motion sensors for me 😢.

(Also, with that said - this is open source - thanks to all the devs for your hard work, I know this type of project is not an easy one to handle).

d-0l commented 1 year ago

As someone having these issues with the Sonoff P, I don't think buying one will fix things for you. I was going to upgrade to an E, as I had read rumours it may receive Thread and Matter firmware later, but went with the SkyConnect instead.

Since I came from zigbee2mqtt & the Sonoff, I'll try going back to the Sonoff and ZHA to see if ZHA or the Skyconnect is the problem.

pschneider87 commented 1 year ago

I used my old Conbee2, changed the channel, and paired one of my hue motion sensors via my PC. Then I re-paired it with Skyconnect and ZHA, let's see if this helped.

So after applying my procedure to one Hue sensor that went unavailable daily, that sensor has had no issues so far (while others became unavailable again). But it could also just be luck or coincidence; I will keep observing.

Maybe this is also a lead the devs could look into?

TheFelix93 commented 1 year ago

Another update from me.

I reverted my HA core to 2023.1.7 and have now gone 4 days without any issues.

This must mean something.

{
  "home_assistant": {
    "installation_type": "Home Assistant OS",
    "version": "2023.1.7",
    "dev": false,
    "hassio": true,
    "virtualenv": false,
    "python_version": "3.10.7",
    "docker": true,
    "arch": "x86_64",
    "timezone": "Europe/Berlin",
    "os_name": "Linux",
    "os_version": "5.15.90",
    "supervisor": "2023.01.1",
    "host_os": "Home Assistant OS 9.5",
    "docker_version": "20.10.22",
    "chassis": "vm",
    "run_as_root": true
  },
  "custom_components": {
    "localtuya": { "version": "5.0.0", "requirements": [] },
    "hacs": { "version": "1.30.1", "requirements": [ "aiogithubapi>=22.10.1" ] }
  },
  "integration_manifest": {
    "domain": "zha",
    "name": "Zigbee Home Automation",
    "config_flow": true,
    "documentation": "https://www.home-assistant.io/integrations/zha",
    "requirements": [
      "bellows==0.34.6",
      "pyserial==3.5",
      "pyserial-asyncio==0.6",
      "zha-quirks==0.0.90",
      "zigpy-deconz==0.19.2",
      "zigpy==0.53.0",
      "zigpy-xbee==0.16.2",
      "zigpy-zigate==0.10.3",
      "zigpy-znp==0.9.2"
    ],
    "usb": [
      { "vid": "10C4", "pid": "EA60", "description": "2652", "known_devices": [ "slae.sh cc2652rb stick" ] },
      { "vid": "1A86", "pid": "55D4", "description": "sonoffplus", "known_devices": [ "sonoff zigbee dongle plus v2" ] },
      { "vid": "10C4", "pid": "EA60", "description": "sonoffplus", "known_devices": [ "sonoff zigbee dongle plus" ] },

puddly commented 1 year ago

@austwhite

What it works on is not as relevant as the fact it has significant issues with ZHA. Issues that Conbee and Texas Instruments based coordinators do not have.

It is very relevant information: I maintain the libraries that communicate with all three of those coordinator types, so a few hours of verbose debug logs from both ZHA and the other home automation system are critical for me to be able to figure out what the difference is. A SkyConnect has been powering my home network since August and I do not experience these issues, which is why I cannot do anything to help out without having access to a few hours of debug logs from a person who does.

Feel free to email them to me if you don't want to attach them to the GitHub issue.


@TheFelix93

I reverted my ha core to 2023.1.7. and till now 4 days without any issue anymore. This must mean something.

Can you also enable ZHA debug logging on both 2023.1.7 and the latest release, to show a sensor disconnecting with the new release and not the old release?
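For anyone collecting logs, a small script can help summarize which devices were marked unavailable before attaching the full file. The sketch below assumes the log line format shown in the original report at the top of this issue ("last_seen is ... seconds ago and ping attempts have been exhausted, marking the device unavailable"). The function and type names are made up for illustration, and the pattern may need adjusting if the message format differs between releases.

```python
import re
from typing import Iterable, List, NamedTuple


class UnavailableEvent(NamedTuple):
    timestamp: str            # Log timestamp, kept as text
    nwk: str                  # Device network address, e.g. "0x99DE"
    model: str                # Device model string from the log line
    last_seen_seconds: float  # How long ago the device was last heard from


# Matches the ZHA "marking the device unavailable" debug message.
_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r".*\[(?P<nwk>0x[0-9A-Fa-f]{4})\]\((?P<model>[^)]*)\): "
    r"last_seen is (?P<age>\d+(?:\.\d+)?) seconds ago "
    r".*marking the device unavailable"
)


def parse_unavailable(lines: Iterable[str]) -> List[UnavailableEvent]:
    """Return one event per log line that marks a device unavailable."""
    events = []
    for line in lines:
        match = _PATTERN.match(line.strip())
        if match:
            events.append(
                UnavailableEvent(
                    timestamp=match["ts"],
                    nwk=match["nwk"],
                    model=match["model"],
                    last_seen_seconds=float(match["age"]),
                )
            )
    return events


# Two lines copied from the logs in the original report.
SAMPLE_LOG = [
    "2023-01-19 12:35:59.016 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141280.53930068016 seconds ago and ping attempts have been exhausted, marking the device unavailable",
    "2023-01-19 12:36:03.510 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038776.234536171 seconds ago and ping attempts have been exhausted, marking the device unavailable",
]

if __name__ == "__main__":
    for event in parse_unavailable(SAMPLE_LOG):
        hours = event.last_seen_seconds / 3600
        print(f"{event.timestamp} {event.nwk} {event.model}: silent for {hours:.1f} h")
```

A summary like this does not replace the full debug logs, but it makes it easier to point at the exact time window in which a sensor dropped off.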

j-a-n commented 1 year ago

Maybe related #88810