home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ZHA devices losing connection #86911

Closed Rogue136198 closed 7 months ago

Rogue136198 commented 1 year ago

The problem

ZHA routinely loses connection to devices such as Philips Hue motion sensors or Sonoff contact sensors.

What version of Home Assistant Core has the issue?

2023.1.7

What was the last working version of Home Assistant Core?

2022.12.x (or so)

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

config_entry-zha-9cd1a54288881458e356be45fc32d165.json (2).txt

Example YAML snippet

N/A

Anything in the logs that might be useful for us?

I do not see anything significant other than errors showing the device changing to unavailable.

Additional information

Since I received and installed my SkyConnect, I have been having issues with various devices losing their connection to my Zigbee mesh. The worst offenders have been my Philips Hue motion sensors (both the indoor and outdoor models), but I have also seen the issue with Sonoff contact sensors. Once a device disconnects from the mesh, the only way to re-integrate it is to press the factory reset button and set it up as a new device. I have done this numerous times and the issue persists.

While this issue did appear around the time I moved from my Sonoff Zigbee dongle to my new SkyConnect, I had also upgraded to HA 2023.1 at the same time. When I changed dongles I initially attempted to migrate directly and keep my configuration, but I ended up starting over from scratch and re-adding all devices one by one.

I have been trying to troubleshoot this issue myself over the last few weeks, but no matter how many times or which way (via a device or directly from the Zigbee coordinator) I re-add these devices, they always seem to lose their connection. Prior to this month these devices were rock solid and never had a single issue. I have performed most of the sanity-check troubleshooting steps, such as rebooting HA OS, pulling batteries for 15+ minutes, ensuring my Zigbee dongle is away from USB3 ports and WiFi APs, etc.

I am open to any and all troubleshooting steps and/or advice.

home-assistant[bot] commented 1 year ago

Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `zha` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Change the title of the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

jppw commented 1 year ago

Same here with a flashed Lidl Zigbee Gateway and Zigbee devices from different vendors.

When I set the devices in connect mode, the integration re-adopts them as the unavailable devices.

MattWestb commented 1 year ago

@jppw "When I set the devices in connect mode" = resetting (factory resetting) the device so it can join the network?

jppw commented 1 year ago

I think so.

Hold the button for 5-10 seconds until it blinks, and then the integration finds it and "re-adopts" it under the old name, settings, and links to automations.

Rogue136198 commented 1 year ago

So I have had one (just one) Hue motion sensor stable for about a week now. No idea why it's now behaving correctly. I'm going to reset one or two other motion sensors and see if I can get them to also remain stable.

jppw commented 1 year ago

On my side there are two Lidl smart home PIR sensors that always work; all the other stuff isn't available.

I reset them, they work, but the next day they are unavailable.

EarMaster commented 1 year ago

I suspect that the "time until device gets marked unavailable" settings in the global configuration are to blame for these falsely unavailable devices. I set mine to 30 days on Wednesday and have had no disconnects since.

puddly commented 1 year ago

The timeouts are quite generous. What specific devices are you referring to?

EarMaster commented 1 year ago

I have two Hue Motion Sensors (SML001) which regularly lost connection since I moved them to ZHA (no problems when I had them connected through the Hue bridge). One is triggered more often than the other, and the one with more action was losing connection less often, which led me to the conclusion that there might be some timeout involved. Therefore I increased the unavailable timeout, and since then they have both kept their connection.

puddly commented 1 year ago

The timeout is purely for offline notification: it has no impact on device connectivity. If a device doesn't check in within six hours, it'll be considered offline. Most do so every 15 minutes, if not more often.
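A minimal sketch of that distinction, assuming a hypothetical `is_considered_offline` helper and a `last_seen` timestamp (these names are illustrative and are not taken from the ZHA code base): availability is computed from the last check-in time only, so changing the timeout changes what is reported, not what the radio does.

```python
from datetime import datetime, timedelta

# Assumption: six hours, the figure quoted in the comment above.
UNAVAILABLE_AFTER = timedelta(hours=6)


def is_considered_offline(
    last_seen: datetime,
    now: datetime,
    timeout: timedelta = UNAVAILABLE_AFTER,
) -> bool:
    """Return True if the device has been silent for longer than `timeout`.

    Hypothetical helper: it only compares timestamps and never talks to
    the device, which is why the timeout cannot affect connectivity.
    """
    return now - last_seen > timeout


now = datetime(2023, 2, 1, 12, 0)
# A sensor that checked in 15 minutes ago is still reported as online.
print(is_considered_offline(now - timedelta(minutes=15), now))  # False
# A sensor that has been silent for seven hours is reported as offline.
print(is_considered_offline(now - timedelta(hours=7), now))  # True
```

Under this model, raising the timeout to 30 days would only delay the second result; a device that has really left the network would still be silent.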

Adminiuga commented 1 year ago

> I have two Hue Motion Sensors (SML001) which regularly lost connection since I moved them to ZHA (no problems when I had them connected through the Hue bridge). One is triggered more often than the other, and the one with more action was losing connection less often, which led me to the conclusion that there might be some timeout involved. Therefore I increased the unavailable timeout, and since then they have both kept their connection.

But here's the question: let's say the device is unavailable and you trigger the motion. Do you see traffic in the logs? And what coordinator are you using?

EarMaster commented 1 year ago

I have a SkyConnect. When the sensors were offline (they aren't right now so I can't test this) there were no status updates in the log for the individual device. The device was frozen in the last known state and I had to reset it and pair it again to make it work again.

jppw commented 1 year ago

zha-ad24a01d74aa7c6bf9c38fa971bd21ad-Zigbee Coordinator-ea25713917845a95785721e9fb1f35fb.json.txt

The PIR sensor is offline. I can trigger it and it flashes red, but Home Assistant has no clue about it.

3x temp sensors, 1x PIR sensor (pir01), 1x rbd outdoor light

image

image

jppw commented 1 year ago

Tell me what you need, I will deliver it :-)

harbri commented 1 year ago

> I have a SkyConnect. When the sensors were offline (they aren't right now so I can't test this) there were no status updates in the log for the individual device. The device was frozen in the last known state and I had to reset it and pair it again to make it work again.

I'm having the exact same issue with the SkyConnect stick.

Hankanman commented 1 year ago

Same issue with Philips Hue motion sensors, on the Home Assistant Yellow.

ee02217 commented 1 year ago

Same issue with my 2 SML001 after moving to ZHA with SkyConnect.

lougreenwood commented 1 year ago

I'm also having this issue. So far I've tried changing channels twice: first to channel 20 (which I was previously using on my Hue hub), then to 25 after doing some WiFi channel inspection with the WiFi Explorer app.

I'm noticing that my SML003 are working fine, then today I read that Hue motion sensors don't support rechargeable batteries (I used them for years so far with rechargeable batteries, but 🤷‍♂️). I realised that my SML003 were using the stock batteries (since they were newer), and almost all my SML001 were using rechargeable.

So I did a test and put rechargeable in my SML003 and normal batteries (from an SML003) in a particularly bad SML001.

That caused the SML003 to drop off within a few mins (even though it's about 1m from the coordinator with line of sight) and the SML001 managed to stay up for a few hours. I did reset the SML003 after it dropped off and it's been fine.

One thing I noticed is that the SML001, when it eventually did drop off, still shows as available and is frozen in a motion-detected state. Also, the visualiser shows it has 2 connections. I'm hoping that it dropped off because the mesh was rebuilt this morning on channel 25 and the mesh was still settling... I don't have high hopes, but given my experience so far with ZHA & Sonoff E dongle I'm willing to grasp onto the faintest glimmer of hope! 😢

Tonight I'm going to go buy a bunch of batteries and swap out all the rechargeable ones and see what happens. I'm hoping that the SML001 is particularly sensitive to under-voltage (my rechargeables are 1.2 V and the stock ones are 1.5 V), and this causes them to do something weird which ZHA can't handle, and then they drop off. However, I never once had this issue with the Hue hub for multiple years, so maybe the Hue hub can better heal and recover from these situations.

(FWIW, I have ~14 SML001, 5 SML003, 38 mains-powered hue lights, 6 hue smart switches and 5 hue smart buttons in a ~300m2 2 story house - so I don't think the issue is a lack of coverage.)

But let's see if changing batteries solves it... 🤞

MattWestb commented 1 year ago

@lougreenwood In a Zigbee mesh, an end device can have only one router as its parent. But you have Philips Hue routers, which have an undocumented "feature" (a bug): they do not delete their children from the neighbor table, so they keep reporting children that have aged out or changed their parent to another router.

That is the reason I have put all Hue routers in the black box for bad Zigbee devices.

lougreenwood commented 1 year ago

Thanks @MattWestb, any thoughts on my rechargeable battery under-voltage hypothesis?

(BTW, I have logging enabled for this whole situation since about 2 days ago, it'll be a firehose of data now, but it's available if it's useful to anyone.)

TheJulianJES commented 1 year ago

> Philips Hue routers have a bug: they do not delete their children from the neighbor table

Are you sure that issue is still present? The Silicon Labs based Hues should be perfect probably (updated EmberZNet multiple times). The older ATmel and TI based Hue bulbs might have had this issue, but I'd re-test it at this point probably. Haven't noticed any issue with my mostly Hue based mesh. (Although I've had some old Hue Bloom lights dropping out -- they're TI based like Gen1 and Gen2 lights: CC253x IIRC. Gen3 and 4 is Atmel. Newer/Bluetooth is Silabs)

MattWestb commented 1 year ago

@lougreenwood NiMH cells are nominally rated at 1.2 V, and a fresh alkaline cell is about 1.6 V. One example of this not working well: most Tuya TRVs don't work with NiMH batteries but are OK with alkaline batteries.

MattWestb commented 1 year ago

@TheJulianJES Here is the device info. I need to check whether it has received any updates, since the last time it was in a production system it was on de(F)CON.

Device info
LWB010
by Signify Netherlands B.V.
Connected via Billy RCP 4.2.1 RK3318
Firmware: 0x01002100

1.88.1 

manufacturer_id=4107, image_type=268, current_file_version=16785664

And it always kept all its end devices listed until it was factory reset.
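The two firmware values in the device info above appear to be the same number in different notations: `current_file_version=16785664` is the decimal form of `0x01002100`. A small sketch for converting between them (the function names are made up for illustration):

```python
def file_version_to_hex(file_version: int) -> str:
    """Format a file version as a 32-bit hex string like the one in the device card."""
    return f"0x{file_version:08X}"


def hex_to_file_version(text: str) -> int:
    """Parse a hex firmware string such as '0x01002100' back into an integer."""
    return int(text, 16)


print(file_version_to_hex(16785664))  # 0x01002100
print(hex_to_file_version("0x01002100"))  # 16785664
```

This makes it easier to compare the value shown on a device card with the `fileVersion` style numbers used in OTA indexes, assuming both refer to the same field.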

TheJulianJES commented 1 year ago

Ah, pretty sure those are the ATmel based bulbs. Check if this update works for you: https://github.com/Koenkk/zigbee-OTA/blob/2c6e06cd5e3d45fb857daa0ec84484468d0cec40/index.json#L151-L159 (otherwise do the one below first)

Would be interesting to know if that helps. If it's ATmel based, it'll report zcl_version = 1 (TI based ones will also, but yours isn't one I'm pretty sure). If it's Silabs based, it'll report zcl_version = 2 with older firmware versions and zcl_version = 8 with the latest `1.101.xx` releases.

I'm always trying to push the latest updates from Hue to that repo.

MattWestb commented 1 year ago

I found the image ID in the Z2M OTA index, downloaded it, and reloaded ZHA. 20 minutes later it is reporting Firmware: 0x01002500 / 1.101.2, zcl_version = 1 = Zigbee PRO but not Zigbee 3. IEEE: 00:17:88....

PS: Thanks for the firmware !!!

InFlames82 commented 1 year ago

Same issue with Philips hue motion sensors, on the home assistant yellow

asfalots commented 1 year ago

I have the same issue with an Aqara switch, and sometimes even with Innr GU10 spots. I never had those issues with the ConBee 2 coordinator. I checked (and changed) the channel used and re-paired the whole network multiple times, but I'm still losing devices from time to time (it can work for 2 or 3 days, then it drops). The weird thing is that it can happen even with heavily used devices (like my kitchen switch and lights).

MattWestb commented 1 year ago

@TheJulianJES Do you know what chip Innr is using, also for older devices? I think someone said it was the same as OSRAM is / was using.

@asfalots Can you post the first 6 digits, or all of them, of the device IEEE from the Innr GU10 device card so I can look up the manufacturer of the chip? It looks like this: IEEE: cc:cc:cc:ff:fe:c1:3a:8c. If the Innr behaves like the OSRAM plugs do (corrupting and losing packets), you will have problems with many devices that can't work properly in the network.

Just for information: old IKEA light firmware (Silabs chips) had a nasty bug where it stopped routing packets correctly after getting some parent announcements, which made a large mess in the network, but it was fixed in a later firmware update.

asfalots commented 1 year ago

@MattWestb These are a few that I have:

0xc49886000006a8ad
0xc498860000063925
0xc498860000064369

Now that I think of it, it happens mainly when those devices belong to and act in a group (except for end devices like a switch).
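The manufacturer prefix (OUI) is normally the first three bytes of the 64-bit IEEE address, so it can be read off either notation used in this thread. A short sketch, with `ieee_oui` as a hypothetical helper name:

```python
def ieee_oui(ieee: str) -> str:
    """Return the first three bytes of an IEEE (EUI-64) address as 'aa:bb:cc'.

    Accepts both '0xc49886000006a8ad' and 'cc:cc:cc:ff:fe:c1:3a:8c' styles.
    """
    digits = ieee.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    digits = digits.replace(":", "")
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit IEEE address: {ieee!r}")
    # Take the first six hex digits and group them into three bytes.
    return ":".join(digits[i : i + 2] for i in range(0, 6, 2))


for address in ("0xc49886000006a8ad", "0xc498860000063925", "0xc498860000064369"):
    print(ieee_oui(address))  # c4:98:86 each time
```

The resulting prefix can then be searched in a public OUI registry to find the chip or module vendor.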

MattWestb commented 1 year ago

I looked up c4:98:86 and it belongs to the manufacturer Qorvo International Pte. Ltd., which is little known in Zigbee devices, but they make 802.15.4 chips and LEEDARSON did some projects with them in 2018. Looking at CSA-IOT, they are using the Ubisys Zigbee stack c7bv v2.3, which is small and looks like it mostly works with the old CC2538 and some ARM chips used in mobile devices.

Can you put one more "normal" device where you have the Innr GU10, to test whether it causes routing problems for other devices that go through it or have it as their parent?

Group commands do not use normal routing; they are broadcast, but bad routers can do strange things with that too. Normally broadcast works even with minor router problems in the network, but it can also be badly broken by bad devices.

The Aqara switch is likely sending only unicast commands, so every command must be routed correctly to the coordinator, and then the command from the coordinator must be routed to the device that should react to it. Routing is therefore very important in the network.

TheJulianJES commented 1 year ago

> Do you know what chip Innr is using, also for older devices? I think someone said it was the same as OSRAM is / was using.

Not sure. Just took apart one old OSRAM plug but the actual Zigbee chip seems to be completely covered and I didn't wanna destroy that plug right now (but I guess that wouldn't be a huge loss). I think it might be a CC2530?

MattWestb commented 1 year ago

@TheJulianJES You can look at the MAC / IEEE, so you don't need to destroy it completely if you'd rather not ;-))

TheJulianJES commented 1 year ago

Starts with 7c:b0:3e. I guess "OSRAM" isn't really helpful

LunchboxNYC commented 1 year ago

I am having this problem as well with multiple Hue motion sensors on Home Assistant Yellow. They are all running Duracell lithium batteries, so I'm not sure the rechargeable batteries have anything to do with it. They were stable until about 2 weeks ago or so. They drop off my mesh network and become unreachable, requiring a reset to repair every couple of days or so.

austwhite commented 1 year ago

I've been trying to get logs, but they really haven't shown anything usable. I have to change some logging settings.

This issue is affecting multiple battery-operated devices: Hue RWL-021 and RWL-022 switches, Hue motion sensors (both the 01 and 03 indoor versions), Aqara temperature and humidity sensors, and Tuya Zigbee motion sensors. Only battery devices are affected for me. All mains-connected devices stay connected fine. It seems to primarily happen with sensors that go to sleep after a period of inactivity. That said, the Hue dimmers also seem to play up after a few button presses. As an example, if I repeatedly press the dim down or dim up button, after about 4 or 5 presses it stops responding for a time, maybe 30 seconds or so, and then starts responding again for another 4 or 5 presses.

If someone can tell me what log settings to use, I'll set them and run some tests to capture the issues. For now, as my system is production and I don't have a test system, I've moved all the Hue lights and motion sensors to the Hue Bridge, as the lights are the most important thing to keep working, but I have left the other sensors on ZHA and these sensors do still drop off and become non-responsive.

filipkotian commented 1 year ago

Exactly the same for me. Hue motion sensors stop working, become unreachable, and need a reset.

lougreenwood commented 1 year ago

FWIW, after 2 weeks with my experiment on Zigbee2MQTT, I do occasionally see in HA that devices go unavailable, but it seems that Z2M is somehow able to recover.

Aside from the lights / sensor in one room sometimes being slow to respond, everything has been mostly stable and my 14 SML001 are behaving as I expect. I sometimes see the red light when some detect motion.

(edit: oops, was replying to the wrong thread! But maybe still a useful datapoint.)

austwhite commented 1 year ago

@lougreenwood I think all these issue reports are likely related to the same thing, as they all seem to affect battery devices: a lot of Hue, but also some others. :)

harbri commented 1 year ago

Does anyone know if/when this issue will be solved?

MattWestb commented 1 year ago

Since older devices have hardware and software bugs that were never fixed, we can't get them working 100% without reconfiguring the coordinators so that they are no longer Zigbee 3 compliant, which would break all the other devices we have. I don't think there is much we can do :-((

I have not tested it, but I think something more modern is better, like https://www.evehome.com/en/eve-motion or https://www.evehome.com/en/eve-room. They are expensive, but they don't have the old bad hardware and software, and they get firmware updates to fix problems.

austwhite commented 1 year ago

If you have a ConBee or ConBee II stick, you may find they hold connection better, as they tend to pull these bad devices directly to the stick rather than through routers. Just my experience, and your mileage may vary.

I haven't had any disconnects since I found my own mistake, which was using Multiprotocol support. Well, more correctly, Multiprotocol support with the ZigBee on 20 and Thread on 15. I've not had any issues since moving ZigBee back to 15. I hope those having genuine issues can find solutions. Maybe Zigbee2MQTT is an option, as it seems a little better at self-recovery.

MattWestb commented 1 year ago

> Multiprotocol support with the ZigBee on 20 and Thread on 15

If you force Zigbeed and OTBR to use different channels and the system tries to do so, you will have big problems getting any devices working, both Zigbee and Thread.

If you read the Z2M thread about the Philips Hue motion detector problem, you would not even try testing it, since it's more or less the same.

The deCONZ coordinator is very dominant and likes to have all devices connected directly to it, but then you get a poorly working mesh because you lose the redundancy. It also does not time out direct children properly, since the firmware is patched to work with old devices but has not implemented all the Zigbee 3 / PRO features that modern devices use, such as end device timeout and poll control.

Edit: Can you try taking the battery out of one of your sensors and checking how long it takes the system to flag it as offline when it is connected to the ConBee?

peterjuras commented 1 year ago

Hi,

Is there any solution or workaround to this? Would migrating to ZigBee2MQTT solve this issue?

I have several Hue dimmer remotes dropping off randomly, even though they are directly bound to lights. It looks as if the coordinator had removed them from the network.

I have 10 of them, and don't want to connect them every week again. This is very frustrating.

puddly commented 1 year ago

@peterjuras See https://github.com/home-assistant/core/issues/89311#issuecomment-1565740967. The Hue dimmer remotes seem to suffer from the same problem.

peterjuras commented 1 year ago

Thanks @puddly, I'll try the config, although I understand that it's likely a large security issue.

For what it's worth, this was in the logs for one of my remotes that dropped out today. It seems that it sent a single event, and then immediately became unavailable. (It might also just be a ZHA event for "this is unavailable now" but I'm not sure).

image

shanelord01 commented 1 year ago

I'm having this problem with Tuya light switches and some battery based devices. Home Assistant Yellow.

Getting well beyond a joke with having to re-pair at least one or two (random) devices every day to make them work again.

I'm about to try moving to Z2M instead of ZHA - the developers seem far more responsive to issues.

hwinkel commented 1 year ago

I have done the opposite: I came from Z2M and moved to ZHA because of the upcoming Thread and Matter support. But now the connection to battery-powered devices gets lost after a few days.

MattWestb commented 1 year ago

@hwinkel Which end devices are you having problems with, and do they have the coordinator as their parent? I recommend blocking the coordinator from having direct children, so that all end devices use other routers. That makes the mesh network robust, and it also works better if the host system needs restarting or the coordinator hardware is offline: all devices keep working in the network and do not lose it. The recommendation applies to both NCP and RCP coordinators.

RCP works great with IKEA controllers on the latest production firmware. One of my test networks has 10 routers and 21 end devices (mostly IKEA, but also Tuya and LIDL) and has been running well for over a year now with a growing number of test devices.

shanelord01 commented 1 year ago

> I have done the opposite: I came from Z2M and moved to ZHA because of the upcoming Thread and Matter support. But now the connection to battery-powered devices gets lost after a few days.

I've bought a SkyConnect and run Thread and Matter on a separate radio from the Yellow's internal one, which is dedicated to Zigbee.

Z2M has solved every single issue I had with ZHA using the same radio (yellow).

Also - Z2M has detected and is doing updates for 50% of my devices OTA - ZHA did nothing in regards to this.

hwinkel commented 1 year ago

@shanelord01 Yeah, I had the same good Z2M experience in terms of OTA and no disconnect issues. As I "bricked" my Athom Zigbee ETH bridge by accident (I made an uncontrolled Tasmota update), I decided to give the "native HA" SkyConnect + ZHA a try. But honestly, I was not expecting such fundamental connectivity issues with such broadly available sensors as the Hue SML001 and Sonoff motion sensors. Now I'm wondering whether to spend another few hours reverting to Z2M or to wait until ZHA and/or SkyConnect firmware fixes the issues...