Closed holblin closed 7 months ago
Hey there @jc2k, @bdraco, mind taking a look at this issue as it has been labeled with an integration (homekit_controller) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
homekit_controller documentation homekit_controller source (message by IssueLinks)
Same issue, but restart doesn't help (for me), only reboot does. I only use BT devices so far, so it might not be Thread only.
If you don't use thread it's not this issue, thanks.
@holblin those errors indicate that packet loss is occurring. Obviously we can't stop packet loss occurring (you need more router capable devices or to be closer to the BR). Packet loss causes the encryption session to become invalid - each packet is encrypted with a key derived from the number of packets exchanged. So if we lose enough packets we can't derive the correct key. When that happens we have to start again from the beginning. Those numbers in the logs indicate when that packet loss has happened.
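To illustrate why lost packets break the session, here is a toy model in Python. It is not aiohomekit's code: the helper names (`packet_key`, `seal`, `open_packet`) are made up for this sketch, and an HMAC tag stands in for the real authenticated encryption. The only thing it keeps from the explanation above is that each packet depends on a counter both sides must agree on.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 16  # toy tag length, chosen for this sketch only


def packet_key(secret: bytes, counter: int) -> bytes:
    """Derive a per-packet key from the shared secret and the packet counter."""
    return hmac.new(secret, counter.to_bytes(8, "little"), hashlib.sha256).digest()


def seal(secret: bytes, counter: int, payload: bytes) -> bytes:
    """Sender side: append a tag computed with the key for this counter."""
    tag = hmac.new(packet_key(secret, counter), payload, hashlib.sha256).digest()
    return payload + tag[:TAG_LEN]


def open_packet(secret: bytes, counter: int, packet: bytes) -> Optional[bytes]:
    """Receiver side: return the payload, or None if the counters disagree."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(packet_key(secret, counter), payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected[:TAG_LEN]) else None


secret = b"shared-session-secret"
send_ctr = 0
recv_ctr = 0
results = []

# The third packet is lost on the way, so the receiver never advances past it.
for i, delivered in enumerate([True, True, False, True]):
    packet = seal(secret, send_ctr, f"event {i}".encode())
    send_ctr += 1
    if not delivered:
        continue
    plain = open_packet(secret, recv_ctr, packet)
    if plain is not None:
        recv_ctr += 1
    results.append(plain)

print(results, send_ctr, recv_ctr)
```

After the lost packet the sender is at counter 4 while the receiver is still at 2, so the fourth packet fails verification. Without a resynchronization step every later packet fails too, which is why the session has to start again from the beginning.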
Off the top of my head, thread is more reliant on streaming events to update the state in HA than the other HomeKit protocols, which is why it can sometimes be bad at switching to "unavailable".
Unfortunately the expert on the thread support is not around these days. And it is reverse engineered.
I can see problems like this with my own Nanoleaf device - about once a month (if that) here. Up until very recently the problem was (for myself and many other discord users I've helped) normally traceable to thread networking itself (bad routing, bad mesh topology, bad prosumer network gear).
Now that side is getting better, we might have better luck tracking it down.
What I need is full debug logs while it is happening. (Either enabled through the UI or make sure to turn on debugging for aiohomekit if doing manually in config file). And ideally, I need to see its mdns record when it's working, and again when it stops working.
It stopped working at some point last night. I enabled debug logging earlier, so I will work out how to send that file (I will edit this message and attach it). I have one HomePod and multiple blinds, some close by and some further away. Does a bad connection to the far ones cause everything to drop, even the ones close by?
That's the diagnostic download, not the log file.
Same here. HomeKit becomes unavailable until I restart HA.
@transalpia please read my messages earlier in the ticket and if possible provide the information requested. We can't move this issue forward without debug logs.
Please provide as much detail about your environment as possible. For example, not using HAOS will likely cause frequent outages that look like this problem. Some brands of switches can interfere with multicast. Some brands of BR are also just less good. The number and placement of BRs relative to the device could be a factor. And of course, we know some vendors just have devices that crash a lot. It could even be a Matter device from a different vendor to the ones you are struggling with: if a mains-powered Thread device crashes, any connected devices will be affected.
All of these factors are in addition to the problem in the original post, but they need to be ruled out to make sure any data gathered is valid for this issue.
@Jc2k this morning (the last time it happened) I enabled debug logging. The issue appears about once a week, and this is now the 4th time. All iCloud components are unavailable. After restarting HA everything is OK again.
Hi @Jc2k, sorry for the delay, but I wanted to make sure I cleaned the log and removed a few auth tokens before sending the file :-) And the file is BIG! I will clear my logs ready for the next time the error occurs and restart my HA.
https://drive.google.com/file/d/1sEXtP79m83zu1lIIkhcZkjA4AhipGQr-/view?usp=sharing
For my setup, I run HA on an Intel machine using HA OS: https://www.amazon.com/gp/product/B09H5961YN/
My Thread devices are, for now, only my blinds from Eve. They are rechargeable (USB-C) and the battery lasts a long time, probably > 6 months.
The Thread gateway is my HomePod mini. The HomePod mini is connected to my network via a UniFi AP which is very close, and there is another AP not far away in case the first one drops.
I currently have 7 blinds configured on my system: 2 very close (same room), 2 across one wall, 2 across 2 walls and 1 on the other side of the house (range limit). I planned to buy more HomePod minis to get better coverage and to connect more of my blinds, but I am holding off until this works properly.
Would an approximate map of the house (blind positions, HomePod mini, Wi-Fi APs) be helpful?
Also, how can I capture the mDNS record easily? (You probably want to see that from HA?)
@Jc2k: My setup is HA on a Raspberry Pi 4, running for 6 months. The affected devices are 6 x Eve Energy, 5 x Eve Thermo, 1 x Eve Weather and 2 x Eve Door and Window. All are Thread capable. No others via HomeKit! Everything is connected via Thread to a HomePod mini. HACS is installed, Wi-Fi connection via UniFi. No changes in arrangement or configuration. The error first occurred about 3 weeks ago. All HomeKit entities were no longer available, but could be reactivated via an HA restart. The first time, about 3 weeks ago, I didn't think anything of it. The second time was about 10 days ago. The last time it happened was 2 days ago. Everything is fine at the moment. I currently have the debug log enabled, and the next time the error occurs I will post it here.
By the way: After disconnecting the network or the HomePod, all devices are usually automatically reconnected.
Are you able to try the beta? Otherwise try the feb release when it's out on Wednesday. I found a case where packet loss can induce an irrecoverable connection.
As above, packet loss causes encryption related counters on both sides to get out of sync. Some devices stop responding when that happens, some send a coap error. We were handling the coap error and resetting the encryption state. We were not doing the same when the device started ignoring us.
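A rough sketch of that recovery rule, in Python. The `Session` class, `handle_reply` and the reply labels are hypothetical names for illustration and do not mirror aiohomekit's real structure. The point is only that a device going silent (a timeout) is now treated like an explicit CoAP error: both reset the encryption state.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Toy encryption session: just the two counters and a reset count."""
    send_ctr: int = 0
    recv_ctr: int = 0
    resets: int = 0

    def reset(self) -> None:
        # Start the encryption session again from the beginning.
        self.send_ctr = 0
        self.recv_ctr = 0
        self.resets += 1


def handle_reply(session: Session, reply: str) -> str:
    """reply is 'ok', 'error' (device sent a CoAP error) or 'timeout' (device ignored us)."""
    if reply == "ok":
        session.send_ctr += 1
        session.recv_ctr += 1
        return "delivered"
    if reply in ("error", "timeout"):
        # Previously only 'error' led here; a silent device was never recovered.
        session.reset()
        return "resynchronized"
    raise ValueError(f"unknown reply: {reply!r}")


session = Session()
outcomes = [handle_reply(session, r) for r in ["ok", "ok", "timeout", "ok", "error"]]
print(outcomes, session)
```

Removing `"timeout"` from the second branch reproduces the old behaviour in this model: the request would fall through without a reset and the session would stay desynchronized.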
Note if the change in 2024.2.0 helps that means you are experiencing either crashing devices or packet loss. The fix just helps with recovery from that, your devices are still having issues.
Installing the 2024.2.0 now and I will report if I have other disconnections 👍
I haven't had any issues since. Closing the issue for now. If I encounter the problem again, I will report it here.
The problem
My blinds configured through Thread/HomeKit become unavailable until I restart HA.
Here are some screenshots of my HA interface when the problem occurs:
I also noticed that there were some HomeKit errors in the log:
Note that when a device becomes unavailable, not all the blinds are shown as unavailable, but as soon as you visit their page/details they become unavailable.
The only way I have found to temporarily fix the issue is to restart HA. After the restart, the devices work for some time but eventually fail again. I will report each failure so the frequency is known, but in my experience it is pretty often (multiple times per month).
What version of Home Assistant Core has the issue?
core-2024.1.3
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
HomeKit Device
Link to integration documentation on our website
https://www.home-assistant.io/integrations/homekit_controller/
Diagnostics information
Here is the diagnostic of the device:
homekit_controller-514baf1422af40e0e6972880cbb662bc-Eve MotionBlinds 7448-d7332f9cc788103decbb2606a2bad9b9.json.txt
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
Logger: aiohomekit.controller.coap.connection
Source: components/homekit_controller/connection.py:891
First occurred: January 16, 2024 at 8:37:12 AM (118 occurrences)
Last logged: 1:47:32 PM

Decryption failed, desynchronized? Counter=6053/6058
Failed flailing attempts to resynchronize, self-destructing in 3, 2, 1...
Decryption failed, desynchronized? Counter=1204/1209
Decryption failed, desynchronized? Counter=11/20
Decryption failed, desynchronized? Counter=11/28
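For anyone wanting to see how far apart the two counters are in the "Decryption failed" messages above, a short Python script can pull them out of a log. This is a minimal sketch: `counter_gaps` is a hypothetical helper, and treating the difference between the two numbers as the size of the desynchronization is an assumption, since the thread only says the numbers indicate when packet loss happened.

```python
import re

# Matches the warning text exactly as it appears in the log excerpt above.
COUNTER_RE = re.compile(r"Decryption failed, desynchronized\? Counter=(\d+)/(\d+)")


def counter_gaps(log_text: str) -> list:
    """Return (first, second, second - first) for every desync warning found."""
    return [
        (int(first), int(second), int(second) - int(first))
        for first, second in COUNTER_RE.findall(log_text)
    ]


sample = (
    "Decryption failed, desynchronized? Counter=6053/6058\n"
    "Failed flailing attempts to resynchronize, self-destructing in 3, 2, 1...\n"
    "Decryption failed, desynchronized? Counter=1204/1209\n"
    "Decryption failed, desynchronized? Counter=11/20\n"
    "Decryption failed, desynchronized? Counter=11/28\n"
)

gaps = counter_gaps(sample)
for first, second, gap in gaps:
    print(f"{first}/{second}: gap {gap}")
```

Running this over a full home-assistant.log would show whether the gaps cluster at particular times, which may help line them up with network events.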