home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
70.14k stars 29.18k forks source link

HomeKit Device Integration - Secretlab MAGRGB Failing upon Thread Provisioning #115361

Closed adambreguet closed 1 week ago

adambreguet commented 3 months ago

The problem

Not sure if this is the right place to post this, if not I'd be grateful if you pointed me in the right direction!

I have Home Assistant in a container on my Unraid server. Recently purchased a Secretlab MAGRGB light strip for my desk, essentially a Nanoleaf device. At first I couldn't get HA to initialise the device at all, I seemed to have an issue with the built-in Bluetooth chip on my motherboard (ASUSTek Qualcomm). I bought one of the recommended Bluetooth dongles (Orico BTA-403), and the device is now initialising and seems to work as a Bluetooth device in HA (although seemingly not cleanly as per the logs).

The answer to the below might be to just suck it up and leave it as a Bluetooth device, but anyway, onto the issue at hand:

When trying to provision the device onto my Thread network, it successfully provisions the device onto the Thread network and I can add the device into the Nanoleaf app and it comes up as a Thread device (see screenshots below), previously the Nanoleaf wouldn't add it to Thread. The Thread network is a custom OpenThread Border Router on a raspberry pi using the SkyConnect as the radio (I only realised after I bought it the HA Thread BR integration required HA OS).

I've attached the logs from first adding the device as a Bluetooth device until after attempting to provision it onto Thread. If there's any further info I can provide that might shed any light, please let me know. If anyone could provide any assistance I'd be very thankful.

home-assistant_homekit_controller.log

Screen Shot 2024-04-10 at 11 21 44 pm

Screen Shot 2024-04-10 at 11 23 37 pm

Screenshot_20240410_214602_Nanoleaf

image

home-assistant_homekit_controller.log

What version of Home Assistant Core has the issue?

core-2024.4.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

HomeKit Device

Link to integration documentation on our website

https://www.home-assistant.io/integrations/homekit_controller

Diagnostics information

Note - diagnostics is pre-provisioning, after provisioning diagnostics no longer available.

homekit_controller-f0ddb03bbb74f5971ef038a1f850f1e9-Secretlab MAGRGB 4428-1103180937550849980e7536704fcd45.json

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-04-10 22:01:17.878 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify timed out
2024-04-10 22:01:17.878 ERROR (MainThread) [aiohomekit.utils] Failure running background task: Task-9597
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/connection.py", line 355, in do_pair_verify
    response = await coap_client.request(request).response
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/connection.py", line 395, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/connection.py", line 354, in do_pair_verify
    async with asyncio_timeout(8.0):
  File "/usr/local/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/pairing.py", line 111, in _ensure_connected
    await self.connection_future
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/connection.py", line 398, in connect
    raise AccessoryDisconnectedError("Pair verify timed out")
aiohomekit.exceptions.AccessoryDisconnectedError: Pair verify timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/utils.py", line 40, in _handle_task_result
    task.result()
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/pairing.py", line 169, in _process_config_changed
    await self.list_accessories_and_characteristics()
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/pairing.py", line 147, in list_accessories_and_characteristics
    await self._ensure_connected()
  File "/usr/local/lib/python3.12/site-packages/aiohomekit/controller/coap/pairing.py", line 113, in _ensure_connected
    raise AccessoryDisconnectedError("failed to connect")
aiohomekit.exceptions.AccessoryDisconnectedError: failed to connect

Additional information

No response

home-assistant[bot] commented 3 months ago

Hey there @jc2k, @bdraco, mind taking a look at this issue as it has been labeled with an integration (homekit_controller) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `homekit_controller` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign homekit_controller` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


homekit_controller documentation homekit_controller source (message by IssueLinks)

Jc2k commented 3 months ago

Thanks for the detailed report.

To confirm, is this your first thread device?

At first glance, this looks like a network issue. From your log I can see that the device joined the thread mesh and became visible in zeroconf with an ip6 address of fd11:22::4fba:9cbb:311f:ce55. Then i can see many attempts to establish a secure COAP session but they all time out without receiving any UDP packets from the device at all.

There are a bunch of red flags with your setup that match these symptoms.

First of all, are you running your HA container in host network mode? What configuration did you do on unraid to prepare it for thread? Do these look familiar? net.ipv6.conf.wlan0.accept_ra AND net.ipv6.conf.wlan0.accept_ra_rt_info_max_plen.

Second of all, do you have vlans on your home network? Is your unraid box on the same vlan as the raspberry pi?

adambreguet commented 3 months ago

Thanks for your quick reply!

Yes, this is my first thread device.

Re networking being an issue, my router is slightly older at this point (it's a D-link DIR-882), could that be a potential issue?

HA is running in host mode. Is there particular configuration you're meant to do on unraid to prepare it for thread? I googled those two settings (haven't heard of them before), and ran sysctl -a on both the unraid machine and the raspberry pi. Unraid has net.ipv6.conf.wg0.accept_ra = 1 net.ipv6.conf.wg0.accept_ra_rt_info_max_plen = 0 (it doesn't have wlan0)

The raspberry pi has net.ipv6.conf.wlan0.accept_ra = 0 net.ipv6.conf.wlan0.accept_ra_rt_info_max_plen = 0

And no, no vlans on my network anywhere.

Jc2k commented 3 months ago

Is wg0 the only interface that has .accept_ra_rt_info_max_plen on unraid box?

adambreguet commented 3 months ago

There's quite a few interfaces that have it, see screenshot Screenshot_20240411_005022_Chrome

Jc2k commented 3 months ago

Great. The ones for eth0 need to be set to 64. And if that doesn't work try br0 as well. Will take 15-20 mins to kick in at worst. Monitor the output of ip -6 route and you should see some new routes appear. The accept_ra one should be 1 on the same interfaces.

This should potentially get things to seemingly work, however I'll need to follow up with some other checks as it'll likely still be flakey. But let's try this on its own first.

adambreguet commented 3 months ago

Thanks for your help so far. Had an electrician over while I was out to do a safety check and he switched the power off... Had quite a few issues to fix.

Now that I've got it all back to how it was, I set br0 and eth0 to 64 as advised, and made sure accept_ra was 1. No new routes appeared after the change though.

I also tried doing some digging of my own, and it definitely seems to be a network issue on the Unraid machine. Trying to ping6 the ipv6 address (which is now results in 100% packet loss, but every other device in my household is able to ping it without issue. It also doesn't seem to be specifically an ipv6 issue, as I was able to ping6 google.com and also other ipv6 addresses of various devices on my local network.

Any other ideas of what could fix it?

Jc2k commented 3 months ago

Can you show me your current ip -6 route output? (on your unraid)

adambreguet commented 3 months ago

Screen Shot 2024-04-15 at 11 41 22 pm

Jc2k commented 3 months ago

Main observation there is that there are no obvious thread routes at all.

There are 2 kinds of route i'd expect - one like this:

fd18:b7a9:775a::/64 via fe80::88f:b3bc:623c:3545 dev br0 metric 1024 expires 0sec

and one like this

fdac:693e:50e1:654b::/64 dev br0 metric 256 expires 0sec

Based on what you've shared I'd expect them to start fd and be on br0.

If you are missing the 2nd kind it could indicate that the accept_ra setting is wrong, that there is some firewall blocking incoming icmp6 packets or that something else on your network is blocking icmp6 (common cause is vlans or bad wifi network hardware).

I'd start by running tcpdump to make sure unraid is seeing the packets.

sudo tcpdump -ni br0 icmp6 -v

And wait. Might be 3 minutes. Might be 30. We are looking for something that is spread over multiple lines like this:

14:58:57.836565 IP6 (flowlabel 0xf0b00, hlim 255, next-header ICMPv6 (58) payload length: 72) fe80::1808:6fd7:8776:b8e > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 72
        hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): fdac:693e:50e1:654b:f800::/64, Flags [onlink, auto], valid time 1800s, pref. time 1800s
          route info option (24), length 16 (2):  fd18:b7a9:775a::/64, pref=medium, lifetime=1800s
          unknown option (26), length 8 (1): 
          0x0000:  8000 0000 0000

We are looking for a "prefix info option " and a "route info option".

If you haven't seen any after 10 minutes, i'd probably restart teh tcpdump on eth0.

Jc2k commented 3 months ago

If you still don't see anything, try setting accept_ra to 2 instead of 1 and repeat. If that gets a hit, post your new "ip -6 route" output.

adambreguet commented 3 months ago

I get the following three hits tcpdump which might be relevant. The first is the ip6 address of the raspberry pi, the second is my router, and not sure what the third is

18:58:38.054282 IP6 (flowlabel 0x4eda1, hlim 255, next-header ICMPv6 (58) payload length: 40) fe80::b09e:4e12:80d7:d017 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 40
    hop limit 0, Flags [other stateful], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
      unknown option (26), length 8 (1): 
      0x0000:  8000 0000 0000

18:58:56.245525 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 128) fe80::76da:daff:fed0:9fae > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 128
    hop limit 64, Flags [other stateful], pref medium, router lifetime 3600s, reachable time 0ms, retrans timer 0ms
      prefix info option (3), length 32 (4): 2403:5814:b73e::/64, Flags [onlink, auto], valid time 7201s, pref. time 3600s
      route info option (24), length 24 (3):  2403:5814:b73e::/48, pref=high, lifetime=7201s
      rdnss option (25), length 40 (5):  lifetime 2400s, addr: 2606:4700:4700::1111 addr: 2606:4700:4700::1001
      mtu option (5), length 8 (1):  1500
      source link-address option (1), length 8 (1): 74:da:da:d0:9f:ae

18:59:07.344725 IP6 (flowlabel 0x90100, hlim 255, next-header ICMPv6 (58) payload length: 40) fe80::42f:840a:156e:a809 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 40
    hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
      route info option (24), length 16 (2):  2403:5814:b73e:1000::/64, pref=medium, lifetime=1800s
      unknown option (26), length 8 (1): 
      0x0000:  8000 0000 0000 
adambreguet commented 3 months ago

Changing accept_ra to 2 is giving me the following from ip -6 route, so getting some new routes now.

::1 dev lo proto kernel metric 256 pref medium
2403:5814:b73e::/64 dev br0 proto kernel metric 256 expires 6979sec pref medium
2403:5814:b73e::/64 dev br0 proto ra metric 1005 mtu 1500 pref medium
2403:5814:b73e:1000::/64 via fe80::42f:840a:156e:a809 dev br0 proto ra metric 1024 expires 1712sec pref medium
2403:5814:b73e::/48 via fe80::76da:daff:fed0:9fae dev br0 proto ra metric 1024 expires 6979sec pref high
fd11:22::/64 via fe80::b09e:4e12:80d7:d017 dev br0 proto ra metric 1024 expires 1426sec pref medium
fd17::/64 dev docker0 proto kernel metric 256 pref medium
fd17::/64 dev docker0 metric 1024 pref medium
fe80::/64 dev br0 proto kernel metric 256 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev docker0 proto kernel metric 256 pref medium
fe80::/64 dev veth4a648c9 proto kernel metric 256 pref medium
fe80::/64 dev vetha4cb3fd proto kernel metric 256 pref medium
fe80::/64 dev br-494b88331947 proto kernel metric 256 pref medium
fe80::/64 dev veth5ed8699 proto kernel metric 256 pref medium
fe80::/64 dev veth8aec24d proto kernel metric 256 pref medium
fe80::/64 dev vethdf1f0d6 proto kernel metric 256 pref medium
fe80::/64 dev veth598880c proto kernel metric 256 pref medium
fe80::/64 dev vethce68264 proto kernel metric 256 pref medium
fe80::/64 dev veth38c34d3 proto kernel metric 256 pref medium
fe80::/64 dev vethe4626a2 proto kernel metric 256 pref medium
fe80::/64 dev veth5f4bf28 proto kernel metric 256 pref medium
fe80::/64 dev vethfc3d81e proto kernel metric 256 pref medium
fe80::/64 dev veth8babcae proto kernel metric 256 pref medium
fe80::/64 dev veth9cb9fcd proto kernel metric 256 pref medium
fe80::/64 dev vethed281ef proto kernel metric 256 pref medium
fe80::/64 dev veth4bfaf62 proto kernel metric 256 pref medium
fe80::/64 dev veth9f871aa proto kernel metric 256 pref medium
default via fe80::76da:daff:fed0:9fae dev br0 proto ra metric 1005 mtu 1500 pref medium
default via fe80::76da:daff:fed0:9fae dev br0 proto ra metric 1024 expires 3378sec hoplimit 64 pref medium
Jc2k commented 3 months ago

The fd11 route looks promising. Now would be a good time to try and set up the nanoleaf again.

adambreguet commented 3 months ago

Ok, looks like it's all up and working. Device is responding well. I've added those settings to /etc/sysctl.conf so they stick around on reboot.

Can I assume all good to go from here? You mentioned about it potentially still being flakey earlier.

issue-triage-workflows[bot] commented 2 weeks ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.