home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Matter light becomes sometimes unavailable on Home Assistant #121724

Closed gabrielbull closed 1 week ago

gabrielbull commented 2 months ago

The problem

Sometimes, Matter over Thread lights simply become unavailable out of the blue. The device is still pingable and is still connected to Home Assistant, but the light just becomes unavailable. This has happened to me twice in two days, to 2 different lights, with no settings changes.

While the light is unavailable in Home Assistant, it is still controllable via another Matter hub such as Apple Home, or via the manufacturer's native app.

https://github.com/home-assistant/core/assets/671923/10e8528a-0644-4f0e-a332-2a63f75f7619

What version of Home Assistant Core has the issue?

core-2024.7.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Matter

Link to integration documentation on our website

https://www.home-assistant.io/integrations/matter/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 2 months ago

Hey there @home-assistant/matter, mind taking a look at this issue as it has been labeled with an integration (matter) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `matter` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign matter` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


matter documentation matter source (message by IssueLinks)

marcelveldt commented 2 months ago

If the light becomes unavailable, it means the connection to the device (the subscription) was lost. It should be restored on its own, but if there are many connection issues you may run into a situation where we block recovery because of too many unstable connections. Please take a good look at the Matter Server log to see whether there are many errors in there, especially connectivity warnings.

The fact that it still works in Apple Home says nothing, as Apple hides connectivity issues a bit more. Also, multi-admin (adding a device to multiple ecosystems at once) puts more stress on the Thread network, which makes the issues worse. For instance, there may be just enough bandwidth to satisfy Apple but nothing left for us.

The Nanoleaf app itself uses a direct Bluetooth connection, not Thread/Matter.

Start by checking the logs to confirm that you indeed have connectivity issues. Then let's see what causes them. Also note that you can make things more stable by NOT adding the Nanoleaf lights to both Apple and HA, but to HA alone.
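As a rough aid for that log check, here is a minimal Python sketch that counts log lines per level and logger, so that connectivity errors stand out at a glance. It is not part of the Matter Server. The line layout in `LINE_RE` is an assumption inferred from the log excerpts posted in this thread, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the log excerpts in this thread:
# "<date> <time> (<thread>) <LEVEL> [<logger>] <message>"
LINE_RE = re.compile(r"^(\S+ \S+) \((\S+)\) (\w+) \[([^\]]+)\] (.*)$")


def summarize(lines):
    """Count log lines per (level, logger) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            level, logger = match.group(3), match.group(4)
            counts[(level, logger)] += 1
    return counts


# Four lines copied from the log posted in this thread.
SAMPLE = [
    "2024-07-11 12:03:54.422 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:262837883 on exchange 35006i with Node: <000000000000000E, 1> sendCount: 4 max retries: 4",
    "2024-07-11 12:03:57.415 (Dummy-2) CHIP_ERROR [chip.native.DMG] Time out! failed to receive report data from Exchange: 35006i with Node: <000000000000000E, 1>",
    "2024-07-11 12:03:57.417 (MainThread) INFO [matter_server.server.device_controller.node_14] Previous subscription failed with Error: 50, re-subscribing in 9351 ms...",
    "2024-07-11 12:04:13.563 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4",
]

if __name__ == "__main__":
    # Most frequent (level, logger) pairs first.
    for (level, logger), n in summarize(SAMPLE).most_common():
        print(f"{n:4d}  {level:10s}  {logger}")
```

To use it on a real log, pass the lines of a saved log file instead of `SAMPLE`. A high count of `CHIP_ERROR` lines from the `chip.native.EM` and `chip.native.SC` loggers would correspond to the send failures and CASE session timeouts discussed here.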

gabrielbull commented 2 months ago

It does indeed become available again eventually, but then another one will become unavailable, and so on. For me, this is not a critical issue, as my home is all Philips Hue and other HomeKit accessories. I decided to add Matter accessories to one room knowing this was early and might not fully work, so this is not urgent, and I'm willing to test things to help make things stable.

I have currently added only 10 Matter accessories to my whole smart home and am already seeing major issues like this one. If 10 accessories can stress the Thread network, I cannot even imagine what a home with hundreds of accessories might look like.

Here are the Matter Server logs from the last 20 minutes.

2024-07-11 12:03:42.383 (Dummy-2) CHIP_ERROR [chip.native.DMG] Subscription Liveness timeout with SubscriptionID = 0x80b50abd, Peer = 01:0000000000000012
2024-07-11 12:03:42.384 (MainThread) INFO [matter_server.server.device_controller.node_18] Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2024-07-11 12:03:46.687 (MainThread) INFO [root] Re-subscription succeeded!
2024-07-11 12:03:46.688 (MainThread) INFO [matter_server.server.device_controller.node_18] Re-Subscription succeeded
2024-07-11 12:03:46.719 (MainThread) INFO [root] Re-subscription succeeded!
2024-07-11 12:03:46.720 (MainThread) INFO [matter_server.server.device_controller.node_20] Re-Subscription succeeded
2024-07-11 12:03:54.422 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:262837883 on exchange 35006i with Node: <000000000000000E, 1> sendCount: 4 max retries: 4
2024-07-11 12:03:57.415 (Dummy-2) CHIP_ERROR [chip.native.DMG] Time out! failed to receive report data from Exchange: 35006i with Node: <000000000000000E, 1>
2024-07-11 12:03:57.417 (MainThread) INFO [matter_server.server.device_controller.node_14] Previous subscription failed with Error: 50, re-subscribing in 9351 ms...
2024-07-11 12:04:02.012 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:3515231 on exchange 35008i with Node: <0000000000000006, 1> sendCount: 4 max retries: 4
2024-07-11 12:04:05.184 (Dummy-2) CHIP_ERROR [chip.native.DMG] Time out! failed to receive report data from Exchange: 35008i with Node: <0000000000000006, 1>
2024-07-11 12:04:10.129 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768595 on exchange 35013i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:04:13.563 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:04:23.364 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768596 on exchange 35014i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:04:26.846 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:04:37.310 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768586 on exchange 35000i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:04:37.502 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768597 on exchange 35015i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:04:40.130 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:04:47.305 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:04:50.430 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768598 on exchange 35016i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:04:53.412 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:05:03.669 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768600 on exchange 35018i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:05:06.695 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:05:06.696 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:560: CHIP Error 0x00000032: Timeout'
2024-07-11 12:05:06.698 (MainThread) INFO [matter_server.server.device_controller.node_14] Previous subscription failed with Error: 50, re-subscribing in 5739 ms...
2024-07-11 12:05:06.699 (MainThread) INFO [matter_server.server.device_controller.node_14] Marked node as unavailable
2024-07-11 12:05:06.699 (MainThread) ERROR [matter_server.server.client_handler] [547505518352] Error while handling: device_command (node 14): src/protocols/secure_channel/CASESession.cpp:560: CHIP Error 0x00000032: Timeout
2024-07-11 12:05:25.140 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768601 on exchange 35019i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:05:28.185 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:05:28.187 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:560: CHIP Error 0x00000032: Timeout'
2024-07-11 12:05:28.189 (MainThread) INFO [matter_server.server.device_controller.node_14] Previous subscription failed with Error: 50, re-subscribing in 19627 ms...
2024-07-11 12:05:50.621 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768602 on exchange 35024i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:05:54.347 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768599 on exchange 35017i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:05:54.451 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:06:04.033 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:06:04.668 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768603 on exchange 35025i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:06:07.733 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:06:17.895 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768605 on exchange 35027i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:06:21.015 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:06:30.775 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768606 on exchange 35028i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:06:34.300 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:06:39.824 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 18 activity on MDNS, trigger resubscribe
2024-07-11 12:06:41.933 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 18 activity on MDNS, trigger resubscribe
2024-07-11 12:06:43.775 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 18 activity on MDNS, trigger resubscribe
2024-07-11 12:06:44.818 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768607 on exchange 35029i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:06:47.581 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:06:47.582 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:560: CHIP Error 0x00000032: Timeout'
2024-07-11 12:06:47.584 (MainThread) INFO [matter_server.server.device_controller.node_14] Previous subscription failed with Error: 50, re-subscribing in 26626 ms...
2024-07-11 12:07:11.928 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768604 on exchange 35026i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:07:16.228 (MainThread) INFO [root] Re-subscription succeeded!
2024-07-11 12:07:16.228 (MainThread) INFO [matter_server.server.device_controller.node_9] Re-Subscription succeeded
2024-07-11 12:07:16.920 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768608 on exchange 35034i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:07:20.303 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:07:20.761 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:07:29.970 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768611 on exchange 35037i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:07:33.586 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:07:44.105 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768613 on exchange 35039i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:07:46.873 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:07:56.731 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768614 on exchange 35040i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:08:00.156 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:08:09.896 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768615 on exchange 35041i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:08:13.437 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:08:13.438 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:560: CHIP Error 0x00000032: Timeout'
2024-07-11 12:08:13.440 (MainThread) INFO [matter_server.server.device_controller.node_14] Previous subscription failed with Error: 50, re-subscribing in 33190 ms...
2024-07-11 12:08:27.501 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768612 on exchange 35038i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:08:37.489 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:08:37.490 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:560: CHIP Error 0x00000032: Timeout'
2024-07-11 12:08:37.492 (MainThread) INFO [matter_server.server.device_controller.node_5] Previous subscription failed with Error: 50, re-subscribing in 5361 ms...
2024-07-11 12:08:37.493 (MainThread) INFO [matter_server.server.device_controller.node_5] Marked node as unavailable
2024-07-11 12:09:05.474 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768617 on exchange 35049i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:09:08.295 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:09:18.293 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:65768618 on exchange 35050i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-07-11 12:09:21.577 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 4
2024-07-11 12:09:25.052 (MainThread) INFO [root] Re-subscription succeeded!
2024-07-11 12:09:25.052 (MainThread) INFO [matter_server.server.device_controller.node_14] Re-Subscription succeeded
2024-07-11 12:09:38.435 (MainThread) INFO [root] Re-subscription succeeded!
2024-07-11 12:09:38.435 (MainThread) INFO [matter_server.server.device_controller.node_5] Re-Subscription succeeded
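To get a rough measure of how long each node stayed unavailable, the log above can be reduced to pairs of "Marked node as unavailable" and "Re-Subscription succeeded" lines. The Python sketch below does that. It is only an illustration: `outages` and `parse_time` are hypothetical helper names, and the patterns assume the `node_<N>]` logger suffix seen in this log, so other Matter Server versions may need different patterns.

```python
import re
from datetime import datetime

# Patterns assume the "node_<N>]" logger suffix used in the log above.
UNAVAILABLE_RE = re.compile(r"node_(\d+)\] Marked node as unavailable")
RECOVERED_RE = re.compile(r"node_(\d+)\] Re-Subscription succeeded")


def parse_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp (23 characters)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def outages(lines):
    """Return (node, seconds) for each unavailable-to-recovered interval."""
    down_since = {}
    result = []
    for line in lines:
        match = UNAVAILABLE_RE.search(line)
        if match:
            # Keep the first timestamp if a node is marked unavailable twice.
            down_since.setdefault(int(match.group(1)), parse_time(line))
            continue
        match = RECOVERED_RE.search(line)
        if match:
            node = int(match.group(1))
            start = down_since.pop(node, None)
            # Ignore recoveries of nodes that were never marked unavailable.
            if start is not None:
                seconds = (parse_time(line) - start).total_seconds()
                result.append((node, seconds))
    return result


# Lines copied from the log above.
SAMPLE = [
    "2024-07-11 12:05:06.699 (MainThread) INFO [matter_server.server.device_controller.node_14] Marked node as unavailable",
    "2024-07-11 12:07:16.228 (MainThread) INFO [matter_server.server.device_controller.node_9] Re-Subscription succeeded",
    "2024-07-11 12:08:37.493 (MainThread) INFO [matter_server.server.device_controller.node_5] Marked node as unavailable",
    "2024-07-11 12:09:25.052 (MainThread) INFO [matter_server.server.device_controller.node_14] Re-Subscription succeeded",
    "2024-07-11 12:09:38.435 (MainThread) INFO [matter_server.server.device_controller.node_5] Re-Subscription succeeded",
]

if __name__ == "__main__":
    for node, seconds in outages(SAMPLE):
        print(f"node {node}: unavailable for {seconds:.3f} s")
```

On these lines, node 14 comes out at about 258 seconds and node 5 at about 61 seconds. Node 9 is skipped because it was never marked unavailable in this excerpt.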
agners commented 2 months ago

What type of Thread Border Router are you using? Is there any pattern in which lights go unavailable (always the same ones, or random ones)?

It does seem that the light is still present on the network but does not respond to our CASE session setup attempts :thinking:

Are you using other Matter controllers besides Apple Home and Home Assistant?

gabrielbull commented 2 months ago

What type of Thread Border Router are you using?

I'm using Apple TVs and HomePods as Thread Border Routers.

Screenshot, 2024-07-11 at 18:13:50

Any pattern in which lights go unavailable (always the same ones, or random ones)?

Of the 6 lights I set up in my living room, 2 have never gone unavailable; the other 4 do so often.

Are you using other Matter controllers besides Apple Home and Home Assistant?

No

jvmahon commented 2 months ago

Are they Nanoleaf Essentials bulbs? The most recent Nanoleaf firmware (3.6.196) greatly improves their function (Nanoleaf updated to Matter 1.2 as part of firmware 3.6.173, and that greatly improved reliability). Nanoleaf has not released the update as an over-the-air update, so you must install it using their app.

https://helpdesk.nanoleaf.me/en-US/nanoleaf-essentials-matter-release-notes-255125

gabrielbull commented 2 months ago

@jvmahon Yes, they are, but they're already on firmware 3.6.196. I doubt it's the light bulbs, as they remain connected to Apple Home without any issue.

agners commented 2 months ago

What Home Assistant OS version are you using and what hardware are you on?

It could be that only certain Thread border routers cause troubles, can you maybe try to take some offline and see if that brings devices back?

marcelveldt commented 2 months ago

@jvmahon Yes, they are, but they're already on firmware 3.6.196. I doubt it's the light bulbs, as they remain connected to Apple Home without any issue.

Note that Apple "hides" connectivity issues way more than HA does. Also, "multi-admin" may even cause stability issues, as it adds stress on the light and the Thread network. Did you try, as I suggested, adding one or more Nanoleaf lights only to HA to test whether they remain stable?

marcelveldt commented 1 month ago

So, I actually ran into this same issue myself: nodes going unavailable, responding slowly, etc. My Matter Server log was also full of the same errors I see in your log, @gabrielbull, which is a clear indication of connection issues. In the end I restarted devices and border routers, and even set up a few BRs again, until my log cleared up.

Now, no more connection issues and a super fast and stable connection on both HA and Apple Home. The log is entirely silent so no more "CASESession timed out" errors etc.

I noticed that when I was having issues, Apple Home was still responsive (as in: it didn't list the device as unavailable) but it was slow to respond. So most likely Apple is just doing a lot of retries to hide the connection issues, and that most probably also leaves no bandwidth available for HA.

So, long story short: shut down all your border routers and Thread devices. Then power up one BR and one device in close range. If that is stable and stays stable, power up another device or BR, and repeat until either everything stays stable or you find the device causing the issues.
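The one-at-a-time procedure can be sketched in Python as follows. This is only an illustration of the idea, not a tool that exists: `find_first_unstable` is a hypothetical name, and `is_stable` stands in for the manual step of watching the Matter Server log for a while after each power-up.

```python
def find_first_unstable(devices, is_stable):
    """Power devices up one at a time, in order.

    Returns the device whose addition first made the setup unstable,
    or None if everything stayed stable. `is_stable` is a hypothetical
    callback that receives the list of devices powered up so far.
    """
    powered = []
    for device in devices:
        powered.append(device)
        if not is_stable(powered):
            return device
    return None


if __name__ == "__main__":
    # Made-up device names; the first entry would be the one border router.
    devices = ["border-router-1", "bulb-1", "bulb-2", "bulb-3"]

    # Stand-in check that pretends "bulb-2" destabilizes the network.
    def check(powered):
        return "bulb-2" not in powered

    print(find_first_unstable(devices, check))
```

In practice the check is slow, since the setup has to be observed for some time before it can be called stable, which is why the order matters: start with one BR and one nearby device.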

gabrielbull commented 1 month ago

@marcelveldt So I unplugged all BRs except one, and that seems to have made things worse. Now 4 of the devices are simply no longer available in HA, and HA suggests I remove them, while they are still available in Apple Home. The server logs seem to be worse as well:

2024-07-20 12:29:12.645 (MainThread) INFO [matter_server.server.device_controller] <Node:10> Previous subscription failed with Error: 50, re-subscribing in 56003 ms...
2024-07-20 12:29:29.702 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:209851661 on exchange 17278i with Node: <000000000000000E, 1> sendCount: 4 max retries: 4
2024-07-20 12:29:33.383 (MainThread) ERROR [matter_server.server.client_handler] [547524527312] Error while handling: device_command (node 14): src/app/CommandSender.cpp:328: CHIP Error 0x00000032: Timeout
2024-07-20 12:30:31.300 (Dummy-2) CHIP_ERROR [chip.native.DIS] Timeout waiting for mDNS resolution.
2024-07-20 12:30:34.452 (Dummy-2) CHIP_ERROR [chip.native.DIS] Timeout waiting for mDNS resolution.
2024-07-20 12:30:45.299 (Dummy-2) CHIP_ERROR [chip.native.DIS] OperationalSessionSetup[1:0000000000000015]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
2024-07-20 12:30:45.300 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout'
2024-07-20 12:30:45.302 (MainThread) INFO [matter_server.server.device_controller] <Node:21> Previous subscription failed with Error: 50, re-subscribing in 63683 ms...
2024-07-20 12:30:48.450 (Dummy-2) CHIP_ERROR [chip.native.DIS] OperationalSessionSetup[1:000000000000000A]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
2024-07-20 12:30:48.451 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout'
2024-07-20 12:30:48.452 (MainThread) INFO [matter_server.server.device_controller] <Node:10> Previous subscription failed with Error: 50, re-subscribing in 73855 ms...
2024-07-20 12:30:52.794 (Dummy-2) CHIP_ERROR [chip.native.DMG] Time out! failed to receive report data from Exchange: 47404r with Node: <0000000000000012, 1>
2024-07-20 12:30:52.795 (MainThread) INFO [matter_server.server.device_controller] <Node:18> Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2024-07-20 12:30:53.881 (Dummy-2) CHIP_ERROR [chip.native.DIS] Timeout waiting for mDNS resolution.
2024-07-20 12:30:58.316 (MainThread) INFO [root] Re-subscription succeeded!
2024-07-20 12:30:58.317 (MainThread) INFO [matter_server.server.device_controller] <Node:18> Re-Subscription succeeded
2024-07-20 12:31:07.878 (Dummy-2) CHIP_ERROR [chip.native.DIS] OperationalSessionSetup[1:0000000000000009]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
2024-07-20 12:31:07.879 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout'
2024-07-20 12:32:19.261 (Dummy-2) CHIP_ERROR [chip.native.DIS] Timeout waiting for mDNS resolution.
2024-07-20 12:32:32.386 (Dummy-2) CHIP_ERROR [chip.native.DIS] Timeout waiting for mDNS resolution.
2024-07-20 12:32:33.259 (Dummy-2) CHIP_ERROR [chip.native.DIS] OperationalSessionSetup[1:000000000000000A]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
2024-07-20 12:32:33.260 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout'
2024-07-20 12:32:33.262 (MainThread) INFO [matter_server.server.device_controller] <Node:10> Previous subscription failed with Error: 50, re-subscribing in 166662 ms...
2024-07-20 12:32:46.382 (Dummy-2) CHIP_ERROR [chip.native.DIS] OperationalSessionSetup[1:0000000000000015]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
2024-07-20 12:32:46.383 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout'
2024-07-20 12:32:46.386 (MainThread) INFO [matter_server.server.device_controller] <Node:21> Previous subscription failed with Error: 50, re-subscribing in 210091 ms...
2024-07-20 12:34:10.357 (Dummy-2) CHIP_ERROR [chip.native.DIS] Timeout waiting for mDNS resolution.
2024-07-20 12:34:24.323 (Dummy-2) CHIP_ERROR [chip.native.DIS] OperationalSessionSetup[1:000000000000000A]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
2024-07-20 12:34:24.324 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout'
2024-07-20 12:34:24.326 (MainThread) INFO [matter_server.server.device_controller] <Node:10> Previous subscription failed with Error: 50, re-subscribing in 256401 ms...

What Home Assistant OS version are you using and what hardware are you on?

It could be that only certain Thread border routers cause troubles, can you maybe try to take some offline and see if that brings devices back?

I'm on a Home Assistant Yellow running the latest everything:

- Core: 2024.7.3
- Supervisor: 2024.06.2
- Operating System: 12.4
- Frontend: 20240710.0

marcelveldt commented 1 month ago

OK, so somehow the communication is failing from HA to that border router. Is Apple Home still able to communicate with those nodes just fine? No delays whatsoever?

marcelveldt commented 1 month ago

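Some more things to try (one by one, with at least 30 minutes between the attempts):

- Restart the entire Matter Server host
- Restart your network switch and/or access point
- Restart the border router(s)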

gabrielbull commented 1 month ago

OK, so somehow the communication is failing from HA to that border router. Is Apple Home still able to communicate with those nodes just fine? No delays whatsoever?

Yes, Apple Home was still able to communicate with the nodes when only 1 BR was online, without delays.

Some more things to try (one by one, with at least 30 minutes between the attempts):

- Restart the entire Matter Server host
- Restart your network switch and/or access point
- Restart the border router(s)

I will try this soon.

marcelveldt commented 1 week ago

I don't know if you are still experiencing the issue, but we have now created a global issue to track progress on this: https://github.com/home-assistant/core/issues/123835

Please follow that issue if you still have issues, thanks!