project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[BUG] Subscriptions to Matter devices are unnecessarily delayed until the next scheduled resubscription retry in some cases #32241

Open ndyck14 opened 8 months ago

ndyck14 commented 8 months ago

Reproduction steps

Topology: 10x Nanoleaf A19s, 1x HPM

Steps:

  1. Power down 5x Nanoleaf devices (running 3.6.x with CASE sub resume)
  2. Power cycle HPM (unclear if strict precondition)
  3. Wait ~1.5 hours
  4. Power up 5x Nanoleaf A19s

Results: 4 devices reconnect successfully. 1 device is left uncontrollable for about an hour:

  1. The initial CASE attempt fails (for unknown reasons), and the resubscription retry is rescheduled for 1 hour out, based on how long the device has been gone.
  2. The device proactively re-establishes CASE 15 seconds later.
  3. The SDK still waits the full hour before resubscribing on the now-active session.
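To make the cost of step 3 concrete, here is a rough model of the timeline. It is not the SDK's actual resubscribe policy: the rule "wait about as long as the device has been gone, capped at one hour" and the names `ComputeRetryDelaySeconds` and `TimerOnlyOutageSeconds` are hypothetical, chosen only to reproduce the numbers in this report (~1.5 hours offline, CASE back after 15 seconds).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical cap on the resubscription retry delay (1 hour).
constexpr std::uint32_t kMaxRetryDelaySeconds = 3600;

// Hypothetical policy: wait roughly as long as the device has already been
// unreachable, but never longer than the cap.
std::uint32_t ComputeRetryDelaySeconds(std::uint32_t unreachableSeconds) {
    return std::min(unreachableSeconds, kMaxRetryDelaySeconds);
}

// Seconds between the peer re-establishing CASE and the resubscription
// actually happening, when the retry timer is the only trigger.
std::uint32_t TimerOnlyOutageSeconds(std::uint32_t unreachableSeconds,
                                     std::uint32_t caseReestablishedAfterSeconds) {
    const std::uint32_t retryAt = ComputeRetryDelaySeconds(unreachableSeconds);
    // If CASE comes back before the timer fires, the remaining wait is wasted.
    return retryAt > caseReestablishedAfterSeconds ? retryAt - caseReestablishedAfterSeconds : 0;
}
```

With 5400 seconds offline, the retry lands at 3600 seconds, so a device that is reachable again after 15 seconds sits idle for 3585 seconds under this model.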

Bug prevalence

With the right preconditions it reproduces 100% of the time. Otherwise, it seems to depend on how many devices boot up at the same time.

GitHub hash of the SDK that was being used

tvOS 17.3

Platform

darwin

Platform Version(s)

No response

Anything else?

No response

bzbarsky-apple commented 8 months ago

In particular, we only retrigger the subscription on receiving a ReportData, not on CASE establishment initiated by the other side or on the other side sending any other IM message.

@jtung-apple
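The trigger condition described in this comment can be modeled as a single predicate. This is an illustration, not SDK code: the `PeerEvent` enum and `TriggersImmediateResubscribe` are hypothetical names, and the real dispatch happens inside the SDK's Interaction Model handling.

```cpp
#include <cassert>

// Hypothetical events a controller can observe from a peer whose
// subscription has dropped.
enum class PeerEvent { kCaseEstablishedByPeer, kReportData, kOtherImMessage };

// Current behavior as described: only an incoming ReportData retriggers the
// subscription; everything else leaves the scheduled retry timer in place.
bool TriggersImmediateResubscribe(PeerEvent event) {
    switch (event) {
    case PeerEvent::kReportData:
        return true;
    case PeerEvent::kCaseEstablishedByPeer:
    case PeerEvent::kOtherImMessage:
        return false;
    }
    return false;
}
```

In the scenario above, the device's 15-second CASE re-establishment maps to `kCaseEstablishedByPeer`, which this predicate ignores, so nothing happens until the timer fires.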

jtung-apple commented 8 months ago

I'm taking a look, but my first thought is that the re-subscription logic is currently only triggered when the IME (InteractionModelEngine) gets called on OnUnsolicitedReportData. I could look into the changes needed to plumb CASE establishment through so that it triggers re-subscription, and we can discuss whether that's what's needed here.

@ndyck14 Can you readily reproduce this? If so, could you upload logs from both devices?

I'm wondering whether something else is wrong, i.e. a bug that is causing this and should be fixed first.

ndyck14 commented 8 months ago

> Power cycle HPM (unclear if strict precondition)

I did this intending to clear the CASE resumption contexts (guesswork on my part) so as to test the worst case. My rough mental model of subscription resumption is that it's a best effort by the device. Otherwise, CASE is re-established because we're looking for OTA, I think.

I can try to reproduce it, but is there any reason the onus should not be on the subscriber to make sure this happens as quickly as possible? I guess that if the subscription is already active, we don't want to double up? Is doubling up even possible?

ndyck14 commented 8 months ago

Note that I ran these test steps directly after having already observed this behavior without logging installed, so I've seen it happen multiple times. I've also been tracking CASEs (pun intended) for a year or more where things take too long to reconnect (e.g. #25091, which Boris reported on my behalf).

bzbarsky-apple commented 8 months ago

We should consider triggering resubscription both on CASE establishment (using the newly established session rather than creating another one) and on receipt of any IM message, not just ReportData.
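A minimal sketch of what this proposal could look like, assuming a per-peer scheduler object. `ResubscribeScheduler`, `PeerSession`, and all method names are hypothetical and do not correspond to SDK classes. Any peer activity (CASE establishment or any IM message) cancels the pending retry and resubscribes over the existing session. A single "retry pending" flag also answers the doubling-up question in this model: once a resubscription has started, later activity and the stale timer are ignored.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an already-established secure session.
struct PeerSession {
    int id;
};

class ResubscribeScheduler {
public:
    // Subscription dropped: a retry timer is (conceptually) scheduled.
    void OnSubscriptionLost() { mRetryPending = true; }

    // Called on peer-initiated CASE establishment and on every incoming IM
    // message, not only ReportData.
    void OnPeerActivity(PeerSession session) {
        if (!mRetryPending) {
            return; // Already resubscribed or active: don't double up.
        }
        mRetryPending = false;         // cancel the scheduled retry
        mResubscribeSession = session; // reuse the session; no new CASE
        ++mResubscribeCount;
    }

    // Fallback path: the scheduled retry timer fires.
    void OnRetryTimerFired() {
        if (!mRetryPending) {
            return; // Stale timer: resubscription already happened early.
        }
        mRetryPending = false;
        mResubscribeSession.reset(); // would establish a fresh session here
        ++mResubscribeCount;
    }

    bool RetryPending() const { return mRetryPending; }
    int ResubscribeCount() const { return mResubscribeCount; }
    std::optional<PeerSession> LastSession() const { return mResubscribeSession; }

private:
    bool mRetryPending = false;
    int mResubscribeCount = 0;
    std::optional<PeerSession> mResubscribeSession;
};
```

In the reported scenario, the device's CASE re-establishment after 15 seconds would call `OnPeerActivity`, and the one-hour timer would later fire as a no-op.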

woody-apple commented 7 months ago

Assigning to me.