project-chip / rs-matter

Rust implementation of the Matter protocol. Status: Experimental
Apache License 2.0
303 stars 43 forks source link

Keep the IP address of the subscribing peer node in subscriptions? #182

Closed ivmarkov closed 1 month ago

ivmarkov commented 1 month ago

Copying the part of #181 which is still unresolved here, to keep the discussion open:

Apple Matter is a bit weird. Putting aside the fact that we are commissioned into two fabrics, the other weird thing that happens is that once we are commissioned in the second fabric, they are communicating with us on behalf of TWO different IP addresses, which - however - do represent the same node+fabric ID?!

And now the real weird part: we receive a subscription request from IP Address 2 and this all goes well. But when the time comes to report on that subscription in some seconds / minutes and initiate a new exchange, it just so happens that we select an existing session for "IP Address 1" instead, because we are just looking for a session matching the fabric index (=1) and the node ID. And since the first address also has created sessions to us on behalf of fabric 1, we just happen to pick a session for IP Address 1, even if the subscription came from IP Address 2.

... and get "Invalid Subscription" from the other peer.

Now, if we pick a session based on fabric index + peer node ID + remote address, then we select a session with "IP Address 2" (the address from where the subscription came) and all is well!

So @kedars what do you think is going on? Shall I just also keep the IP address in the subscription data and do a match by that address too? This works (confirmed) but sounds a bit weird. Am I missing something in the spec?

Session dump:

Sessions for IP Address 1:
=============================================

(Skip this block, it contains non-CASE sessions, not interesting):
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(1088638830675445622), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 9017568, mode: PlainText, ts: 1717612457.779132481s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(0), local_nodeid: 0, local: 1, remote: 8566, msg_ctr: 44517232, mode: Pase(NotExpected), ts: 1717612462.458154527s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(9297024357880017542), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 87475376, mode: PlainText, ts: 1717612462.78951294s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(17849542785653146437), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 35224863, mode: PlainText, ts: 1717612493.671745767s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(1088638830675445622), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 9017568, mode: PlainText, ts: 1717612457.779132481s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(0), local_nodeid: 0, local: 1, remote: 8566, msg_ctr: 44517232, mode: Pase(NotExpected), ts: 1717612462.458154527s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(9297024357880017542), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 87475376, mode: PlainText, ts: 1717612462.78951294s
Session: fb_idx: None,    peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(17849542785653146437), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 35224863, mode: PlainText, ts: 1717612493.671745767s

!!! This block is interesting:
Session: fb_idx: Some(1), peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(864992019), local_nodeid: 1121512030, local: 2, remote: 8567, msg_ctr: 198121502, mode: Case(CaseDetails { fab_idx: 1, cat_ids: [0, 0, 0] }), ts: 1717612496.261807843s
Session: fb_idx: Some(2), peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(3725013030), local_nodeid: 4012158658, local: 3, remote: 8568, msg_ctr: 254531801, mode: Case(CaseDetails { fab_idx: 2, cat_ids: [0, 0, 0] }), ts: 1717612494.724025958s
Session: fb_idx: Some(1), peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(864992019), local_nodeid: 1121512030, local: 2, remote: 8567, msg_ctr: 198121502, mode: Case(CaseDetails { fab_idx: 1, cat_ids: [0, 0, 0] }), ts: 1717612500.816538095s
Session: fb_idx: Some(2), peer: [fe80::8d7:cf23:8b0b:f715%2]:51398, peer_nodeid: Some(3725013030), local_nodeid: 4012158658, local: 3, remote: 8568, msg_ctr: 254531801, mode: Case(CaseDetails { fab_idx: 2, cat_ids: [0, 0, 0] }), ts: 1717612494.724025958s

Sessions for IP Address 2:
=============================================
(Skip this block, it contains non-CASE sessions, not interesting):
Session: fb_idx: None,    peer: [fe80::416:93dd:216a:d50a%2]:65218, peer_nodeid: Some(1162526195568380034), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 84020839, mode: PlainText, ts: 1717612496.321803477s
Session: fb_idx: None,    peer: [fe80::416:93dd:216a:d50a%2]:65218, peer_nodeid: Some(1162526195568380034), local_nodeid: 0, local: 0, remote: 0, msg_ctr: 84020839, mode: PlainText, ts: 1717612496.321803477s

!!! This block is interesting: see - fb_idx = 1 and peer_nodeid = 864992019 which is also present for Address 1?!
Session: fb_idx: Some(1), peer: [fe80::416:93dd:216a:d50a%2]:65218, peer_nodeid: Some(864992019), local_nodeid: 1121512030, local: 4, remote: 48265, msg_ctr: 226861159, mode: Case(CaseDetails { fab_idx: 1, cat_ids: [0, 0, 0] }), ts: 1717612500.815073586s
Session: fb_idx: Some(1), peer: [fe80::416:93dd:216a:d50a%2]:65218, peer_nodeid: Some(864992019), local_nodeid: 1121512030, local: 4, remote: 48265, msg_ctr: 226861159, mode: Case(CaseDetails { fab_idx: 1, cat_ids: [0, 0, 0] }), ts: 1717612500.815073586s
ivmarkov commented 1 month ago

@kedars I think I just confirmed even a stronger statement by looking at the C++ SDK: The implementation of subscriptions in the SDK is such that the unsolicited subscription data reports always happen not just on behalf of the same fabric+node+IP address, but on behalf of the same session. Look here: https://github.com/project-chip/connectedhomeip/blob/master/src/app/reporting/Engine.cpp#L514

They are basically re-using the read handler on which the original subscription request came, and this read handler is bound to a session. So in a way, if the session is gone, the subscription created via this session is gone too.

I can easily do a similar change in our code base (i.e. binding the subscription to a concrete session).

Shall I?

ivmarkov commented 1 month ago

More research:

While I'm now 100% sure the C++ SDK by default re-uses the read handler and its attached session, there is an optional #ifdef-ed part which allows subscriptions to be "persisted".

When a subscription is persisted (say, for an ICD device - in fact I think that's the only use case for subscription persistence) and then de-persisted, what happens is that a suitable session for that subscription is searched for (and if not found - created - which we still do not support as that would imply mDNS lookups).

Now, the search for a suitable session is done only by using fabric index and peer node ID - 100% as we do it too.

The major difference however is that if the subscriptions are persisted, then the sessions are persisted too (or just removed). However, I don't think session persistence involves persistence of the IP address. So on de-persisting, the C++ SDK will do a fresh mDNS lookup of a new IP address for the de-persisted session/subscription.

So my hypothesis is, if the persistence code kicks-in, due to the mDNS lookup we'll not get into our corner case because I've checked - the IPv6 address responding to the mDNS lookups is not the problematic one from my earlier analysis.

kedars commented 1 month ago

You've already discovered the persistent subscriptions; that was the only scenario I was thinking where this might be required. But anyway discovering the IP Address based on its mDNS name is the right thing to do.

I am also doing an experiment to see how the C++ SDK and Apple devices behave in the scenario that you highlighted. Since they share node-ids across devices that have the same sign-in. But I am not able to get requests from the second device (iPhone) in my setup. It always comes from the hub.

Regardless, storing the IP Address feels definitely like the wrong thing to do (The Address itself can change across device-resets or others, so relying on discovery is more reliable).

@andreilitvin any observations from you on this?

ivmarkov commented 1 month ago

What about the other approach that you would see shortly in a forthcoming PR which is for something related? I.e.:

This way I think we are as close to the C++ SDK behavior as possible, and also future-proof once we implement mDNS lookups