project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[BUG] Local Binding does not work as OperationalSessionSetup or mDNS does not support lookup of local node. #34829

Open andrew-lifx opened 1 month ago

andrew-lifx commented 1 month ago

Reproduction steps

We have a device which has both an input endpoint (e.g. On/Off Light Switch) and an output endpoint (e.g. On/Off Plug-in Unit). We then successfully create a binding from the input endpoint to the output endpoint. Activating the input (e.g. a button press) does not trigger a binding response (e.g. an output toggle) but instead logs an error establishing a connection to the node (itself).

Bug prevalence

Does not function at all.

GitHub hash of the SDK that was being used

5bb5c9e23d532cea40476fc0bd1d3008522792ba

Platform

esp32

Platform Version(s)

No response

Anything else?

Background: We have a switch product which provides up to 4 buttons for on/off control and up to 4 relays for smart switching of devices (e.g. existing dumb lights). Without support for local binding, the only way for Matter to allow a user to control any of the relays on the back of the unit from the switches on the front of the unit would be to use the generic switch cluster with a hub/controller doing the work. This is not a practical solution for an introduction to smart home control or for any small or hub-less system. A simple scenario like this should not require a Matter hub.

Root cause: The root cause stems from the fact that OperationalSessionSetup always assumes the target node is remote and attempts to create a connection to the device. To do this it must first find the target address via mDNS. Neither the Minimal mDNS implementation nor the Espressif implementation supports lookup of locally advertised services. The result is that no address can be found for the target and the Binding Handler is not triggered.

Recommended Solution: Even if the mDNS implementation did support lookup of local services, it is not clear whether a connection to the local node could be established. The simplest approach would seem to be to check the target against the local nodeId(s), skip the connection step entirely, and call the Binding Handler with an appropriate peer_device entry or binding.type that the handler can use to invoke the operation locally.
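A minimal sketch of the check described above, not SDK code. Every name here (`NodeIdentity`, `BindingTarget`, `IsLocalTarget`, `DispatchBinding`, `connectRemote`) is hypothetical and stands in for whatever the SDK's binding manager actually uses. The only point illustrated is that a target matching a local fabric/node pair bypasses discovery and session setup, and that the handler is told which case it is in.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for SDK types; not the real Matter API.
struct NodeIdentity {
    uint8_t fabricIndex;
    uint64_t nodeId;
};

struct BindingTarget {
    uint8_t fabricIndex;
    uint64_t nodeId;
    uint16_t endpoint;
};

// isLocal tells the application handler to invoke the cluster directly
// instead of sending a command over a session.
using BindingHandler = std::function<void(const BindingTarget &, bool isLocal)>;
// Models the remote path (discovery + session setup); returns false on failure.
using RemoteConnector = std::function<bool(const BindingTarget &)>;

inline bool IsLocalTarget(const std::vector<NodeIdentity> & localIds, const BindingTarget & target)
{
    for (const auto & id : localIds)
    {
        if (id.fabricIndex == target.fabricIndex && id.nodeId == target.nodeId)
        {
            return true;
        }
    }
    return false;
}

// Returns true if the handler was invoked.
inline bool DispatchBinding(const std::vector<NodeIdentity> & localIds, const BindingTarget & target,
                            const RemoteConnector & connectRemote, const BindingHandler & handler)
{
    assert(handler && connectRemote);
    if (IsLocalTarget(localIds, target))
    {
        handler(target, /* isLocal = */ true); // no mDNS lookup, no session setup
        return true;
    }
    if (!connectRemote(target))
    {
        return false; // e.g. operational discovery failed
    }
    handler(target, /* isLocal = */ false);
    return true;
}
```

With a local identity of fabric 1, node 2 (as in the log below), a binding to node 2 would reach the handler with `isLocal` set and without the connector ever being called, while a binding to node 1 would still go through the remote path.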

Sample log: This log shows the scenario of the switch with relay combo (node 2) and a light (node 1). The node 2 switch input is bound to both the node 2 relay endpoint and the node 1 Light endpoint. The log shows that the binding handler for the Light fires successfully but discovery of the local node 2 fails.

I (4413140) chip[ZCL]: SwitchServer: OnInitialPress
I (4413377) chip[ZCL]: SwitchServer: OnShortRelease
I (4413579) chip[ZCL]: SwitchServer: OnMultiPressComplete
I (4413581) chip[DIS]: Found an existing secure session to [1:0000000000000001]!
E (4413583) chip[-]: Sending OnOffcommand to remote node 1 endpoint: 1
I (4413592) chip[EM]: <<< [E:12063i S:9795 M:68989722] (S) Msg TX to 1:0000000000000001 [ED0D] [UDP:[FE80::D273:D5FF:FE5B:F235%st2]:5540] --- Type 0001:08 (IM:InvokeCommandRequest)
I (4413609) chip[DIS]: Resolving B5255A529B14ED0D:0000000000000002 ...
I (4413637) chip[EM]: >>> [E:12063i S:9795 M:225101238 (Ack:68989722)] (S) Msg RX from 1:0000000000000001 [ED0D] --- Type 0001:09 (IM:InvokeCommandResponse)
I (4413642) chip[DMG]: Received Command Response Status for Endpoint=1 Cluster=0x0000_0006 Command=0x0000_0002 Status=0x0
I (4413652) chip[-]: OnOff command succeeds
I (4413658) chip[EM]: <<< [E:12063i S:9795 M:68989723 (Ack:225101238)] (S) Msg TX to 1:0000000000000001 [ED0D] [UDP:[FE80::D273:D5FF:FE5B:F235%st2]:5540] --- Type 0000:10 (SecureChannel:StandaloneAck)
I (4413809) chip[DIS]: Checking node lookup status after 200 ms

I (4458609) chip[DIS]: Checking node lookup status after 45000 ms
E (4458609) chip[DIS]: OperationalSessionSetup[1:0000000000000002]: operational discovery failed: 32
E (4458615) chip[SVR]: Failed to establish connection to node 0x0000000000000002

Slack discussion: https://csamembers.slack.com/archives/CUWHQS3U0/p1722925263418839

bzbarsky-apple commented 1 month ago

Note https://github.com/project-chip/connectedhomeip/issues/21626.

mDNS is the least of the problems here, actually. The basic "find me the CASE session" bit will also fail, because both "ends" of the CASE session will be in the session table.
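A toy model of that ambiguity, offered only as an illustration. The real session manager is considerably more involved, and the names `SessionEntry`, `FindSessionsToPeer` and `SelfSessionTable` are hypothetical. It assumes a lookup keyed on fabric and peer node ID, and shows that a node holding both ends of a session with itself gets two candidates for the same key.

```cpp
#include <cstdint>
#include <vector>

// Toy model only; field and function names are hypothetical.
enum class Role { kInitiator, kResponder };

struct SessionEntry {
    uint16_t localSessionId;
    uint8_t fabricIndex;
    uint64_t peerNodeId;
    Role role;
};

// Collects every session whose peer is the given node on the given fabric.
inline std::vector<SessionEntry> FindSessionsToPeer(const std::vector<SessionEntry> & table,
                                                    uint8_t fabricIndex, uint64_t peerNodeId)
{
    std::vector<SessionEntry> matches;
    for (const auto & s : table)
    {
        if (s.fabricIndex == fabricIndex && s.peerNodeId == peerNodeId)
        {
            matches.push_back(s);
        }
    }
    return matches;
}

// Node 2 after a hypothetical CASE handshake with itself: it holds both ends,
// and each end names node 2 as its peer.
inline std::vector<SessionEntry> SelfSessionTable()
{
    return {
        { 100, 1, 2, Role::kInitiator },
        { 101, 1, 2, Role::kResponder },
        { 102, 1, 1, Role::kInitiator }, // ordinary session to remote node 1
    };
}
```

In this model a lookup for node 1 yields a single session, while a lookup for node 2 yields both the initiator and the responder entry, so "the" session to the peer is no longer well defined.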

How this should actually work in practice is very unclear. Especially for clusters that do in fact require that you interact with them over a CASE session.

Apollon77 commented 1 month ago

In my eyes it does not make much sense to build local bindings via network actions, actually ...

andrew-lifx commented 1 month ago

In my eyes it does not make much sense to build local bindings via network actions, actually ...

How else do you create a link between 2 local endpoints in Matter then? There is no other mechanism. In principle it shouldn't matter if the target endpoint is local. It should be just like sending a packet to a host:port. If the host is the localhost, the packet simply gets routed to the local port without going through the network.

Apollon77 commented 1 month ago

I honestly do not know how UDP unicast works on all the platforms and cases with different network interfaces, potentially Thread or such; it just "feels" like overhead to me. Sure, if it is "that easy" network-wise in all relevant cases, then having a standard CASE session of its own (if you get the issues that Boris mentioned solved), with encryption and all these things happening for a local binding, might be "simple" because no special local code would be needed, but it also has some unneeded overhead in my eyes. And with the topics Boris pointed out, several other changes are needed as well.

The code on the local node knows all endpoints, all clusters, everything, so in my opinion all actions can also be handled by local code in the case of a local binding. But yes, that would require two "code chains" (one for remote and one for local bindings) and an abstraction level that needs to be developed and maintained. That effort could be the reason not to do it that way (maybe) ... or not ... I think that's also a topic to consider.
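One way to picture the two chains behind a common abstraction, as a sketch only. None of these classes exist in the SDK; `OnOffTarget`, `LocalOnOffTarget` and `RemoteOnOffTarget` are hypothetical, the local chain just flips a flag in a map, and the remote chain merely records a string where a real implementation would send a command over a secure session.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical abstraction; none of these names come from the SDK.
class OnOffTarget {
public:
    virtual ~OnOffTarget() = default;
    virtual void Toggle() = 0;
};

// Local chain: flips the state of an endpoint on this node directly.
class LocalOnOffTarget : public OnOffTarget {
public:
    LocalOnOffTarget(std::map<uint16_t, bool> & state, uint16_t endpoint) :
        mState(state), mEndpoint(endpoint) {}

    void Toggle() override
    {
        const bool current = mState[mEndpoint]; // defaults to false (off)
        mState[mEndpoint] = !current;
    }

private:
    std::map<uint16_t, bool> & mState;
    uint16_t mEndpoint;
};

// Remote chain: records the command that would be sent over a session.
class RemoteOnOffTarget : public OnOffTarget {
public:
    RemoteOnOffTarget(std::vector<std::string> & outbox, uint64_t nodeId, uint16_t endpoint) :
        mOutbox(outbox), mNodeId(nodeId), mEndpoint(endpoint) {}

    void Toggle() override
    {
        mOutbox.push_back("Toggle -> node " + std::to_string(mNodeId) +
                          " ep " + std::to_string(mEndpoint));
    }

private:
    std::vector<std::string> & mOutbox;
    uint64_t mNodeId;
    uint16_t mEndpoint;
};
```

The button handler would then hold a list of `OnOffTarget` pointers and call `Toggle()` on each without caring which chain is behind it; the maintenance cost mentioned above is the need to keep both implementations behaving the same.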

So the above is more my personal opinion.

andrew-lifx commented 2 weeks ago

Espressif have a patch to fix local bindings, along the lines of what I recommended. It is short and simple. The callbacks require a small amount of special handling for local bindings, but that is to be expected.