project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.49k stars 2.01k forks source link

[BUG] `tv-casting-app` does not take the SDK lock prior to SDK interactions #25023

Open domichae-amazon opened 1 year ago

domichae-amazon commented 1 year ago

Reproduction steps

Problem

The darwin version of the tv-casting-app doesn't take the SDK lock prior to interacting with the SDK. When the tv-casting-app is backgrounded, it attempts to Shutdown the Server in a non-thread-safe manner. This eventually results in a crash if the ExchangeContext is in the middle of sending a message (SendMessage), when this happens due to a segmentation fault on accessing GetSessionManager()->SystemLayer(), where GetSessionManager() returns nullptr.

Impact

The Darwin version of the TV casting example crashes.

Bug prevalence

Sometimes

GitHub hash of the SDK that was being used

02bc67e245dbc0e85b3ab772c1f81eb27b953420

Platform

darwin

Platform Version(s)

N/A

Anything else?

ExchangeContext fault location

bzbarsky-apple commented 1 year ago

When an ExchangeContext is ShutDown() while the ExchangeContext is in the middle of sending a message (SendMessage)

I'm not sure what you mean by "ShutDown()" (there is no such API on ExchangeContext), but what exact thing nulled out mExchangeMgr and how did it get called in the middle of SendMessage?

Or is the real issue that SendMessage is being called after the exchange manager has been shut down? @domichae-amazon

domichae-amazon commented 1 year ago

That's correct - the issue is that the Exchange Manager was shut down, which in turn clears out some of the state of the Exchange Context which may still be alive.

bzbarsky-apple commented 1 year ago

the issue is that the Exchange Manager was shut down, which in turn clears out some of the state of the Exchange Context which may still be alive.

Why is the exchange context still alive? All exchanges should be closed before the exchange manager shuts down, precisely because of this problem. Who is not doing that, and why not?

What is the exact sequence of events that got us here?

domichae-amazon commented 1 year ago

We discussed this issue more over on the pull request. I did some more digging based on the information that @bzbarsky-apple shared and found the root cause: the darwin variant of the tv-casting-app example project does not take the Matter stack lock before any interactions with the Matter SDK. This doesn't match the expectations of the SDK and could theoretically result in crashes in a few different places. The specific event leading to a crash here is that the sample iOS app calls Server::GetInstance().Shutdown() in a non-thread-safe way (from the application's main DispatchQueue). I'll update the content of the bug to indicate the real issue and create a fresh pull request that addresses it.