Open rojer opened 1 year ago
@rojer I have enabled the static assert in https://github.com/project-chip/connectedhomeip/pull/28798 and skipped it for ASR and bouffalolab platforms. Please take a look.
Here is one more PR from @tx2rx for ASR: https://github.com/project-chip/connectedhomeip/pull/28824
@shubhamdp that looks reasonable. if you need help testing non-core-locking case, i'd be happy to.
@shubhamdp i took a stab at it and https://github.com/project-chip/connectedhomeip/pull/29057 is the result. this is pretty straightforward and seems to be working for me so far.
A note on LwIPReceiveUDPMessage
: it will almost certainly be invoked in the context of a different thread/task than CHIP's: with LWIP_TCPIP_CORE_LOCKING=0
it will be the TCPIP thread for sure, with LWIP_TCPIP_CORE_LOCKING=1
we don't really know, some other thread handling the interface RX (having a dedicated task for that is a common pattern this is what ESP32 does too, with separate tasks for wifi and eth). Granted, the method doesn't do much before posting the packet to the event queue, but even "not much" is something that could be racy, so I added Lock/UnlockChipStack()
calls.
Correction: turns out, i cannot use PlatformManager from Inet due to a dependency cycle. So we should be locking the stack but looks like at present we can't. I'll need guidance on how to break this cycle...
Reproduction steps
LwIP offers two modes to run TCP/IP stack: a dedicated task or in-line. In the former case, access to LwIP's state is serialized by requiring that all APIs be invoked from the dedicated task (throwing a closure on the task's work queue if necessary). In the latter, a mutex is used and it is sufficient to
LOCK_TCPIP_CORE();
.tcpip_api_call()
function is provided by LwIP and will do the right thing, depending on whetherLWIP_TCPIP_CORE_LOCKING
is set or not. There are advantages and disadvantages to both approaches but the point is, both are supported by the stack and either one can be used by a particular platform. However, CHIP assumes thatLWIP_TCPIP_CORE_LOCKING
is used while not explicitly requiring it, which results in incorrect locking model being used ifLWIP_TCPIP_CORE_LOCKING
is not set (basically no serialization), and exposes applications to races within networking stack with unpredictable results.Notably, ESP32 runs LwIP in dedicated task mode by default (
LWIP_TCPIP_CORE_LOCKING
is a config option but is not the default).CHIP should wither assert that
LWIP_TCPIP_CORE_LOCKING
is enabled or (preferably) wrap LwIP API calls intcpip_api_call
.Bug prevalence
Always
GitHub hash of the SDK that was being used
4f658f62aeced8b413f0f3bf9f5117897234a120
Platform
esp32
Platform Version(s)
No response
Anything else?
https://github.com/project-chip/connectedhomeip/blob/732f797f9baad2619675536ae5c5fd4913ddb032/src/inet/UDPEndPointImplLwIP.cpp#L70-L71 this assumes core locking is enabled,
LOCK_TCPIP_CORE
is a no-op otherwise.tcpip_api_call should be used to synchronously invoke an API function, using locking or sync closure if necessary.