zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Bluetooth: Mesh: Access model recv #73759

Closed: LingaoM closed this issue 1 month ago

LingaoM commented 1 month ago

https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/transport.c#L1585 https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/transport.c#L1027

The Mesh protocol stack uses these static variables to cache messages, which are then processed by the application layer. At first glance this does not seem to be a problem: these messages are processed in the BT RX thread, and because Zephyr runs it as a cooperative thread, races should be avoided.
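The reasoning above (a static cache is safe as long as each handler runs to completion) can be illustrated with a plain-C sketch. This is not Zephyr code: `rx_buf`, `recv_msg` and `run_to_completion_demo` are hypothetical names standing in for the static buffers linked above, and the two calls simply model two messages handled back to back by a cooperative thread that never yields mid-handler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a static scratch buffer in transport.c;
 * the name and size are illustrative only. */
static char rx_buf[32];

/* Caches the payload in the shared static buffer, then "processes" it
 * by copying it to the caller's output. Nothing yields in between, so
 * no other context can touch rx_buf during this call. */
static void recv_msg(const char *payload, char *out, size_t out_len)
{
    strncpy(rx_buf, payload, sizeof(rx_buf) - 1);
    rx_buf[sizeof(rx_buf) - 1] = '\0';

    /* ... application-layer processing would happen here ... */

    strncpy(out, rx_buf, out_len - 1);
    out[out_len - 1] = '\0';
}

/* Handles two messages sequentially, as a cooperative thread would when
 * each handler runs to completion. Returns 1 if each handler saw its
 * own payload, 0 otherwise. */
static int run_to_completion_demo(void)
{
    char seen_a[32];
    char seen_b[32];

    recv_msg("msg-from-bt-rx", seen_a, sizeof(seen_a));
    recv_msg("msg-from-loopback", seen_b, sizeof(seen_b));

    return strcmp(seen_a, "msg-from-bt-rx") == 0 &&
           strcmp(seen_b, "msg-from-loopback") == 0;
}
```

Under this run-to-completion assumption, sharing one static buffer between messages is harmless, because the second message only overwrites `rx_buf` after the first handler has finished with it.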

However, this overlooks two points: Mesh loopback messages are processed in the context of the system workqueue (syswork), and once a message is dispatched to model->recv, there is no guarantee that the current thread stays in the running state.

Consider the following scenario: a message from BT RX is being processed at the application layer, and the handler calls a blocking API, such as taking a semaphore, k_sleep, or a flash operation (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/controller/flash/soc_flash_nrf_ticker.c#L225). This causes BT RX to temporarily give up the CPU. If a loopback message is then processed in syswork, the static buffer is accessed by two different threads at the same time.
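The interleaving described above can be reproduced deterministically in a single-threaded C sketch. This is a simulation under stated assumptions, not Zephyr code: `simulated_blocking_call` stands in for something like `k_sleep()` or `k_sem_take()` inside `model->recv`, and it models the resulting context switch by directly running a hypothetical `loopback_work_handler` that plays the role of the syswork item. All names here are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical shared static buffer, standing in for the one in transport.c. */
static char rx_buf[32];

/* Simulated system workqueue item: handles a loopback message by
 * reusing the same static buffer. */
static void loopback_work_handler(void)
{
    strncpy(rx_buf, "loopback", sizeof(rx_buf) - 1);
    rx_buf[sizeof(rx_buf) - 1] = '\0';
}

/* Simulated blocking call inside the model handler. On a real system the
 * scheduler would switch to another ready thread here; this sketch models
 * that by running the pending work item directly. */
static void simulated_blocking_call(int other_work_pending)
{
    if (other_work_pending) {
        loopback_work_handler();
    }
}

/* Handles a BT RX message as the issue describes: the payload is cached
 * in rx_buf, the handler blocks, then reads rx_buf again. Returns 1 if
 * the handler still sees its own payload, 0 if it was overwritten. */
static int bt_rx_handler_sees_own_data(int blocks_while_work_pending)
{
    strncpy(rx_buf, "bt-rx", sizeof(rx_buf) - 1);
    rx_buf[sizeof(rx_buf) - 1] = '\0';

    simulated_blocking_call(blocks_while_work_pending);

    return strcmp(rx_buf, "bt-rx") == 0;
}
```

Calling `bt_rx_handler_sees_own_data(0)` models a handler that never yields, and the buffer stays intact. Calling `bt_rx_handler_sees_own_data(1)` models a handler that blocks while a loopback item is pending, and the handler finds `"loopback"` in the buffer it believed it owned. Cooperative scheduling only prevents preemption, not the voluntary yield a blocking call causes.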

alxelax commented 1 month ago

Hi @LingaoM, it is prohibited to call blocking APIs from interrupt handlers or kernel services (including syswork). A blocking API suspends the calling thread until the awaited event occurs or the timeout expires.

Since blocking APIs must not be called from kernel services, the BT RX thread cannot preempt a system work handler in the middle of its execution, because Mesh runs under cooperative scheduling. I do not see the problem here.

Perhaps I do not fully understand the issue. Could you provide a more detailed explanation if you still think this is a problem?