zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

NET_TC_THREAD_PREEMPTIVE=y results in race conditions in LwM2M engine #78989

Closed. dsitelew-gcx closed this issue 1 month ago

dsitelew-gcx commented 1 month ago

Describe the bug

Enabling the preemptive TX/RX threads introduces race conditions in the LwM2M engine.

Looking at the socket_loop implementation, there is almost no synchronisation. For example, scheduling a message with lwm2m_send_cb crashes because of a race condition:

[01:00:52.833,679] <dbg> net_lwm2m_message_handling: reply 0x20019d50 handled and removed
[01:00:52.833,831] <err> os: ***** SECURE FAULT *****
[01:00:52.833,831] <err> os:   Address: 0x10
[01:00:52.833,862] <err> os:   Attribution unit violation
[01:00:52.833,862] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x2001d764
[01:00:52.833,892] <err> os: r3/a4:  0x00000000 r12/ip:  0x00004000 r14/lr:  0x00015e35
[01:00:52.833,892] <err> os:  xpsr:  0x21000000
[01:00:52.833,923] <err> os: Faulting instruction address (r15/pc): 0x00044c98
[01:00:52.833,953] <err> os: >>> ZEPHYR FATAL ERROR 41: Unknown error on CPU 0
[01:00:52.833,984] <err> os: Current thread: 0x20013448 (lwm2m-sock-recv)
[01:00:52.845,520] <err> fatal_error: Resetting system

In this case it's a NULL-pointer dereference (note the fault address of 0x10, a small offset from NULL).

To Reproduce

Sorry, no minimal reproducible code example. I think setting NET_TC_THREAD_PREEMPTIVE to y and sending a lot of messages should suffice.

Expected behavior


Sorry if this is the wrong place to post this, I just wanted to warn others of a potential problem.

The configuration flag is of course marked as experimental, so there should be no expectation that everything works. Still, since it is clear that LwM2M does not work with this flag, I think turning it on should result in a build error until the LwM2M engine is made thread-safe.
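To make the build-error suggestion concrete, here is a minimal sketch of a compile-time guard in plain C11. It assumes the usual Zephyr convention that a Kconfig boolean set to y shows up as a defined `CONFIG_...` macro and is left undefined otherwise. The macro name `LWM2M_PREEMPTIVE_CONFLICT` is hypothetical and exists only for this illustration. A Kconfig-level dependency would probably be the more natural place for such a check; this is only one possible shape.

```c
/*
 * Sketch only: refuse to build when both options are enabled.
 * Assumption: Kconfig booleans set to y are visible as defined
 * CONFIG_* macros; unset booleans are not defined at all.
 * LWM2M_PREEMPTIVE_CONFLICT is a hypothetical helper name.
 */
#if defined(CONFIG_LWM2M) && defined(CONFIG_NET_TC_THREAD_PREEMPTIVE)
#define LWM2M_PREEMPTIVE_CONFLICT 1
#else
#define LWM2M_PREEMPTIVE_CONFLICT 0
#endif

/* C11 static assertion: compilation stops here if the combination is set. */
_Static_assert(!LWM2M_PREEMPTIVE_CONFLICT,
	       "LwM2M engine is not thread-safe with NET_TC_THREAD_PREEMPTIVE=y");
```

Compiled on a host without either macro defined, the assertion passes. Defining both (for example with `-DCONFIG_LWM2M=1 -DCONFIG_NET_TC_THREAD_PREEMPTIVE=1`) makes the build fail with the message above.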

github-actions[bot] commented 1 month ago

Hi @dsitelew-gcx! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

dkalowsk commented 1 month ago

@dsitelew-gcx which platform is this on?

dsitelew-gcx commented 1 month ago

@dkalowsk sorry, forgot to mention it, it's an nRF9160 (Arm Cortex-M33).

Honestly, looking at the code, I thought it didn't matter what platform it was.

rlubos commented 1 month ago

I've reproduced the crash; the culprit turned out to be a preempted memset() during message deallocation. It should be fixed with https://github.com/zephyrproject-rtos/zephyr/pull/79847. With those fixes in place I was able to flood the server with LwM2M send messages without hitting the crash again (initially it crashed after a few seconds).
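For readers unfamiliar with this class of bug, the following host-side sketch shows how a preempted memset() on a shared message pool can go wrong and how a lock around allocation and deallocation avoids it. It is not the change made in the linked pull request and does not use the real LwM2M structures: `struct msg`, `msg_alloc`, `msg_free` and `POOL_SIZE` are all hypothetical, and a pthread mutex stands in for whatever primitive the engine would use (in Zephyr that could be a `k_mutex`).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4 /* hypothetical pool size */

/* Hypothetical message slot; not the real struct lwm2m_message. */
struct msg {
	bool in_use;   /* first field, so memset() clears it first */
	void *ctx;     /* owner context, dereferenced by the receive path */
	char payload[32];
};

static struct msg pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim the first free slot, or return NULL if the pool is exhausted. */
struct msg *msg_alloc(void *ctx)
{
	struct msg *found = NULL;

	pthread_mutex_lock(&pool_lock);
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = true;
			pool[i].ctx = ctx;
			found = &pool[i];
			break;
		}
	}
	pthread_mutex_unlock(&pool_lock);
	return found;
}

/*
 * Release a slot by zeroing it. Without the lock, a higher-priority
 * thread could preempt the memset() after in_use is cleared, claim the
 * slot and set ctx, and then see ctx zeroed when the memset() resumes.
 */
void msg_free(struct msg *m)
{
	if (m == NULL) {
		return;
	}
	pthread_mutex_lock(&pool_lock);
	memset(m, 0, sizeof(*m));
	pthread_mutex_unlock(&pool_lock);
}
```

With the lock held across the whole memset(), an allocator running in another thread sees the slot either fully in use or fully cleared, never half-reset. The ordering described in the comment is only one plausible interleaving and is not confirmed as the exact failure seen in this issue.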