zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Crash/segfault on all nrf5340 bsim simulations during IPC init #78099

Closed: aescolar closed this issue 1 month ago

aescolar commented 1 month ago

Describe the bug: Most simulations for the nrf5340bsim/nrf5340/cpuapp target are crashing during IPC init due to a segfault.

To Reproduce:

  1. Ensure you can build the bsim targets: https://docs.zephyrproject.org/latest/boards/native/nrf_bsim/doc/nrf52_bsim.html#building-and-running
  2. mkdir build; cd build
  3. cmake -GNinja -DBOARD=nrf5340bsim/nrf5340/cpuapp -DAPP_DIR=../samples/bluetooth/peripheral_hr/ ../share/sysbuild
  4. ninja
  5. valgrind ./zephyr/zephyr.exe -nosim
  6. See error
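The steps above can be wrapped in a single shell function for repeated runs. This is only a sketch: the `repro` function name and the `valgrind.log` file name are made up for illustration, and it assumes `ZEPHYR_BASE` (or the first argument) points at a Zephyr checkout where the bsim prerequisites from step 1 are already in place.

```shell
#!/bin/sh
# Hypothetical helper: runs steps 2-5 from the list above in a subshell,
# stopping at the first command that fails.
repro() {
    (
        # Use the path given as $1, else ZEPHYR_BASE; abort if neither is set.
        cd "${1:-${ZEPHYR_BASE:?set ZEPHYR_BASE or pass the Zephyr path}}" &&
        mkdir -p build &&
        cd build &&
        cmake -GNinja -DBOARD=nrf5340bsim/nrf5340/cpuapp \
            -DAPP_DIR=../samples/bluetooth/peripheral_hr/ ../share/sysbuild &&
        ninja &&
        # --log-file keeps the valgrind report apart from the program output.
        valgrind --log-file=valgrind.log ./zephyr/zephyr.exe -nosim
    )
}
```

Calling `repro` (or `repro /path/to/zephyr`) should leave the valgrind report in `build/valgrind.log`.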

Expected behavior: No crashes.

Impact: ~main CI failing for all BT development~ (hotfix to disable it merged)

Logs and console output: https://discord.com/channels/720317445772017664/1014241011989487716/1281588356676849795

==3262026==    at 0x8054006: pbuf_init (pbuf.c:68)
==3262026==    by 0x8053C4D: icmsg_open (icmsg.c:273)
==3262026==    by 0x80531BD: open (ipc_icbmsg.c:969)
==3262026==    by 0x80512F3: ipc_service_open_instance (ipc_service.c:38)
==3262026==    by 0x807DA95: bt_ipc_open (ipc.c:323)
==3262026==    by 0x805B69D: bt_hci_open (bluetooth.h:113)
==3262026==    by 0x8061621: bt_enable (hci_core.c:4396)
==3262026==    by 0x804B46A: main (main.c:193)
==3262026==  Address 0x20070004 is not stack'd, malloc'd or (recently) free'd

Additional context: Introduced by commit 518de763a6f096ed5e217d9dd0688a11a099407a

aescolar commented 1 month ago

CC @dchat-nordic

aescolar commented 1 month ago

CC @dchat-nordic @Thalley @cvinayak @kruithofa Given the amount of macrology around this initialization, it looks to me like reverting this commit first and then calmly sending a fix is the best way forward. So far a fix for this does not look trivial, so I would rather not rush it in as a hotfix.

aescolar commented 1 month ago

Background: Unfortunately, changes in the common Nordic DT files do not trigger the bsim tests today, which is why this break was not caught.
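To illustrate the kind of gap described here, below is a minimal sketch of a path-based trigger check. It is not the project's actual CI configuration: the function name `needs_bsim_tests` and every path pattern in it are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical filter: reads changed file paths on stdin (one per line) and
# succeeds if any of them should trigger the bsim tests.
needs_bsim_tests() {
    while IFS= read -r path; do
        case "$path" in
            # Assumed patterns for code the bsim tests exercise directly.
            boards/native/nrf_bsim/*|subsys/bluetooth/*|tests/bsim/*) return 0 ;;
            # Assumed pattern for shared DT files; without an entry like
            # this, a DT-only change would skip the bsim tests.
            dts/common/nordic/*) return 0 ;;
        esac
    done
    return 1
}

# Example change set containing only a doc file and a shared DT file.
printf '%s\n' doc/index.rst dts/common/nordic/example.dtsi |
    needs_bsim_tests && echo "run bsim tests"
```

With the `dts/common/nordic/*` line removed, the example change set would no longer trigger the tests, which mirrors how the break went unnoticed.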

The reason for the break is that the icbmsg backend is not yet supported in the nrf5340bsim target: its configuration is not initialized properly there.

aescolar commented 1 month ago

CC @doki-nordic

Thalley commented 1 month ago

Thank you for investigating. I concur with reverting the commit.

aescolar commented 1 month ago

Lowering the priority and removing the regression label, as the hotfix got merged.