zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.48k stars 6.41k forks source link

mcumgr endless loop in mgmt_find_handler #44871

Closed maximevince closed 2 years ago

maximevince commented 2 years ago

Describe the bug mcumgr goes into an endless loop, looking for mgmt_group with id 63.

To Reproduce Steps to reproduce the behavior:

  1. cd zephyr/samples/subsys/mgmt/mcumgr/smp_svr
  2. west build --board=nrf5340dk_nrf5340_cpuapp_ns -p always -- -DOVERLAY_CONFIG=overlay-bt.conf
  3. west flash
  4. Start FOTA using nRF Connect app (upload build/zephyr/app_update.bin)
  5. See board hanging at the end of the FOTA (100% uploaded)
  6. west attach, and bt in gdb.

Expected behavior Expected is that the board reboots into MCUboot and applies the firmware upgrade / image swap.

Impact FOTA not working using nRF connect app. When using mcumgr from the PC, it works. It seems that nRF Connect app issues a custom zephyr command (mgmt_group 63), which mcumgr does not. That custom command is not available on the default sample build, and the

Analysis I suspect there's an issue in mgmt_find_group, which keeps looping though the circular linked list mgmt_group_list. It's looking for (command_id=0, group_id=63), but never finds it. Instead of breaking out of the loop, it keeps going round and round.

Logs and console output gdb backtrace:

#0  mgmt_find_group (command_id=0, group_id=63) at [..]/zephyr/subsys/mgmt/mcumgr/lib/mgmt/src/mgmt.c:101
#1  mgmt_find_handler (group_id=<optimized out>, command_id=0) at [..]/zephyr/subsys/mgmt/mcumgr/lib/mgmt/src/mgmt.c:134
#2  0x00036024 in smp_handle_single_payload (handler_found=<synthetic pointer>, req_hdr=0x200108e4 <sys_work_q_stack+2044>, cbuf=0x20010904 <sys_work_q_stack+2076>) at [..]/zephyr/subsys/mgmt/mcumgr/lib/smp/src/smp.c:147
#3  smp_handle_single_req (handler_found=<synthetic pointer>, req_hdr=0x200108e4 <sys_work_q_stack+2044>, streamer=0x2001095c <sys_work_q_stack+2164>) at [..]/zephyr/subsys/mgmt/mcumgr/lib/smp/src/smp.c:226
#4  smp_process_request_packet (streamer=streamer@entry=0x2001095c <sys_work_q_stack+2164>, req=0x2001290c <_net_buf_pkt_pool>) at [..]/zephyr/subsys/mgmt/mcumgr/lib/smp/src/smp.c:336
#5  0x00028018 in zephyr_smp_process_packet (nb=<optimized out>, zst=0x2000bdcc <smp_bt_transport>) at [..]/zephyr/subsys/mgmt/mcumgr/smp.c:249
#6  zephyr_smp_handle_reqs (work=0x2000bdcc <smp_bt_transport>) at [..]/zephyr/subsys/mgmt/mcumgr/smp.c:265
#7  0x0002f764 in work_queue_main (workq_ptr=0x2000b500 <k_sys_work_q>, p2=<optimized out>, p3=<optimized out>) at [..]/zephyr/kernel/work.c:649
#8  0x000308b8 in z_thread_entry (entry=0x2f6e1 <work_queue_main>, p1=<optimized out>, p2=<optimized out>, p3=<optimized out>) at [..]/zephyr/lib/os/thread_entry.c:36
#9  0xaaaaaaaa in ?? ()

Environment (please complete the following information):

maximevince commented 2 years ago

Oh, and defining these config options will work around the issue (the looping bug is still there, obviously):

CONFIG_MCUMGR_GRP_ZEPHYR_BASIC=y
CONFIG_MCUMGR_GRP_BASIC_CMD_STORAGE_ERASE=y
de-nordic commented 2 years ago

Oh, and defining these config options will work around the issue (the looping bug is still there, obviously):

CONFIG_MCUMGR_GRP_ZEPHYR_BASIC=y
CONFIG_MCUMGR_GRP_BASIC_CMD_STORAGE_ERASE=y

Will look into the bug. The options you are mentioning here are required for NRF53 because the app tries to erase storage area, while doing the update. Nevertheless if they are not enabled, the mcumgr should respond with 'not-supported' error code instead of breaking.

maximevince commented 2 years ago

Oh, and defining these config options will work around the issue (the looping bug is still there, obviously):

CONFIG_MCUMGR_GRP_ZEPHYR_BASIC=y
CONFIG_MCUMGR_GRP_BASIC_CMD_STORAGE_ERASE=y

Will look into the bug. The options you are mentioning here are required for NRF53 because the app tries to erase storage area, while doing the update. Nevertheless if they are not enabled, the mcumgr should respond with 'not-supported' error code instead of breaking.

okay, good to know. Isn't there a way to make sure these options are enabled by default on the nRF53 targets?

de-nordic commented 2 years ago

That is not default for nrf53 targets, because that was needed by some sample apps only This is probably connected to https://github.com/zephyrproject-rtos/zephyr/issues/43858

de-nordic commented 2 years ago

@maximevince Does the issue persist with current Zephyr revision?

de-nordic commented 2 years ago

The issue with the loop has been fixed with https://github.com/zephyrproject-rtos/zephyr/pull/43868, closing. @maximevince Please re-open if you do not agree.