zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

ESP32S3: Zephyr freezes during OTA Update over UDP and BLE #76325

Status: Open. epc-ake opened this issue 3 months ago.

epc-ake commented 3 months ago

Discussed in https://github.com/zephyrproject-rtos/zephyr/discussions/76302

Originally posted by **epc-ake** on July 25, 2024

Has anyone managed to use `mcumgr` over UDP on an `esp32s3`? I'm developing on an `esp32s3_devkitm` and want to enable OTA firmware updates using `mcumgr` and `mcuboot` over a UDP interface.

To experiment with this, I modified the `prj.conf` file of the `samples/net/wifi` example to enable `mcuboot` and `mcumgr`. See the attached file for details: [prj.conf](https://github.com/user-attachments/files/16378817/prj.txt)

After flashing the firmware and connecting to a Wi-Fi network, I can retrieve image information using both the `go-app` and `AuTerm`. For example, with the `go-app`:

```shell
mcumgr --conntype udp --connstring=[x.x.0.60]:1337 image list
Images:
 image=0 slot=0
    version: 0.0.0
    bootable: true
    flags: active confirmed
    hash: 60e5eb52f59451a3db2ec9e978b13c0c8485577dd6787684e216069341bdf80b
Split status: N/A (0)
```

However, when I try to upload an image, the firmware becomes unresponsive and freezes:

```shell
./mcumgr --conntype udp --connstring=[x.x.0.60]:1337 image upload zephyr.signed.bin
# starts freezing...
```

Zephyr output:

```shell
*** Booting Zephyr OS build 065fa94c79e5 ***
[00:00:00.241,000] smp_udp: Started (IPv4)
uart:~$ wifi connect -s ***** -p ***** -k 1
Connection requested
Connected
# requesting image info
[00:00:17.353,000] net_dhcpv4: Received: x.x.0.60
[00:00:24.301,000] mcumgr_img_grp: img_mgmt_active_slot: (0) => 0
[00:00:24.302,000] mcuboot_util: Image index: 0, Swap type: none
[00:00:24.302,000] mcumgr_img_grp: img_mgmt_get_next_boot_slot: (0, *) => slot = 0, type = 0
[00:00:24.302,000] mcumgr_img_grp: img_mgmt_active_slot: (0) => 0
uart:~$ # doing image update -> freezing...
```

I ran the debugger in parallel, and when I interrupted GDB (Ctrl+C) after the firmware started freezing, it pointed to `_DoubleExceptionVector`.

GDB output:

```shell
Info : [esp32s3.cpu0] Target halted, PC=0x403743C0, debug_reason=00000000
[esp32s3.cpu0] Target halted, PC=0x403743C0, debug_reason=00000000
Info : Set GDB target to 'esp32s3.cpu0'
Set GDB target to 'esp32s3.cpu0'
Info : [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000
[esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000

Program received signal SIGINT, Interrupt.
_DoubleExceptionVector () at /zephyrproject/zephyr-epc/arch/xtensa/core/xtensa_asm2_util.S:525
```

~~Uploading over `serial` works without any issues, so it seems to be specifically related to the UDP interface.~~

Does anyone have any ideas on how to debug this?
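The attached `prj.conf` is not reproduced in this thread. For anyone trying to reproduce the setup, the fragment below is a hypothetical sketch of the kind of options involved, modeled on Zephyr's `smp_svr` MCUmgr sample and its UDP overlay. It is an assumption, not the reporter's actual file, and it presumes the Wi-Fi sample's own networking options are kept unchanged.

```cfg
# Hypothetical sketch, NOT the reporter's attached prj.conf.
# Build an MCUboot-compatible application image.
CONFIG_BOOTLOADER_MCUBOOT=y

# MCUmgr core and its dependencies
CONFIG_NET_BUF=y
CONFIG_ZCBOR=y
CONFIG_CRC=y
CONFIG_MCUMGR=y
CONFIG_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_STREAM_FLASH=y
CONFIG_IMG_MANAGER=y

# Command groups: image management (image list/upload) and OS
CONFIG_MCUMGR_GRP_IMG=y
CONFIG_MCUMGR_GRP_OS=y

# SMP over UDP on IPv4, matching the "smp_udp: Started (IPv4)" log line
CONFIG_NET_SOCKETS=y
CONFIG_MCUMGR_TRANSPORT_UDP=y
CONFIG_MCUMGR_TRANSPORT_UDP_IPV4=y
```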

epc-ake commented 3 months ago

If I set `CONFIG_IMG_ERASE_PROGRESSIVELY=y`, at least some data is uploaded before it freezes again.
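For reference, this option goes into the application's `prj.conf`. Per its Kconfig help text, it makes the image manager erase flash as needed while the new image is received, instead of erasing the whole slot at the start of the upload. That may explain why some data now gets through before the freeze, though this is an interpretation rather than a confirmed diagnosis.

```cfg
# prj.conf: erase the secondary slot progressively while the new image is
# received, rather than erasing the whole slot when the upload starts
CONFIG_IMG_ERASE_PROGRESSIVELY=y
```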

epc-ake commented 3 months ago

It also freezes during a BLE update.

epc-ake commented 3 months ago

The same issue occurs over serial as well, although part of the firmware does get uploaded.

epc-ake commented 2 months ago

@sylvioalves, can you give us an update on this? In our last conversation on Discord, you mentioned that you had evaluated a fix for this.

LeoBRIANDSmile commented 2 months ago

Hi, during which step of the OTA update does the CPU freeze (downloading the firmware, erasing flash, writing flash, ...)? Could you debug this? I have a similar issue with a homemade OTA update tool on the esp32s3 during the flash erase step, where the CPU raises a FATAL EXCEPTION.

epc-ake commented 2 months ago

I haven't worked on this, so unfortunately there is no progress here. I think writing to flash causes Zephyr to freeze, so it might be a problem with the flash driver.

#77452 mentions a similar (or the same) bug.

LeoBRIANDSmile commented 2 months ago

[image]

It seems to be a flash protection issue.

epc-ake commented 2 months ago

> [image]
>
> It seems to be a flash protection issue.

This seems to be related to esptool.py. Overall, I was able to partially update the image...

rftafas commented 1 month ago

@epc-ake can you try https://github.com/zephyrproject-rtos/zephyr/pull/78121?

marekmatej commented 1 month ago

@epc-ake, are you by any chance running the second image on the APPCPU/cpu1?

> Info : [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000
> [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000

epc-ake commented 1 month ago

> @epc-ake can you try #78121?

Thanks for this. I tried it, and it seems to work, at least for UDP. However, I don't have time right now to test it in depth; I will come back to it in 1-2 weeks.

epc-ake commented 1 month ago

> @epc-ake, are you by any chance running the second image on the APPCPU/cpu1?
>
> Info : [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000
> [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000

Hmm, interesting, I didn't notice that. Currently I'm only using the PROCPU. Maybe it halted on cpu1 by accident because I interrupted GDB with Ctrl+C?