zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.52k stars 6.45k forks source link

Image doesn't boot with MCUBoot #64297

Closed naNEQ closed 7 months ago

naNEQ commented 11 months ago

Describe the bug In simple words, my issue is that sometimes (see further in the text for more details), my image doesn't boot when it comes after MCUBoot (I've only tried with the Zephyr version of it). In these occasions, I get a Fatal Error right at the moment that the MCUBoot image tries to load and run my app image. I'm using a custom image, but on Zephyr version 3.4.0 I had a similar issue when using the in-tree sample application gsm_modem.

What I mean by "sometimes" is that whether an image can boot or not from the MCUBoot, is determined in build time. If an image can boot, it will always boot. If it can't, then it will never boot. Weirdly enough, I am usually able to "fix" an image that doesn't boot by removing some LOG_XXX prints or adding more things (once for example, I enabled the coredump library and that fixed the issue at that time).

It's important to note, that an image that fails to boot from MCUBoot, doesn't have any issue at all booting on its own, without MCUBoot.

Impact That's between an annoyance and showstopper.

Logs and console output This is one example of what I see in the console (core dump prints have been removed for better visibility). I also have put some extra debug prints to understand where did the error really occur.

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:2
load:0x3fcd0000,len:0x1b8c
load:0x403b6000,len:0x400
load:0x403b6400,len:0x50e8
load:0x403be000,len:0x1d2c
entry 0x403b8560
[esp32s3] [INF] MCUboot 2nd stage bootloader
[esp32s3] [INF] compiled at Oct 20 2023 00:23:37
[esp32s3] [INF] Chip revision: 0
[esp32s3] [INF] SPI Flash speed: 40MHz, mode: DIO, size: 4MB
I (48) soc: Will now call esp_intr_initialize()
I (53) soc: Will now call z_cstart()
[esp32s3] [INF] Image index: 0, Swap type: none
[esp32s3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32s3] [INF] Application start=4037E264h
[esp32s3] [INF] DRAM segment: paddr=0002930Ch, vaddr=3FCB4900h, size=034F0h ( 13552) load
[esp32s3] [INF] IRAM segment: paddr=00015D00h, vaddr=40374000h, size=1360Ch ( 79372) load
[esp32s3] [INF] DROM segment: paddr=00010040h, vaddr=3C000040h, size=15CC0h ( 89280) map
[esp32s3] [INF] IROM segment: paddr=00040000h, vaddr=42030000h, size=4368Eh (276110) map

I (407) spiram: Found 16MBit SPI RAM device
I (407) spiram: SPI RAM mode: sram 40m
I (407) spiram: PSRAM initialized, cache is in normal (1-core) mode.
I (818) spiram: SPI SRAM memory test OK
I (818) soc: Will now call esp_clk_init()
I (827) soc: Will now call esp_timer_early_init()
I (827) soc: Will now call spi_flash_guard_set()
I (827) soc: Will now call esp_intr_initialize()
I (829) soc: Will now call z_cstart()
[00:00:00.856,000] <err> os:  ** FATAL EXCEPTION
[00:00:00.856,000] <err> os:  ** CPU 0 EXCCAUSE 28 (load prohibited)
[00:00:00.856,000] <err> os:  **  PC 0x4204e963 VADDR 0x22c
[00:00:00.856,000] <err> os:  **  PS 0x60f20
[00:00:00.856,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:15 CALLINC:2)
[00:00:00.856,000] <err> os:  **  A0 0x8204ef50  SP 0x3fca90e0  A2 0x220  A3 0x3fca9135
[00:00:00.856,000] <err> os:  **  A4 0x1  A5 0x1  A6 (nil)  A7 (nil)
[00:00:00.856,000] <err> os:  **  A8 0x1  A9 0x3fc9ce48 A10 0x3fc9ce48 A11 (nil)
[00:00:00.856,000] <err> os:  ** A12 0x50020 A13 0x3fceb2e0 A14 0x3fca6a48 A15 (nil)
[00:00:00.856,000] <err> os:  ** LBEG 0x400570e8 LEND 0x400570f3 LCOUNT (nil)
[00:00:00.856,000] <err> os:  ** SAR 0x1a
<core dumps removed from here>
[00:00:04.254,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:04.262,000] <err> os: Current thread: 0x3fc9ce48 (unknown)
[00:00:04.268,000] <err> os: Halting system

I've managed to get a coredump from such a situation and this is what I got. I can't make sense of why this issue is happening, but perhaps there's someone that can do that. It's worth noting, that I got exactly the same coredump on two different occasions (still my custom app, but two different problematic builds).

(gdb) target remot localhost:1234
Remote debugging using localhost:1234
net_pkt_cursor_operate (pkt=0x220, data=0x3fca9175 <z_main_stack+9877>, length=1, copy=<optimized out>, write=false)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_pkt.c:1569
1569            while (c_op->buf && length) {
(gdb) info registers
pc             0x4204eaf3          0x4204eaf3 <net_pkt_cursor_operate+227>
lbeg           0x400570e8          1074098408
lend           0x400570f3          1074098419
lcount         0x0                 0
sar            0x1a                26
ps             0x60f20             397088
threadptr      <unavailable>
br             <unavailable>
scompare1      0x200               512
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
fcr            <unavailable>
fsr            <unavailable>
pwrctl         <unavailable>
pwrstat        <unavailable>
eristat        <unavailable>
cs_itctrl      <unavailable>
cs_claimset    <unavailable>
cs_claimclr    <unavailable>
cs_lockaccess  <unavailable>
cs_lockstatus  <unavailable>
cs_authstatus  <unavailable>
fault_info     <unavailable>
trax_id        <unavailable>
trax_control   <unavailable>
trax_status    <unavailable>
trax_data      <unavailable>
trax_address   <unavailable>
trax_pctrigger <unavailable>
trax_pcmatch   <unavailable>
trax_delay     <unavailable>
trax_memstart  <unavailable>
trax_memend    <unavailable>
pmg            <unavailable>
pmpc           <unavailable>
pm0            <unavailable>
pm1            <unavailable>
pmctrl0        <unavailable>
pmctrl1        <unavailable>
pmstat0        <unavailable>
pmstat1        <unavailable>
ocdid          <unavailable>
ocd_dcrclr     <unavailable>
ocd_dcrset     <unavailable>
ocd_dsr        <unavailable>
a0             0x8204f0e0          -2113605408
a1             0x3fca9120          1070240032
a2             0x220               544
a3             0x3fca9175          1070240117
a4             0x1                 1
a5             0x1                 1
a6             0x0                 0
a7             0x0                 0
a8             0x1                 1
a9             0x3fc9ce48          1070190152
a10            0x3fc9ce48          1070190152
a11            0x0                 0
a12            0x1                 1
a13            0x3fc9ce48          1070190152
a14            0x3fc9ce48          1070190152
a15            0x0                 0
(gdb)
(gdb) bt
#0  net_pkt_cursor_operate (pkt=0x220, data=0x3fca9175 <z_main_stack+9877>, length=1, copy=<optimized out>,
    write=false) at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_pkt.c:1569
#1  0x4204f0e0 in net_pkt_read (pkt=0x220, data=0x3fca9175 <z_main_stack+9877>, length=1)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_pkt.c:1642
#2  0x4204a93b in net_pkt_read_u8 (data=0x3fca9175 <z_main_stack+9877> "`E?"RE?°IÉ?Hs7\200 `E?"RE? \004\006",
    pkt=0x220) at C:/Users/NikosAgianniotis/hvac-iot/zephyr/include/zephyr/net/net_pkt.h:2024
#3  ppp_fsm_input (fsm=0x3fc9cca8 <_net_l2_data_ppp_net_dev_ppp_quectel_bg9500+272>, proto=32801, pkt=0x220)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/l2/ppp/fsm.c:1035
#4  0x4204b5b8 in ipcp_handle (ctx=0x3fc9cb98 <_net_l2_data_ppp_net_dev_ppp_quectel_bg9500>, iface=0x0, pkt=0x220)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/l2/ppp/ipcp.c:25
#5  0x4204a1e5 in net_ppp_init (iface=<optimized out>)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/l2/ppp/ppp_l2.c:423
#6  0x42048f7e in modem_ppp_ppp_api_init (iface=0x3fcb7a7c <__net_if_ppp_net_dev_ppp_quectel_bg950_0>)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/modem/modem_ppp.c:395
#7  0x4204cd2d in init_iface (iface=0x3fcb7a7c <__net_if_ppp_net_dev_ppp_quectel_bg950_0>)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_if.c:430
#8  net_if_init () at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_if.c:4860
#9  0x4204b97f in init_rx_queues () at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_core.c:196
#10 net_init () at C:/Users/NikosAgianniotis/hvac-iot/zephyr/subsys/net/ip/net_core.c:508
#11 0x403776ac in z_sys_init_run_level (level=<optimized out>)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/kernel/init.c:313
#12 0x403776c5 in bg_thread_main (unused1=0x0, unused2=0x0, unused3=0x0)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/kernel/init.c:343
#13 0x4203e360 in z_thread_entry (entry=0x403776b4 <bg_thread_main>, p1=0x0, p2=0x0, p3=0x0)
    at C:/Users/NikosAgianniotis/hvac-iot/zephyr/lib/os/thread_entry.c:48

Environment (please complete the following information):

Additional context Not sure what else can be relevant, but I'm happy to provide more information if necessary.

sylvioalves commented 11 months ago

@naNEQ, Can you enable the below so we can check a bit more about stack corruption?

CONFIG_STACK_SENTINEL=y
CONFIG_STACK_CANARIES=y
naNEQ commented 11 months ago

@naNEQ, Can you enable the below so we can check a bit more about stack corruption?

CONFIG_STACK_SENTINEL=y
CONFIG_STACK_CANARIES=y

Sure, and then print again core dump?

sylvioalves commented 10 months ago

@naNEQ, yes, please. Unless the output is the same. Just to check whether the crash is not related to stack issues.

naNEQ commented 10 months ago

@sylvioalves I did try it and I see the same output (just with different address numbers). But then having it enabled, an image that boots ends up failing with spinlock failures. It's difficult for me to say whether this is really related to the initial problem or is just something else that was revealed through the process.

*** Booting Zephyr OS build v3.5.0-rc3 ***
[00:00:00.979,000] <inf> modem_module: Board 'neq_hvaciot_esp32s3' APN 'iot.1nce.net' UART 'uart@60010000' device 0x3c015b38 (gsm_modem)
[00:00:00.979,000] <inf> sensor_module: All GPIOs related to the Multiplexers have been initialized successfully
[00:00:00.979,000] <inf> drv_ads1015: I2C device "i2c@60013000" found
[00:00:01.125,000] <inf> fota_download: Image is confirmed OK
[00:00:01.125,000] <inf> azure_fota: Current firmware version: 0.5.0
[00:00:01.125,000] <err> os:  ** A12 0x7271641 A13 0xfd58b601 A14 0x60223 A15 0x60a2
[00:00:01.133,000] <err> os:  ** A12 0x3fcabb10 A13 0x4 A14 0x92d04700 A15 0x3fcabb00
[00:00:01.143,000] <err> os:  ** A12 0x3fcab878 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.150,000] <err> os:  ** A12 0x3fcab5e0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.160,000] <err> os:  ** A12 0x3fcab348 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.169,000] <err> os:  ** A12 0x3fcab0b0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.175,000] <err> os:  ** A12 0x3fcaae18 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.183,000] <err> os:  ** A12 0x3fcaab80 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.194,000] <err> os:  ** A12 0x3fcaa8e8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.205,000] <err> os:  ** A12 0x3fcaa650 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.214,000] <err> os:  ** A12 0x3fcaa3b8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.222,000] <err> os:  ** A12 0x3fcaa120 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.231,000] <err> os:  ** A12 0x3fca9e88 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.241,000] <err> os:  ** A12 0x3fca9bf0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.251,000] <err> os:  ** A12 0x3fca9958 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.260,000] <err> os:  ** A12 0x3fca96c0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.268,000] <err> os:  ** A12 0x3fca9428 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.277,000] <err> os:  ** A12 0x3fca9190 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.285,000] <err> os:  ** A12 0x3fca8ef8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.294,000] <err> os:  ** A12 0x3fca8c60 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.302,000] <err> os:  ** A12 0x3fca89c8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.311,000] <err> os:  ** A12 0x3fca8730 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.319,000] <err> os:  ** A12 0x3fca8498 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.328,000] <err> os:  ** A12 0x3fca8200 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.336,000] <err> os:  ** A12 0x3fca7f68 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.345,000] <err> os:  ** A12 0x3fca7cd0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.353,000] <err> os:  ** A12 0x3fca7a38 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.362,000] <err> os:  ** A12 0x3fca77a0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.370,000] <err> os:  ** A12 0x3fca7508 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.379,000] <err> os:  ** A12 0x3fca7270 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.387,000] <err> os:  ** A12 0x3fca6fd8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.396,000] <err> os:  ** A12 0x3fca6d40 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.404,000] <err> os:  ** A12 0x3fca6aa8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.413,000] <err> os:  ** A12 0x3fca6810 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.421,000] <err> os:  ** A12 0x3fca6578 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.430,000] <err> os:  ** A12 0x3fca62e0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.438,000] <err> os:  ** A12 0x3fca6048 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.447,000] <err> os:  ** A12 0x3fca5db0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.456,000] <err> os:  ** A12 0x3fca5b18 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.464,000] <err> os:  ** A12 0x3fca5880 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.473,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.481,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.489,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.507,000] <err> os:  **  A0 0x820405ac  SP 0x3fca4b98  A2 0x4  A3 0x3fca4bb8
--- 533 messages dropped ---
[00:00:01.507,000] <err> os:  **  A4 0x3fca4b88  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.507,000] <err> os:  **  A8 0x92d04700  A9 0x3fca4af8 A10 0x3c00bb8a A11 0x3fca4bb8
[00:00:01.507,000] <err> os:  ** A12 0x3fca4b88 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.507,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.507,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.515,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.515,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.515,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.515,000] <err> os:  **  PS 0x60d20
[00:00:01.515,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:13 CALLINC:2)
[00:00:01.515,000] <err> os:  **  A0 0x820405ac  SP 0x3fca4900  A2 0x4  A3 0x3fca4920
[00:00:01.515,000] <err> os:  **  A4 0x3fca48f0  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.515,000] <err> os:  **  A8 0x92d04700  A9 0x3fca4860 A10 0x3c00bb8a A11 0x3fca4920
[00:00:01.515,000] <err> os:  ** A12 0x3fca48f0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.515,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.515,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.523,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.523,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.524,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.524,000] <err> os:  **  PS 0x60320
[00:00:01.524,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:3 CALLINC:2)
[00:00:01.524,000] <err> os:  **  A0 0x820405ac  SP 0x3fca4668  A2 0x4  A3 0x3fca4688
[00:00:01.524,000] <err> os:  **  A4 0x3fca4658  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.524,000] <err> os:  **  A8 0x92d04700  A9 0x3fca45c8 A10 0x3c00bb8a A11 0x3fca4688
[00:00:01.524,000] <err> os:  ** A12 0x3fca4658 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.524,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.524,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.532,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.532,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.532,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.532,000] <err> os:  **  PS 0x60920
[00:00:01.532,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:9 CALLINC:2)
[00:00:01.532,000] <err> os:  **  A0 0x820405ac  SP 0x3fca43d0  A2 0x4  A3 0x3fca43f0
[00:00:01.532,000] <err> os:  **  A4 0x3fca43c0  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.532,000] <err> os:  **  A8 0x92d04700  A9 0x3fca4330 A10 0x3c00bb8a A11 0x3fca43f0
[00:00:01.532,000] <err> os:  ** A12 0x3fca43c0 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.532,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.532,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.540,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.541,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.541,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.541,000] <err> os:  **  PS 0x60f20
[00:00:01.541,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:15 CALLINC:2)
[00:00:01.541,000] <err> os:  **  A0 0x820405ac  SP 0x3fca4138  A2 0x4  A3 0x3fca4158
[00:00:01.541,000] <err> os:  **  A4 0x3fca4128  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.541,000] <err> os:  **  A8 0x92d04700  A9 0x3fca4098 A10 0x3c00bb8a A11 0x3fca4158
[00:00:01.541,000] <err> os:  ** A12 0x3fca4128 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.541,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.541,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.549,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.549,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.549,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.549,000] <err> os:  **  PS 0x60520
[00:00:01.549,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:5 CALLINC:2)
[00:00:01.549,000] <err> os:  **  A0 0x820405ac  SP 0x3fca3ea0  A2 0x4  A3 0x3fca3ec0
[00:00:01.549,000] <err> os:  **  A4 0x3fca3e90  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.549,000] <err> os:  **  A8 0x92d04700  A9 0x3fca3e00 A10 0x3c00bb8a A11 0x3fca3ec0
[00:00:01.549,000] <err> os:  ** A12 0x3fca3e90 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.549,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.549,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.558,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.558,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.558,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.558,000] <err> os:  **  PS 0x60b20
[00:00:01.558,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:11 CALLINC:2)
[00:00:01.558,000] <err> os:  **  A0 0x820405ac  SP 0x3fca3c08  A2 0x4  A3 0x3fca3c28
[00:00:01.558,000] <err> os:  **  A4 0x3fca3bf8  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.558,000] <err> os:  **  A8 0x92d04700  A9 0x3fca3b68 A10 0x3c00bb8a A11 0x3fca3c28
[00:00:01.558,000] <err> os:  ** A12 0x3fca3bf8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.558,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.558,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.566,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.566,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.566,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.566,000] <err> os:  **  PS 0x60120
[00:00:01.566,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:1 CALLINC:2)
[00:00:01.566,000] <err> os:  **  A0 0x820405ac  SP 0x3fca3970  A2 0x4  A3 0x3fca3990
[00:00:01.566,000] <err> os:  **  A4 0x3fca3960  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.566,000] <err> os:  **  A8 0x92d04700  A9 0x3fca38d0 A10 0x3c00bb8a A11 0x3fca3990
[00:00:01.566,000] <err> os:  ** A12 0x3fca3960 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.566,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.566,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.575,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.575,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.575,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.575,000] <err> os:  **  PS 0x60720
[00:00:01.575,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:01.575,000] <err> os:  **  A0 0x820405ac  SP 0x3fca36d8  A2 0x4  A3 0x3fca36f8
[00:00:01.575,000] <err> os:  **  A4 0x3fca36c8  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.575,000] <err> os:  **  A8 0x92d04700  A9 0x3fca3638 A10 0x3c00bb8a A11 0x3fca36f8
[00:00:01.575,000] <err> os:  ** A12 0x3fca36c8 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.575,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.575,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.583,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.583,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.583,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.583,000] <err> os:  **  PS 0x60d20
[00:00:01.583,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:13 CALLINC:2)
[00:00:01.583,000] <err> os:  **  A0 0x820405ac  SP 0x3fca3440  A2 0x4  A3 0x3fca3460
[00:00:01.583,000] <err> os:  **  A4 0x3fca3430  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.583,000] <err> os:  **  A8 0x92d04700  A9 0x3fca33a0 A10 0x3c00bb8a A11 0x3fca3460
[00:00:01.583,000] <err> os:  ** A12 0x3fca3430 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.583,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.583,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.592,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.592,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.592,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.592,000] <err> os:  **  PS 0x60320
[00:00:01.592,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:3 CALLINC:2)
[00:00:01.592,000] <err> os:  **  A0 0x820405ac  SP 0x3fca31a8  A2 0x4  A3 0x3fca31c8
[00:00:01.592,000] <err> os:  **  A4 0x3fca3198  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.592,000] <err> os:  **  A8 0x92d04700  A9 0x3fca3108 A10 0x3c00bb8a A11 0x3fca31c8
[00:00:01.592,000] <err> os:  ** A12 0x3fca3198 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.592,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.592,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.600,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.600,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.600,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
[00:00:01.600,000] <err> os:  **  PS 0x60920
[00:00:01.600,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:9 CALLINC:2)
[00:00:01.600,000] <err> os:  **  A0 0x820405ac  SP 0x3fca2f10  A2 0x4  A3 0x3fca2f30
[00:00:01.600,000] <err> os:  **  A4 0x3fca2f00  A5 0x4  A6 0x92d04700  A7 0x92d04700
[00:00:01.600,000] <err> os:  **  A8 0x92d04700  A9 0x3fca2e70 A10 0x3c00bb8a A11 0x3fca2f30
[00:00:01.600,000] <err> os:  ** A12 0x3fca2f00 A13 0x4 A14 0x92d04700 A15 0x92d04700
[00:00:01.600,000] <err> os:  ** LBEG 0x400556d5 LEND 0x400556e5 LCOUNT 0xffffffff
[00:00:01.600,000] <err> os:  ** SAR 0x4
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
        Invalid spinlock 0x3fca14e4
[00:00:01.608,000] <err> os:  ** FATAL EXCEPTION
[00:00:01.608,000] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[00:00:01.608,000] <err> os:  **  PC 0x40376fdf VADDR (nil)
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
ASSERTION FAIL [z_spin_lock_valid(l)] @ WEST_TOPDIR/zephyr/include/zephyr/spinlock.h:114
naNEQ commented 10 months ago

I enabled additional debug prints and here is what I see.

First a failed-to-boot image:

[esp32s3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32s3] [INF] Application start=4037DF4Ch
[esp32s3] [INF] DRAM segment: paddr=0002A4BCh, vaddr=3FCB6E70h, size=03288h ( 12936) load
[esp32s3] [INF] IRAM segment: paddr=000171C8h, vaddr=40374000h, size=132F4h ( 78580) load
[esp32s3] [INF] DROM segment: paddr=00010040h, vaddr=3C000040h, size=17190h ( 94608) map
[esp32s3] [INF] IROM segment: paddr=00040000h, vaddr=42030000h, size=45AF2h (285426) map

I (466) spiram: Found 16MBit SPI RAM device
I (466) spiram: SPI RAM mode: sram 40m
I (466) spiram: PSRAM initialized, cache is in normal (1-core) mode.
I (877) spiram: SPI SRAM memory test OK
I (877) soc: Will now call esp_clk_init()
I (885) soc: Will now call esp_timer_early_init()
I (885) soc: Will now call spi_flash_guard_set()
I (885) soc: Will now call esp_intr_initialize()
I (887) soc: Will now call z_cstart()
[00:00:00.914,000] <dbg> net_pkt: net_pkt_init: (0x3fc9cde8): Allocating 8 RX (512 bytes), 8 TX (512 bytes), 0 RX data (0 bytes) and 0 TX data (0 bytes) buffers
[00:00:00.914,000] <dbg> net_mgmt: net_mgmt_event_init: (0x3fc9cde8): Net MGMT initialized: queue of 5 entries, stack size of 768
[00:00:00.914,000] <dbg> net_if: net_if_init: (0x3fc9cde8):
[00:00:00.914,000] <dbg> net_if: init_iface: (0x3fc9cde8): On iface 0x3fcb9d54
[00:00:00.914,000] <dbg> net_l2_ppp: net_ppp_init: (0x3fc9cde8): Initializing PPP L2 0x3fc9cb48 for iface 0x3fcb9d54
[00:00:00.914,000] <dbg> net_mgmt: net_mgmt_add_event_callback: (0x3fc9cde8): Adding event callback 0x3fc9cd3c
[00:00:00.914,000] <dbg> net_pkt: net_pkt_read: (0x3fc9cde8): pkt 0x3fcab6a0 data 0x3fcab68c length 1
[00:00:00.915,000] <dbg> net_pkt: net_pkt_read: (0x3fc9cde8): pkt 0x3fcab6a0 data 0x3fcab68b length 1
[00:00:00.915,000] <dbg> net_pkt: net_pkt_read: (0x3fc9cde8): pkt 0x3fcab6a0 data 0x3fcab640 length 2
[00:00:00.915,000] <err> os:  ** FATAL EXCEPTION
...
[00:00:01.089,000] <err> os: Halting system

A second failed-to-boot image (notice how the packet address is completely off pkt 0x2100):

[esp32s3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32s3] [INF] Application start=4037DF4Ch
[esp32s3] [INF] DRAM segment: paddr=0002A4D4h, vaddr=3FCB6E70h, size=03288h ( 12936) load
[esp32s3] [INF] IRAM segment: paddr=000171E0h, vaddr=40374000h, size=132F4h ( 78580) load
[esp32s3] [INF] DROM segment: paddr=00010040h, vaddr=3C000040h, size=171A0h ( 94624) map
[esp32s3] [INF] IROM segment: paddr=00040000h, vaddr=42030000h, size=45B32h (285490) map

I (466) spiram: Found 16MBit SPI RAM device
I (466) spiram: SPI RAM mode: sram 40m
I (466) spiram: PSRAM initialized, cache is in normal (1-core) mode.
I (877) spiram: SPI SRAM memory test OK
I (877) soc: Will now call esp_clk_init()
I (886) soc: Will now call esp_timer_early_init()
I (886) soc: Will now call spi_flash_guard_set()
I (886) soc: Will now call esp_intr_initialize()
I (887) soc: Will now call z_cstart()
[00:00:00.915,000] <dbg> net_pkt: net_pkt_init: (0x3fc9cde8): Allocating 8 RX (512 bytes), 8 TX (512 bytes), 0 RX data (0 bytes) and 0 TX data (0 bytes) buffers
[00:00:00.915,000] <dbg> net_mgmt: net_mgmt_event_init: (0x3fc9cde8): Net MGMT initialized: queue of 5 entries, stack size of 768
[00:00:00.915,000] <dbg> net_if: net_if_init: (0x3fc9cde8):
[00:00:00.915,000] <dbg> net_if: init_iface: (0x3fc9cde8): On iface 0x3fcb9d54
[00:00:00.915,000] <dbg> net_l2_ppp: net_ppp_init: (0x3fc9cde8): Initializing PPP L2 0x3fc9cb48 for iface 0x3fcb9d54
[00:00:00.915,000] <dbg> net_mgmt: net_mgmt_add_event_callback: (0x3fc9cde8): Adding event callback 0x3fc9cd3c
[00:00:00.915,000] <dbg> net_pkt: net_pkt_read: (0x3fc9cde8): pkt 0x2100 data 0x3fcab67c length 1
<err> os: **FATAL EXCEPTION
...

Lastly an image that boots.

[esp32s3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32s3] [INF] Application start=4037DF4Ch
[esp32s3] [INF] DRAM segment: paddr=0002A4A8h, vaddr=3FCB6E70h, size=03288h ( 12936) load
[esp32s3] [INF] IRAM segment: paddr=000171B4h, vaddr=40374000h, size=132F4h ( 78580) load
[esp32s3] [INF] DROM segment: paddr=00010040h, vaddr=3C000040h, size=17180h ( 94592) map
[esp32s3] [INF] IROM segment: paddr=00040000h, vaddr=42030000h, size=45ABAh (285370) map

I (466) spiram: Found 16MBit SPI RAM device
I (466) spiram: SPI RAM mode: sram 40m
I (466) spiram: PSRAM initialized, cache is in normal (1-core) mode.
I (877) spiram: SPI SRAM memory test OK
I (877) soc: Will now call esp_clk_init()
I (885) soc: Will now call esp_timer_early_init()
I (885) soc: Will now call spi_flash_guard_set()
I (885) soc: Will now call esp_intr_initialize()
I (887) soc: Will now call z_cstart()
[00:00:00.914,000] <dbg> net_pkt: net_pkt_init: (0x3fc9cde8): Allocating 8 RX (512 bytes), 8 TX (512 bytes), 0 RX data (0 bytes) and 0 TX data (0 bytes) buffers
[00:00:00.914,000] <dbg> net_mgmt: net_mgmt_event_init: (0x3fc9cde8): Net MGMT initialized: queue of 5 entries, stack size of 768
[00:00:00.914,000] <dbg> net_if: net_if_init: (0x3fc9cde8):
[00:00:00.914,000] <dbg> net_if: init_iface: (0x3fc9cde8): On iface 0x3fcb9d54
[00:00:00.914,000] <dbg> net_l2_ppp: net_ppp_init: (0x3fc9cde8): Initializing PPP L2 0x3fc9cb48 for iface 0x3fcb9d54
[00:00:00.914,000] <dbg> net_mgmt: net_mgmt_add_event_callback: (0x3fc9cde8): Adding event callback 0x3fc9cd3c
[00:00:00.915,000] <dbg> net_l2_ppp: ipcp_init: (0x3fc9cde8): proto IPCP (0x8021) fsm 0x3fc9cc58
[00:00:00.915,000] <dbg> net_l2_ppp: lcp_init: (0x3fc9cde8): proto LCP (0xc021) fsm 0x3fc9cb80
[00:00:00.915,000] <dbg> net_if: update_operational_state: (0x3fc9cde8): iface 0x3fcb9d54, oper state DOWN admin DOWN carrier OFF dormant OFF
[00:00:00.915,000] <dbg> net_if: net_if_post_init: (0x3fc9cde8):
[00:00:00.915,000] <dbg> net_ctx: net_context_bind: (0x3fc9cde8): Context 0x3fc9d34c binding to UDP 0.0.0.0:54220 iface 1 (0x3fcb9d54)
*** Booting Zephyr OS build v3.5.0-rc3 ***

What puzzles me is that this time the process is a bit different. We don't see prints from the net_pkt. Execution goes directly to ipcp_init() and lcp_init(). It's as if the STRUCT_SECTION_FOREACH() gives different results?

STRUCT_SECTION_FOREACH(ppp_protocol_handler, proto) {
    if (proto->protocol == PPP_LCP) {
        ppp_lcp = proto;
    }

    proto->init(ctx);
    count++;
}

Or another theory: Perhaps while we are at the initialization, net packets come in (for some reason) and that's why we end up doing net_pkt_read(), which for some reason fails at parsing them, whereas when it boots, these packets don't come in?

sam131208 commented 10 months ago
I'm having the same issue. Sometimes adding a LOG will not boot to zephyr os. Sometimes, after modifying the code, you can't boot to zephyr os, delete or add LOG, and you can boot to zephyr os normally.
naNEQ commented 10 months ago
I'm having the same issue. Sometimes adding a LOG will not boot to zephyr os. Sometimes, after modifying the code, you can't boot to zephyr os, delete or add LOG, and you can boot to zephyr os normally.

@sam131208 Very happy that someone else also has the same issue.

Are you also using some sort of network connectivity? Perhaps ppp, like me? Although the specific protocol most probably doesn't matter.

sam131208 commented 10 months ago

I only used bluetooth, however, I don't think it has anything to do with it. I've compiled the same code on different computers (samplesbluetoothperipheral_hids) and I see in two zephyr.map files that .debug_str and .debug_line have different sizes(https://github.com/zephyrproject-rtos/zephyr/issues/64952). However, a comparison of the two zephyr.maps shows that the address of the program is the same. It is more likely that the debug message address overlaps with the program code, causing the program to halt.

github-actions[bot] commented 8 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

naNEQ commented 6 months ago

Just wanted to say that this issue can be solved with the changes proposed in #68028