zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Kernel usage fault when using semaphore with multi-threading #41963

Closed RafaelLeeImg closed 2 years ago

RafaelLeeImg commented 2 years ago

Describe the bug The kernel enters a usage fault when a semaphore is released in the I2C driver.

This project is a demo that uses LVGL with an SSD1306 0.96-inch OLED screen on a NUCLEO_F411RE board.

https://github.com/RafaelLeeImg/zephyr_lvgl_nucleo_f411re.git
The gdbscript I use is here: https://github.com/RafaelLeeImg/zephyr_lvgl_nucleo_f411re/blob/main/gdbscript

Please also mention any information which could help others to understand the problem you're facing:

To Reproduce Steps to reproduce the behavior:

  1. Compile the project and flash it onto the nucleo_f411re.
  2. Run gdb-multiarch -x gdbscript
  3. The ARM core will run to the line just before the fault.
  4. Type these commands in gdb:
  5. c
  6. p/x *0xE0001004
  7. The MCU will stop at z_arm_usage_fault.

Expected behavior The MCU gets stuck at z_arm_usage_fault.

Impact The STM32 MCU gets stuck when talking to an I2C device.

Logs and console output

0 0x08003724 in arch_irq_unlock (key=0x0) at west/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:95

1 arch_swap (key=0x0) at west/zephyr/arch/arm/core/aarch32/swap.c:44

2 0x08022fe6 in z_swap_irqlock (key=0x0) at west/zephyr/kernel/include/kswap.h:184

3 0x08023238 in z_swap (key=..., lock=0x200014b4 ) at west/zephyr/kernel/include/kswap.h:195

4 z_reschedule (lock=0x200014b4 , key=...) at west/zephyr/kernel/sched.c:874

5 0x0800f3c0 in z_impl_k_sem_give (sem=0x20001170 ) at west/zephyr/kernel/sem.c:103

6 0x08014720 in k_sem_give (sem=0x20001170 ) at /dev/shm/d/build/alonzo_lvgl/zephyr/include/generated/syscalls/kernel.h:1043

7 0x08014a3c in i2c_stm32_transfer (dev=0x8023b4c <__device_dts_ord_66>, msg=0x200015d0 <z_main_stack+264>, num_msgs=0xff, slave=0xfff5) at west/zephyr/drivers/i2c/i2c_ll_stm32.c:167

8 0x08012df8 in z_impl_i2c_transfer (dev=0x8023b4c <__device_dts_ord_66>, msgs=0x200015d0 <z_main_stack+264>, num_msgs=0x2, addr=0x3c) at west/zephyr/include/drivers/i2c.h:589

9 0x08012ea6 in i2c_transfer (dev=0x8023b4c <__device_dts_ord_66>, msgs=0x200015d0 <z_main_stack+264>, num_msgs=0x2, addr=0x3c) at /dev/shm/d/build/alonzo_lvgl/zephyr/include/generated/syscalls/i2c.h:90

10 0x08012e4c in i2c_burst_write (dev=0x8023b4c <__device_dts_ord_66>, dev_addr=0x3c, start_addr=0x0, buf=0x20001934 <z_main_stack+1132> " ", num_bytes=0x8) at west/zephyr/include/drivers/i2c.h:997

11 0x08012e7a in i2c_burst_write_dt (spec=0x80241ac , start_addr=0x0, buf=0x20001934 <z_main_stack+1132> " ", num_bytes=0x8) at west/zephyr/include/drivers/i2c.h:1019

12 0x08012efc in ssd1306_write_bus (dev=0x8023b64 <__device_dts_ord_67>, buf=0x20001934 <z_main_stack+1132> " ", len=0x8, command=0x1) at west/zephyr/drivers/display/ssd1306.c:79

13 0x080130fe in ssd1306_write (dev=0x8023b64 <__device_dts_ord_67>, x=0x10, y=0x18, desc=0x2000199c <z_main_stack+1236>, buf=0x2000032c ) at west/zephyr/drivers/display/ssd1306.c:241

14 0x08011e40 in display_write (dev=0x8023b64 <__device_dts_ord_67>, x=0x10, y=0x18, desc=0x2000199c <z_main_stack+1236>, buf=0x2000032c ) at west/zephyr/include/drivers/display.h:232

15 0x08011ef2 in lvgl_flush_cb_mono (disp_drv=0x20002984 <kheap.system_heap+892>, area=0x20000318 <disp_buf+16>, color_p=0x2000032c ) at west/zephyr/lib/gui/lvgl/lvgl_display_mono.c:26

16 0x0800871e in lv_refr_vdb_flush () at west/modules/lib/gui/lvgl/src/lv_core/lv_refr.c:751

17 0x080085cc in lv_refr_area_part (area_p=0x200029de <kheap.system_heap+982>) at west/modules/lib/gui/lvgl/src/lv_core/lv_refr.c:559

18 0x0800835c in lv_refr_area (area_p=0x200029de <kheap.system_heap+982>) at west/modules/lib/gui/lvgl/src/lv_core/lv_refr.c:460

19 0x0800812a in lv_refr_areas () at west/modules/lib/gui/lvgl/src/lv_core/lv_refr.c:382

20 0x08007d32 in _lv_disp_refr_task (task=0x20002864 <kheap.system_heap+604>) at west/modules/lib/gui/lvgl/src/lv_core/lv_refr.c:199

21 0x0800c07a in lv_task_exec (task=0x20002864 <kheap.system_heap+604>) at west/modules/lib/gui/lvgl/src/lv_misc/lv_task.c:409

22 0x0800bcf8 in lv_task_handler () at west/modules/lib/gui/lvgl/src/lv_misc/lv_task.c:142

23 0x080018f2 in main () at /dev/shm/d/proj/alonzo_lvgl/src/main.c:154

After MCU halt

0 z_arm_usage_fault () at west/zephyr/arch/arm/core/aarch32/cortex_m/fault_s.S:80

1

Prologue scan stopped at 0x8003770

2 z_arm_pendsv () at west/zephyr/arch/arm/core/aarch32/swap_helper.S:346

3

4 arch_irq_unlock (key=0x10) at west/zephyr/include/arch/arm/aarch32/asm_inline_gcc.h:109

5 arch_swap (key=0x8000000) at west/zephyr/arch/arm/core/aarch32/swap.c:44

6 0x08022fe6 in z_swap_irqlock (key=0x8003db1) at west/zephyr/kernel/include/kswap.h:184

Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Environment (please complete the following information):

Additional context If I single-step through the code, no error occurs; if I let it run without stepping, the MCU gets stuck.

RafaelLeeImg commented 2 years ago

The problem is located in the code below. In the error condition, the fault is triggered at the stmia line; before that line executes, the value of r0 is already wrong.

// zephyr/arch/arm/core/aarch32/swap_helper.S
SECTION_FUNC(TEXT, z_arm_pendsv)
...
stmia r0, {v1-v8, ip}

RafaelLeeImg commented 2 years ago

In the error condition, r2 points to a nonexistent location at address 0x1000xxxx, which is not valid. r2 is meant to be _kernel.cpu.current. See west/zephyr/arch/arm/core/aarch32/swap_helper.S:

    ldr r2, [r1, #_kernel_offset_to_current]

RafaelLeeImg commented 2 years ago

In the fault condition, r0 holds a non-writable address or a peripheral address such as 0xE000ED00. When the code tries to write to the address in r0, the usage fault is triggered.

...
SECTION_FUNC(TEXT, z_arm_pendsv)
...
    stmia r0, {v1-v8, ip}
...
    ldmia r0, {v1-v8, ip}
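
As a debugging aid for the observation above, the following is a minimal C sketch that checks whether a pointer value such as the one seen in r0 could be a valid save area. It is not Zephyr code: classify_addr and range_in_sram are hypothetical helper names, the region boundaries follow the fixed ARMv7-M memory map, and the SRAM window (128 KB at 0x20000000) is an assumption about the STM32F411RE on this board. The 36-byte length corresponds to the nine registers stored by stmia r0, {v1-v8, ip}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed on-chip SRAM window for the STM32F411RE: 128 KB at 0x20000000. */
#define SRAM_BASE 0x20000000u
#define SRAM_SIZE (128u * 1024u)

/* Coarse regions of the fixed ARMv7-M memory map. */
enum mem_region {
    REGION_CODE,       /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,       /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL, /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXTERNAL,   /* 0x60000000 - 0xDFFFFFFF */
    REGION_PPB,        /* 0xE0000000 - 0xE00FFFFF, private peripheral bus */
    REGION_VENDOR      /* 0xE0100000 - 0xFFFFFFFF */
};

/* Map an address onto the architectural region it falls in. */
enum mem_region classify_addr(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xE0000000u) return REGION_EXTERNAL;
    if (addr < 0xE0100000u) return REGION_PPB;
    return REGION_VENDOR;
}

/* True only if [addr, addr + len) lies entirely inside the assumed SRAM window. */
bool range_in_sram(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    return addr >= SRAM_BASE &&
           len <= SRAM_SIZE &&
           (addr - SRAM_BASE) <= SRAM_SIZE - len;
}
```

With these assumptions, 0xE000ED00 classifies as private peripheral bus and an address of the form 0x1000xxxx falls in the code region below SRAM, so range_in_sram rejects both, while a kernel object address such as 0x200014b4 from the backtrace passes.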

erwango commented 2 years ago

@dkalowsk would you mind assigning this to the ARM maintainer, as it is not STM32-specific according to the analysis?

dkalowsk commented 2 years ago

@erwango done. You were suggested at the bug scrub. The STM label was applied due to the reported platform.

RafaelLeeImg commented 2 years ago

This is not a bug. The problem is caused by insufficient stack size. Setting the main stack to 4k solves it.

CONFIG_MAIN_STACK_SIZE=4096

I'll update the details soon.
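
To illustrate how a too-small stack like this can be detected before it corrupts neighbouring data, here is a minimal, host-runnable C sketch of the stack painting (high-water-mark) technique. It is an illustration only and not the code used in this project: stack_paint, stack_unused and simulate_use are hypothetical names, and the 0xAA fill byte is an assumption of the sketch. Zephyr has comparable built-in facilities (for example CONFIG_INIT_STACKS together with k_thread_stack_space_get()), which the sketch does not call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xAA /* fill pattern; chosen for this sketch */

/* Paint the whole stack buffer so untouched bytes can be recognised later. */
void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL, size);
}

/* Count bytes from the lowest address that still hold the fill pattern.
 * On a descending stack (as on Cortex-M) these are the never-used bytes. */
size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    assert(stack != NULL);
    while (unused < size && stack[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}

/* Stand-in for a thread: overwrite `depth` bytes from the top of the stack. */
void simulate_use(uint8_t *stack, size_t size, size_t depth)
{
    assert(stack != NULL);
    if (depth > size) {
        depth = size; /* an overflow would consume the entire buffer */
    }
    memset(stack + size - depth, 0x00, depth);
}
```

For example, painting a 1024-byte buffer and calling simulate_use with a depth of 900 leaves stack_unused reporting 124 bytes. A result of 0 means the stack was filled completely and has probably overflowed, which is the situation that a larger CONFIG_MAIN_STACK_SIZE avoids.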

erwango commented 2 years ago

@RafaelLeeImg Thanks for the heads-up. Don't hesitate to close when ready.

nashif commented 2 years ago

closing a non-bug