zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Problems with low power and entropy on STM32WB55 #72640

Closed: gumulka closed this issue 4 months ago

gumulka commented 5 months ago

Describe the bug

When using PM_DEVICE together with ENTROPY_GENERATOR on the STM32WB55 board, I get a bus fault after a while.

The time until the fault occurs varies, but it always occurs within a few seconds.

I have tried both v3.6.0 and mainline Zephyr; both show the same error.

To Reproduce

I tested this on the nucleo_wb55rg dev board (MB1355D-01, to be precise).

Enable CONFIG_PM, CONFIG_PM_DEVICE and CONFIG_ENTROPY_GENERATOR in the Bluetooth peripheral sample and wait a bit. I sped up the process by replacing the content of the while loop in main with k_sleep(K_MSEC(2));

patch for this

Compile and flash with:

west build -p -b nucleo_wb55rg samples/bluetooth/peripheral
west flash

Expected behavior

No bus fault.

Impact

Not sure yet; I will have to see. Currently an annoyance.

Logs and console output

Log output:

*** Booting Zephyr OS build v3.6.0 ***
[00:00:00.020,000] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.020,000] <inf> fs_nvs: alloc wra: 0, f88
[00:00:00.020,000] <inf> fs_nvs: data wra: 0, 90
[00:00:00.042,000] <inf> bt_hci_core: No ID address. App must call settings_load()
Bluetooth initialized
[00:00:00.044,000] <inf> bt_hci_core: Identity: 02:80:E1:00:00:00 (public)
[00:00:00.044,000] <inf> bt_hci_core: HCI: version 1.0b (0x00) revision 0x8077, manufacturer 0x0030
[00:00:00.044,000] <inf> bt_hci_core: LMP: version 1.0b (0x00) subver 0x2177
Advertising successfully started
Indicate VND attr 0x80189a0 (UUID 12345678-1234-5678-1234-56789abcdef1)
[00:00:00.445,000] <err> os: ***** BUS FAULT *****
[00:00:00.445,000] <err> os:   Imprecise data bus error
[00:00:00.445,000] <err> os: r0/a1:  0x08018070  r1/a2:  0x58001000  r2/a3:  0x2000049c
[00:00:00.445,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000001 r14/lr:  0x08002a21
[00:00:00.445,000] <err> os:  xpsr:  0x61000000
[00:00:00.445,000] <err> os: Faulting instruction address (r15/pc): 0x0800e0f6
[00:00:00.446,000] <err> os: >>> ZEPHYR FATAL ERROR 26: Unknown error on CPU 0
[00:00:00.446,000] <err> os: Current thread: 0x200015e8 (unknown)
[00:00:00.448,000] <err> os: Halting system

The time until the bus fault occurs varies. With a 2 ms sleep, I have seen between 0.036 and 2.500 seconds in the logs.

Backtrace from gdb with zephyr v3.6.0:

#0  arch_system_halt (reason=reason@entry=0x1a) at /PATH_TO_ZEPHYR/zephyr/kernel/fatal.c:30
#1  0x0800f82c in k_sys_fatal_error_handler (reason=reason@entry=0x1a, esf=esf@entry=0x20003c80 <z_interrupt_stacks+2048>) at /PATH_TO_ZEPHYR/zephyr/kernel/fatal.c:44
#2  0x0800f8de in z_fatal_error (reason=reason@entry=0x1a, esf=esf@entry=0x20003c80 <z_interrupt_stacks+2048>) at /PATH_TO_ZEPHYR/zephyr/kernel/fatal.c:118
#3  0x08003a88 in z_arm_fatal_error (reason=0x1a, esf=0x20003c80 <z_interrupt_stacks+2048>, esf@entry=0x20003c90 <z_interrupt_stacks+2064>) at /PATH_TO_ZEPHYR/zephyr/arch/arm/core/fatal.c:86
#4  0x08003eb8 in z_arm_fault (msp=<optimized out>, psp=<optimized out>, exc_return=<optimized out>, callee_regs=<optimized out>) at /PATH_TO_ZEPHYR/zephyr/arch/arm/core/cortex_m/fault.c:1157
#5  0x08003f88 in z_arm_usage_fault () at /PATH_TO_ZEPHYR/zephyr/arch/arm/core/cortex_m/fault_s.S:102
#6  <signal handler called>
#7  clock_control_off (sys=0x2000050c <pclken_rng>, dev=0x8018238 <__device_dts_ord_5>) at /PATH_TO_ZEPHYR/zephyr/include/zephyr/drivers/clock_control.h:150
#8  entropy_stm32_suspend () at /PATH_TO_ZEPHYR/zephyr/drivers/entropy/entropy_stm32.c:136
#9  0x080029f4 in pm_device_action_run (dev=dev@entry=0x8018310 <__device_dts_ord_58>, action=action@entry=PM_DEVICE_ACTION_SUSPEND) at /PATH_TO_ZEPHYR/zephyr/subsys/pm/device.c:60
#10 0x0800272c in pm_suspend_devices () at /PATH_TO_ZEPHYR/zephyr/subsys/pm/pm.c:74
#11 pm_system_suspend (ticks=0xa) at /PATH_TO_ZEPHYR/zephyr/subsys/pm/pm.c:201
#12 0x0800fc58 in idle (unused1=<optimized out>, unused2=<optimized out>, unused3=<optimized out>) at /PATH_TO_ZEPHYR/zephyr/kernel/idle.c:85

FRASTM commented 5 months ago

@gumulka I can reproduce ZEPHYR FATAL ERROR 26 under the conditions you described, even with k_sleep(K_MSEC(10)). Could you please test the following patch on your side?

diff --git a/drivers/entropy/entropy_stm32.c b/drivers/entropy/entropy_stm32.c
index a9cbda2abc7..8bc79fdc898 100644
--- a/drivers/entropy/entropy_stm32.c
+++ b/drivers/entropy/entropy_stm32.c
@@ -112,6 +112,10 @@ static int entropy_stm32_suspend(void)
    RNG_TypeDef *rng = dev_data->rng;
    int res;

+#if defined(CONFIG_SOC_SERIES_STM32WBX) || defined(CONFIG_STM32H7_DUAL_CORE)
+   /* Prevent concurrent access with PM */
+   z_stm32_hsem_lock(CFG_HW_RNG_SEMID, HSEM_LOCK_WAIT_FOREVER);
+#endif /* CONFIG_SOC_SERIES_STM32WBX || CONFIG_STM32H7_DUAL_CORE */
    LL_RNG_Disable(rng);

 #ifdef CONFIG_SOC_SERIES_STM32WBAX
@@ -136,6 +140,10 @@ static int entropy_stm32_suspend(void)
    res = clock_control_off(dev_data->clock,
            (clock_control_subsys_t)&dev_cfg->pclken[0]);

+#if defined(CONFIG_SOC_SERIES_STM32WBX) || defined(CONFIG_STM32H7_DUAL_CORE)
+   z_stm32_hsem_unlock(CFG_HW_RNG_SEMID);
+#endif /* CONFIG_SOC_SERIES_STM32WBX || CONFIG_STM32H7_DUAL_CORE */
+
    return res;
 }
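The idea behind the patch, gating the RNG clock only while holding the RNG semaphore, can be sketched as a small host-side model. This is an illustration only, not driver code: every name below is hypothetical, and it assumes that any other user of the RNG takes the same semaphore for the whole duration of its access. The patch blocks with HSEM_LOCK_WAIT_FOREVER; the model returns early instead so that it can run in a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Host-side model only: all names here are hypothetical,
 * not Zephyr or STM32 HAL APIs.
 */

/* Stands in for the hardware semaphore that guards the RNG. */
static atomic_flag rng_sem = ATOMIC_FLAG_INIT;

/* Stands in for the RNG peripheral clock gate. */
static bool rng_clock_on = true;

/* Try to take the semaphore; returns true on success. */
static bool model_sem_try_lock(void)
{
    return !atomic_flag_test_and_set(&rng_sem);
}

static void model_sem_unlock(void)
{
    atomic_flag_clear(&rng_sem);
}

/*
 * Suspend path: gate the clock only while holding the semaphore.
 * Returns 0 when the clock was gated, -1 when another user holds the RNG.
 * (The patch waits for the semaphore instead of returning early.)
 */
static int model_rng_suspend(void)
{
    if (!model_sem_try_lock()) {
        return -1;
    }
    rng_clock_on = false;
    model_sem_unlock();
    return 0;
}

/* Another RNG user: holds the semaphore for the whole access. */
static void model_rng_user_begin(void)
{
    bool locked = model_sem_try_lock();

    assert(locked);
    (void)locked;
    rng_clock_on = true;
}

static void model_rng_user_end(void)
{
    /* The clock must still be running when the access ends. */
    assert(rng_clock_on);
    model_sem_unlock();
}
```

In this model, calling model_rng_suspend() between model_rng_user_begin() and model_rng_user_end() returns -1 and leaves the clock running; once the user has released the semaphore, model_rng_suspend() returns 0 and the clock is gated.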

The patch fixes the error on my side:

*** Booting Zephyr OS build v3.6.0-3809-gf4e65af48c04 ***
[00:00:00.018,000] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.018,000] <inf> fs_nvs: alloc wra: 0, f80
[00:00:00.018,000] <inf> fs_nvs: data wra: 0, a0
[00:00:00.042,000] <inf> bt_hci_core: No ID address. App must call settings_load()
Bluetooth initialized
[00:00:00.044,000] <inf> bt_hci_core: Identity: 02:80:E1:00:00:00 (public)
[00:00:00.044,000] <inf> bt_hci_core: HCI: version 1.0b (0x00) revision 0xa072, manufacturer 0x0030
[00:00:00.044,000] <inf> bt_hci_core: LMP: version 1.0b (0x00) subver 0x2172
Advertising successfully started
Indicate VND attr 0x8018e20 (UUID 12345678-1234-5678-1234-56789abcdef1)

gumulka commented 5 months ago

@FRASTM It also fixes the error on my side. Happy to see it getting merged into mainline!

FRASTM commented 5 months ago

> @FRASTM It also fixes the error on my side. Happy to see it getting merged into mainline!

Great, I am preparing a PR.