samples/subsys/nvs fails on some stm32 targets

FRASTM commented 7 months ago

Describe the bug Executing the samples/subsys/nvs testcase fails when running on the stm32u585 disco board

Please also mention any information which could help others to understand the problem you're facing:

What target platform are you using?
What have you tried to diagnose or workaround this issue?
Is this a regression? If yes, have you been able to "git bisect" it to a specific commit?
...

To Reproduce Steps to reproduce the behavior:

west build -p always -b b_u585i_iot02a samples/subsys/nvs/
west flash
See error

Expected behavior rebooting ... until

Reboot counter reached max value.                                               
Reset to 0 and exit test.

Impact What impact does this issue have on your progress (e.g., annoyance, showstopper)

Logs and console output

[00:00:00.140,000] <inf> flash_stm32_ospi: OSPI flash config is OPI / DTR
[00:00:00.147,000] <inf> flash_stm32_ospi: Read SFDP from octoFlash
[00:00:00.153,000] <inf> flash_stm32_ospi: Read SFDP from octoFlash
[00:00:00.160,000] <inf> flash_stm32_ospi: NOR octo-flash at 0x70000000 (0x4000)
*** Booting Zephyr OS build v3.6.0-1433-g105ef05c52d6 ***
[00:00:00.174,000] <dbg> fs_nvs: nvs_recover_last_ate: Recovering last ate from0
[00:00:00.187,000] <inf> fs_nvs: 3 Sectors of 8192 bytes
[00:00:00.192,000] <inf> fs_nvs: alloc wra: 0, 1f80
[00:00:00.197,000] <inf> fs_nvs: data wra: 0, c0
Id: 1, Address: 192.168.1.1
Id: 2, Key: ff fe fd fc fb fa f9 f8 
Id: 3, Reboot_counter: 0
Id: 4, Data: DATA
Id: 5, Longarray: 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 
[00:00:00.246,000] <err> os: ***** USAGE FAULT *****                            
[00:00:00.251,000] <err> os:   Illegal use of the EPSR                          
[00:00:00.257,000] <err> os: r0/a1:  0x08009317  r1/a2:  0x2000042c  r2/a3:  0x0
[00:00:00.266,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0xa
[00:00:00.275,000] <err> os:  xpsr:  0x00000000                                 
[00:00:00.280,000] <err> os: Faulting instruction address (r15/pc): 0x20000e08  
[00:00:00.288,000] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0  
[00:00:00.296,000] <err> os: Current thread: 0x20000178 (unknown)               
[00:00:00.302,000] <err> os: Halting system

Environment (please complete the following information):

OS: ubuntu
Toolchain zephyr-sdk-0.16.3
Commit SHA 105ef05c52d62fbb192eecb78475567d8475db0f

Additional context Regression occured since commit 3ca50f506012c17ae78ac76e43837e955fdca4eb from PR #69395

FRASTM commented 7 months ago

For the stm32u585 disco board (b_u585i_iot02a) disabling the externl NOR flash seems to fix the pb

&mx25lm51245 {
    status = "disabled";
};

FRASTM commented 7 months ago

also failing with stm32f091 on nucleo_f091rc target

in the k_msleep(SLEEP_TIME); instruction (z_tick_sleep (ticks=) at~/zephyrproject/zephyr/kernel/sched.c)

*** Booting Zephyr OS build v3.6.0-1433-g105ef05c52d6 ***                       
[00:00:00.005,000] <dbg> fs_nvs: nvs_recover_last_ate: Recovering last ate from0
[00:00:00.027,000] <inf> fs_nvs: 3 Sectors of 2048 bytes                        
[00:00:00.032,000] <inf> fs_nvs: alloc wra: 0, 7c0                              
[00:00:00.037,000] <inf> fs_nvs: data wra: 0, 9e                                
Id: 1, Address: 192.168.1.1                                                     
Id: 2, Key: ff fe fd fc fb fa f9 f8                                             
Id: 3, Reboot_counter: 0                                                        
Id: 4, Data: DATA                                                               
Id: 5, Longarray: 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 
[00:00:00.086,000] <err> os: ***** HARD FAULT *****                             
[00:00:00.092,000] <err> os: r0/a1:  0x0800660f  r1/a2:  0x200003e0  r2/a3:  0x0
[00:00:00.101,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x3e3e006e r14/lr:  0xa
[00:00:00.109,000] <err> os:  xpsr:  0x00000000                                 
[00:00:00.115,000] <err> os: Faulting instruction address (r15/pc): 0x20000da0  
[00:00:00.123,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0   
[00:00:00.130,000] <err> os: Current thread: 0x20000168 (unknown)               
[00:00:00.137,000] <err> os: Halting system

FRASTM commented 7 months ago

changing the k_sleep with

diff --git a/samples/subsys/nvs/src/main.c b/samples/subsys/nvs/src/main.c
index 5ea1976d319..b859b619623 100644
--- a/samples/subsys/nvs/src/main.c
+++ b/samples/subsys/nvs/src/main.c
@@ -194,7 +194,7 @@ int main(void)

    cnt = 5;
    while (1) {
-       k_msleep(SLEEP_TIME);
+       k_busy_wait(SLEEP_TIME * 1000);
        if (reboot_counter < MAX_REBOOT) {
            if (cnt == 5) {
                /* print some history information about

makes execution continue until final counter:

Id: 3, Reboot_counter: 400                                                      
Id: 5, Longarray: 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 
Reboot counter reached max value.                                               
Reset to 0 and exit test.                                                       
[00:00:00.353,000] <err> os: ***** USAGE FAULT *****                            
[00:00:00.358,000] <err> os:   Illegal use of the EPSR                          
[00:00:00.364,000] <err> os: r0/a1:  0x0800931f  r1/a2:  0x2000042c  r2/a3:  0x0
[00:00:00.373,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0xa
[00:00:00.381,000] <err> os:  xpsr:  0x00000000                                 
[00:00:00.387,000] <err> os: Faulting instruction address (r15/pc): 0x20000e08  
[00:00:00.394,000] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0  
[00:00:00.402,000] <err> os: Current thread: 0x20000178 (unknown)               
[00:00:00.409,000] <err> os: Halting system

Disable the the external flash memory of the b_u585i_iot02 target to end the test properly

Laczen commented 6 months ago

@FRASTM, a similar issue was reported in #70109. The problem was reported as resolved when removing the k_msleep().

erwango commented 6 months ago

@FRASTM, a similar issue was reported in #70109. The problem was reported as resolved when removing the k_msleep().

Ok, but it used to work...

FRASTM commented 6 months ago

@FRASTM, a similar issue was reported in #70109. The problem was reported as resolved when removing the k_msleep().

I see, but the fact is that samples/subsys/nvs/src/main.c (still) has a k_msleep(SLEEP_TIME);

Laczen commented 6 months ago

@erwango, @FRASTM, it is possible to remove the k_msleep() call from the sample. But this would just be hiding the problem there seems to be with this call. Could it be that k_msleep() now requires powermanagement to be enabled ?

erwango commented 6 months ago

Could it be that k_msleep() now requires powermanagement to be enabled ?

It couldn't or at least it shouldn't. Kernel is supposed to work w/o PM

Laczen commented 6 months ago

@erwango, @FRASTM I can replicate the problem on a b_l4s5i_iot01a, but I have no clue why k_msleep() fails. k_msleep() works for the blinky sample.

Is the error returned when external flash is enabled a indication of what is going wrong ? This error shouldn't appear as the program just ends with rc=0.

Laczen commented 6 months ago

@erwango, @FRASTM, The problem is caused by the STM32 Octo SPI flash driver. If this is disabled the problem disappears.

FRASTM commented 6 months ago

@erwango, @FRASTM, The problem is caused by the STM32 Octo SPI flash driver. If this is disabled the problem disappears.

I agree, that's true for the stm32u585 disco, but not for the nucleo_g071rb :

*** Booting Zephyr OS build v3.6.0-1434-gac08bfbc8d8a ***
[00:00:00.005,000] <dbg> fs_nvs: nvs_recover_last_ate: Recovering last ate from0
[00:00:00.022,000] <dbg> fs_nvs: nvs_add_gc_done_ate: Adding gc done ate at 7f0
[00:00:00.030,000] <inf> fs_nvs: 3 Sectors of 2048 bytes
[00:00:00.035,000] <inf> fs_nvs: alloc wra: 0, 7e8
[00:00:00.041,000] <inf> fs_nvs: data wra: 0, 0
No address found, adding 192.168.1.1 at id 1
No key found, adding it at id 2                         
No Reboot counter found, adding it at id 3              
Id: 4 not found, adding it                              
Longarray not found, adding it as id 5                  
[00:00:00.066,000] <err> os: ***** HARD FAULT *****     
[00:00:00.072,000] <err> os: r0/a1:  0x080069e3  r1/a2:  0x20000350  r2/a3:  0x0
[00:00:00.081,000] <err> os: r3/a4:  0x080003bd r12/ip:  0x00000040 r14/lr:  0x0
[00:00:00.089,000] <err> os:  xpsr:  0x00000000                                 
[00:00:00.094,000] <err> os: Faulting instruction address (r15/pc): 0x00000014  
[00:00:00.102,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0   
[00:00:00.110,000] <err> os: Current thread: 0x200000d0 (unknown)               
[00:00:00.117,000] <err> os: Halting system

FRASTM commented 6 months ago

When the ksleep is replaced by the busy_wait, the test can execute to the end but still crashes at the very last reboot with the nucleo_g071rb (even if #define MAX_REBOOT 2)

Doing last reboot...�*** Booting Zephyr OS build v3.6.0-1434-gac08bfbc8d8a ***                       
[00:00:00.005,000] <dbg> fs_nvs: nvs_recover_last_ate: Recovering last ate from sector 2             
[00:00:00.019,000] <inf> fs_nvs: 3 Sectors of 2048 bytes                                             
[00:00:00.024,000] <inf> fs_nvs: alloc wra: 2, 508                                                   
[00:00:00.030,000] <inf> fs_nvs: data wra: 2, 2e0                                                    
Id: 1, Address: 192.168.1.1                                                                          
Id: 2, Key: ff fe fd fc fb fa f9 f8                                                                  
Id: 3, Reboot_counter: 400                                                                           
Id: 5, Longarray: 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 
Reboot counter reached max value.                                                                    
Reset to 0 and exit test.                                                                            
[00:00:00.197,000] <err> os: ***** HARD FAULT *****                                                  
[00:00:00.203,000] <err> os: r0/a1:  0x08006a21  r1/a2:  0x20000350  r2/a3:  0x00000000              
[00:00:00.212,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x20000cf0 r14/lr:  0x0000000a              
[00:00:00.220,000] <err> os:  xpsr:  0x00000000                                                      
[00:00:00.225,000] <err> os: Faulting instruction address (r15/pc): 0x20000d10                       
[00:00:00.233,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0                        
[00:00:00.241,000] <err> os: Current thread: 0x200000d0 (unknown)                                    
[00:00:00.248,000] <err> os: Halting system

Is it a special case with M0+ cortex ? Crash occurs at k_thread_abort(k_current_get()); z_thread_entry (entry=0x8003b4d <bg_thread_main>, p1=0x0, p2=0x0, p3=0x0) at /local/home/frq09495/zephyrproject/zephyr/lib/os/thread_entry.c:50

It appears that increasing the CONFIG_MAIN_STACK_SIZE is a valid WA

zephyrproject-rtos / zephyr

samples/subsys/nvs fails on some stm32 targets #70608