zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
11k stars 6.69k forks source link

STM32H7xx: Driver for internal flash memory partially uses a fixed flash program word size, which doesn't fit for all STM32H7xx SOCs (e.g. STM32H7A3, STM32H7B0, STM32H7B3) leading to potential flash data corruption #45568

Closed chrihell closed 2 years ago

chrihell commented 2 years ago

Describe the bug When running an adapted version of the Zephyr littlefs example on a Nucelo-H7A3ZI-Q board it fails to create and format the littlefs filesystem placed in an internal flash partition.

The UART log output shows that littlefs cannot successfully create the filesystem and fails to create superblock 0 with error code -28 (see log output further down) Unfortunately the meaning of the error code is a bit misleading here.

Please also mention any information which could help others to understand the problem you're facing:

To Reproduce Steps to reproduce the behavior:

  1. Copy attached .conf and .overlay files to directory zephyr\samples\subsys\fs\littlefs\boards to add support for the nucleo_h7a3zi_q board to the littlefs example project. Remove the .txt file extension that was added because of github. (see section 'Additional context' for attached files)
  2. Set CONFIG_APP_WIPE_STORAGE=y in prj.conf of the example project. This ensures, that the flash partition is always getting erased and that the littlefs filesystem is recreated.
  3. Build example project: west build -p -b nucleo_h7a3zi_q zephyr\samples\subsys\fs\littlefs
  4. Connect board via USB cable and open serial console
  5. Flash the example project west flash
  6. See UART log ouput, as attached to this issue, showing the error

Expected behavior The littlefs filesystem initialization shouldn't fail and the example program should be able to execute in the intended way and do it's file operations (e.g. incrementing and writing boot counter to file). No errors should be reported in the log output by littlefs, the file system or flash driver. File Systems API calls should work as expected.

Impact I'm rather new to Zephyr and it took me several hours to track down and to locally fix the issue. Initially I expected that the example project builds and would work as-is out of the box for the used Nucleo board, as support was added for it and littlefs. So it looked like a showstopper in the beginning, but was in the end luckily more of a learning experience in debugging and working with Zephyr and about filing this bug report.

The issue does probably not only affect littlefs write operations, but potentially affects all internal flash writes on STM32H7xx SOCs that have a flash program word size that is different from 256 bits.

Logs and console output In the described error situation the UART log output is the following:

I: littlefs partition at /lfs1
*** Booting Zephyr OS build zephyr-v3.0.0-3700-g47850a301c41  ***
Sample program to r/w files on littlefs
Area 0 at 0x80000 on FLASH_CTRL for 524288 bytes
I: LittleFS version 2.4, disk version 2.0
I: FS at FLASH_CTRL:0x80000 is 64 0x2000-byte blocks with 512 cycle
I: sizes: rd 1 ; pr 16 ; ca 256 ; la 32
E: WEST_TOPDIR/modules/fs/littlefs/lfs.c:1077: Corrupted dir pair at {0x0, 0x1}
W: can't mount (LFS -84); formatting
W: WEST_TOPDIR/modules/fs/littlefs/lfs.c:1802: Superblock 0x0 has become unwritable
E: format failed (LFS -28)
E: fs mount error (-28)
FAIL: mount id 0 at /lfs1: -28

The log shows how littlefs tries to format the partition, but fails to write Superblock 0 correctly and then fails with error code -28.

Environment (please complete the following information):

Additional context nucleo_h7a3zi_q.conf.txt nucleo_h7a3zi_q.overlay.txt

Proposed fix Change the code in zephyr\drivers\flash\flash_stm32h7x.c:376 to:

for (i = 0; i < len && i + nbytes <= len; i += nbytes, offset += nbytes) {

This essentially replaces the constant 32 by the already calculated number of bytes in nbytes. The calculated number fits to the flash program word size of the used STM32H7xx SOC type, as it is based on a SOC type specific definition of the flash program word size.

Explanation Original Code

...
int flash_stm32_write_range(const struct device *dev, unsigned int offset,
                const void *data, unsigned int len)
{
    int rc = 0;
    int i, j;
    const uint8_t ndwords = FLASH_NB_32BITWORD_IN_FLASHWORD / 2;
    const uint8_t nbytes = FLASH_NB_32BITWORD_IN_FLASHWORD * 4;
    uint8_t unaligned_datas[nbytes];

    for (i = 0; i < len && i + nbytes <= len; i += nbytes, offset += nbytes) {
        rc = write_ndwords(dev, offset,
                   (const uint64_t *) data + (i >> 3),
                   ndwords);
        if (rc < 0) {
            return rc;
        }
    }
...

Log output via UART after applying the proposed solution

Littlefs detects a corrupted filesystem, as the flash partition got erased by the application. But this time it succeeds with formatting and the freshly created filesystem can be mounted afterwards. The file operations of the example application work without errors, too.

I: littlefs partition at /lfs1
*** Booting Zephyr OS build zephyr-v3.0.0-3700-g47850a301c41  ***
Sample program to r/w files on littlefs
Area 0 at 0x80000 on FLASH_CTRL for 524288 bytes
E: Erasing flash area ... 0
I: LittleFS version 2.4, disk version 2.0
I: FS at FLASH_CTRL:0x80000 is 64 0x2000-byte blocks with 512 cycle
I: sizes: rd 1 ; pr 16 ; ca 256 ; la 32
E: WEST_TOPDIR/modules/fs/littlefs/lfs.c:1077: Corrupted dir pair at {0x0, 0x1}
W: can't mount (LFS -84); formatting
I: /lfs1 mounted
/lfs1 mount: 0
/lfs1: bsize = 16 ; frsize = 8192 ; blocks = 64 ; bfree = 62

Listing dir /lfs1 ...
/lfs1/boot_count read count:0 (bytes: 0)
/lfs1/boot_count write new boot count 1: [wr:1]
I: Test file: /lfs1/pattern.bin not found, create one!
------ FILE: /lfs1/pattern.bin ------
01 55 55 55 55 55 55 55 02 55 55 55 55 55 55 55
03 55 55 55 55 55 55 55 04 55 55 55 55 55 55 55
05 55 55 55 55 55 55 55 06 55 55 55 55 55 55 55
07 55 55 55 55 55 55 55 08 55 55 55 55 55 55 55
09 55 55 55 55 55 55 55 0a 55 55 55 55 55 55 55
0b 55 55 55 55 55 55 55 0c 55 55 55 55 55 55 55
0d 55 55 55 55 55 55 55 0e 55 55 55 55 55 55 55
0f 55 55 55 55 55 55 55 10 55 55 55 55 55 55 55
11 55 55 55 55 55 55 55 12 55 55 55 55 55 55 55
13 55 55 55 55 55 55 55 14 55 55 55 55 55 55 55
15 55 55 55 55 55 55 55 16 55 55 55 55 55 55 55
17 55 55 55 55 55 55 55 18 55 55 55 55 55 55 55
19 55 55 55 55 55 55 55 1a 55 55 55 55 55 55 55
1b 55 55 55 55 55 55 55 1c 55 55 55 55 55 55 55
1d 55 55 55 55 55 55 55 1e 55 55 55 55 55 55 55
1f 55 55 55 55 55 55 55 20 55 55 55 55 55 55 55
21 55 55 55 55 55 55 55 22 55 55 55 55 55 55 55
23 55 55 55 55 55 55 55 24 55 55 55 55 55 55 55
25 55 55 55 55 55 55 55 26 55 55 55 55 55 55 55
27 55 55 55 55 55 55 55 28 55 55 55 55 55 55 55
29 55 55 55 55 55 55 55 2a 55 55 55 55 55 55 55
2b 55 55 55 55 55 55 55 2c 55 55 55 55 55 55 55
2d 55 55 55 55 55 55 55 2e 55 55 55 55 55 55 55
2f 55 55 55 55 55 55 55 30 55 55 55 55 55 55 55
31 55 55 55 55 55 55 55 32 55 55 55 55 55 55 55
33 55 55 55 55 55 55 55 34 55 55 55 55 55 55 55
35 55 55 55 55 55 55 55 36 55 55 55 55 55 55 55
37 55 55 55 55 55 55 55 38 55 55 55 55 55 55 55
39 55 55 55 55 55 55 55 3a 55 55 55 55 55 55 55
3b 55 55 55 55 55 55 55 3c 55 55 55 55 55 55 55
3d 55 55 55 55 55 55 55 3e 55 55 55 55 55 55 55
3f 55 55 55 55 55 55 55 40 55 55 55 55 55 55 55

41 55 55 55 55 55 55 55 42 55 55 55 55 55 55 55
43 55 55 55 55 55 55 55 44 55 55 55 55 55 55 55
45 55 aa
I: /lfs1 unmounted
/lfs1 unmount: 0
ABOSTM commented 2 years ago

@chrihell , Thanks for reporting this issue, and welcome to Zephyr community. Thank you also for your analysis, which is very detailed, and further more exact, And finally thanks for the fix you proposed, and the test you performed, it looks good.

So you made all the job, thus I propose you to continu the Zephyr experience, and push, on your own, the fix that you proposed in a Pull Request.

chrihell commented 2 years ago

@ABOSTM Thanks for the quick and welcoming reply. I'll try to do the PR then. It can't be that hard to change a line of code, right? ;)