zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.17k stars 6.23k forks source link

subsys: lorawan: nvm: STM32H7 unaligned memory access (read) exception when writing LoRaWAN setting names to internal flash #59029

Closed tmilkovic51 closed 11 months ago

tmilkovic51 commented 1 year ago

Describe the bug On STM32H7 MCUs, an unaligned memory access exception happens on successful LoRaWAN network join.

When joining a LoRaWAN network by calling lorawan_join() function, the LoRaWAN subsystem writes some settings to NVS. Each NVS setting has a string representing its name (a key). NVS setting name strings are treated as const char arrays, which go in .rodata section. Since there is no alignment requirement for const char arrays, linker puts the setting name strings from lorawan_nvm_settings.c file one after another (sequentially) in .rodata section (there are 7 different LoRaWAN settings in lorawan_nvm_settings.c file).

In lorawan_nvm_save_settings function, LoRaWAN setting name string is passed to settings_save_one(), which eventually gets passed to STM32 flash driver's flash_stm32_write(const struct device *dev, off_t offset, const void *data, size_t len) function as const void *data parameter. This function then calls STM32H7 flash driver's flash_stm32_write_range(const struct device *dev, unsigned int offset, const void *data, unsigned int len) function which casts pointer to data from const void* to const uint64_t* and passes the data to write_nwords(const struct device *dev, off_t offset, const uint64_t *data, uint8_t n) function.

Finally, after all these calls, pointer which points to arbitrary aligned data (const char array located in flash) ends up being casted to const uint64_t* data and gets accessed by pointer dereference in write_ndwords function:

    /* Perform the data write operation at the desired memory address */
    for (i = 0; i < n; ++i) {
        flash[i] = data[i];

        /* Flush the data write */
        __DSB();

        wait_write_queue(&sector);
    }

(here is a link to the line where this code is located)

This dereference causes unaligned memory access exception on STM32H7, which could be fixed either in lorawan_nvm_settings.c file by copying the setting name string to a temporary 8-byte aligned buffer which will then get passed to settings_save_one() function or it can be fixed in flash_stm32h7x.c file (better solution) by handling a special case if source data (data to be written to flash) is coming from flash region (e.g. .rodata section) and it is unaligned.

Please also mention any information which could help others to understand the problem you're facing:

To Reproduce I encountered this on a custom board, but the issue could be reproduced by building and running samples/subsys/lorawan/class_a sample on stm32h7b3i_dk board with storage flash partition and LoRa transceeiver chip added to board device tree, settings subsystem with NVS backend enabled (CONFIG_NVS=y CONFIG_SETTINGS=y CONFIG_SETTINGS_NVS=y) and LoRaWAN NVM enabled (CONFIG_LORAWAN_NVM_SETTINGS=y). This should cause unaligned memory access exception to happen upon successful LoRaWAN network join.

Expected behavior STM32H7 flash driver should support unaligned source data to be written to flash, regardless if source data is located in RAM or flash (to be honest, I didn't check if STM32H7 flash driver supports writing unaligned source data from RAM to flash, only from flash (.rodata) to flash).

Impact Showstopper for using LoRaWAN subsystem with NVM support (CONFIG_LORAWAN_NVM_SETTINGS=y) on STM32H7A3/B3/B0 MCUs. Some LoRaWAN servers don't support joining if you don't send a monotonically increasing counter value on every new join request, so having CONFIG_LORAWAN_NVM_SETTINGS=y is a must-have.

Logs and console output

Log output - exception happens when stepping over this line (sorry for bunch of dropped messages, it is because debug log level for flash driver is enabled):

*** Booting Zephyr OS build zephyr-v3.3.0 ***
I: Starting bootloader
I: Primary image: magic=bad, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: none
I: Bootloader chainload address offset: 0x18000
I: Jumping to the first image slot
[00:00:00.000,000] <dbg> flash_stm32h7: stm32h7_flash_init: Flash initialized. BS: 16
[00:00:00.000,000] <dbg> flash_stm32h7: stm32h7_flash_init: Block 0: bs: 8192 count: 256
[00:00:00.000,000] <dbg> flash_stm32h7: flash_stm32h7_write_protection: Disable write protection
*** Booting Zephyr OS build zephyr-v3.3.0 ***
--- 1009 messages dropped ---
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73648, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 65632, len: 2
[00:00:00.621,000] <dbg> lorawan_nvm: lorawan_nvm_data_restore: Restoring LoRaWAN settings
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73552, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73568, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73584, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73600, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73616, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73632, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73648, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73664, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 65600, len: 18
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73552, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73568, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73584, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 65776, len: 1
[00:00:00.621,000] <dbg> lorawan_nvm: on_setting_loaded: Key: Crypto
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73552, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73568, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73584, len: 8
[00:00:00.621,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 65776, len: 56
[00:00:00.621,000] <dbg> lorawan_nvm: lorawan_nvm_data_restore: Crypto version: 16777472, DevNonce: 3, JoinNonce: 1925374
[00:00:00.621,000] <dbg> lorawan_nvm: lorawan_nvm_data_restore: LoRaWAN context restored
[00:00:02.700,000] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_write_protection: Enable write protection
--- 45 messages dropped ---
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_write: Write offset: 73536, len: 16
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_write_protection: Disable write protection
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_write_protection: Enable write protection
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73520, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73536, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73552, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73568, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73584, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73600, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73616, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73632, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73648, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73664, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73680, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 73696, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_read: Read offset: 98288, len: 8
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_write: Write offset: 65968, len: 16
[00:00:11.022,000] <dbg> flash_stm32h7: flash_stm32h7_write_protection: Disable write protection
[00:00:11.022,000] <err> os: ***** USAGE FAULT *****
[00:00:11.022,000] <err> os:   Unaligned memory access
[00:00:11.022,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0xffffffff
[00:00:11.022,000] <err> os: r3/a4:  0x00000002 r12/ip:  0x00002b0e r14/lr:  0x0802e92b
[00:00:11.022,000] <err> os:  xpsr:  0x61070000
[00:00:11.022,000] <err> os: Faulting instruction address (r15/pc): 0x0802ebd8
[00:00:11.022,000] <err> os: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0
[00:00:11.022,000] <err> os: Current thread: 0x24002678 (sysworkq)
[00:00:11.094,000] <err> os: Halting system

Environment (please complete the following information):

Additional context prj.conf file used with a custom board and project:

# Logging
CONFIG_LOG=y
CONFIG_NVS_LOG_LEVEL_WRN=y
CONFIG_LORA_LOG_LEVEL_WRN=y
CONFIG_LORAWAN_LOG_LEVEL_DBG=y
CONFIG_FLASH_LOG_LEVEL_DBG=y

# Drivers
CONFIG_GPIO=y
CONFIG_SERIAL=y
CONFIG_SPI=y
CONFIG_FLASH=y
CONFIG_FLASH_PAGE_LAYOUT=y
CONFIG_FLASH_MAP=y
CONFIG_MPU_ALLOW_FLASH_WRITE=y
CONFIG_LORA=y
CONFIG_LORA_SX12XX=y
CONFIG_REBOOT=y

# Standard C library configuration
CONFIG_NEWLIB_LIBC=y
CONFIG_CBPRINTF_FP_SUPPORT=y

# Kernel configuration
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_ISR_STACK_SIZE=4096

# LoRaWAN subsystem
CONFIG_LORAWAN=y
CONFIG_LORAMAC_REGION_EU868=y
CONFIG_LORAWAN_NVM_SETTINGS=y
#CONFIG_LORAWAN_NVM_NONE=y

# Settings subsystem
CONFIG_NVS=y
CONFIG_SETTINGS=y
CONFIG_SETTINGS_NVS=y
CONFIG_SETTINGS_NVS_SECTOR_COUNT=4
CONFIG_SETTINGS_NVS_SECTOR_SIZE_MULT=1

# Debugging - install VS Code Embedded Tools extension to see Zephyr thread info while debugging
CONFIG_DEBUG_THREAD_INFO=y
CONFIG_INIT_STACKS=y
CONFIG_THREAD_STACK_INFO=y

Relevant parts of custom board DTS:

/dts-v1/;
#include <st/h7/stm32h7a3Xi.dtsi>
#include <st/h7/stm32h7a3v(g-i)tx-pinctrl.dtsi>
#include <zephyr/dt-bindings/gpio/gpio.h>
#include <zephyr/dt-bindings/clock/stm32h7_clock.h>

/*********************************************************
 *                      ROOT NODE                        *
 *********************************************************/
/ {
    model = "CUSTOM board";
    compatible = "custom, board";

    chosen {
        zephyr,console = &debug_uart;
        zephyr,shell-uart = &debug_uart;
        zephyr,sram = &sram0;
        zephyr,flash = &flash0;
        zephyr,dtcm = &dtcm;
        zephyr,itcm = &itcm;
        zephyr,code-partition = &slot0_partition;
    };

    aliases {
        lora0 = &lora0;
    };
};

/*********************************************************
 *                        CLOCKS                         *
 *********************************************************/
/* HSE clock used for PLL */
&clk_hse {
    status = "okay";
    clock-frequency = <DT_FREQ_M(25)>;
};

/* PLL1P used for system clock (280 MHz) */
&pll {
    status = "okay";
    clocks = <&clk_hse>;
    div-m = <5>;
    mul-n = <112>;
    div-p = <2>;
};

&rcc {
    status = "okay";
    clocks = <&pll>;
    clock-frequency = <DT_FREQ_M(280)>;
    d1cpre = <1>;
    hpre = <1>;
    d1ppre = <2>;
    d2ppre1 = <2>;
    d2ppre2 = <2>;
    d3ppre = <2>;
};

/*********************************************************
 *                  USART CONTROLLERS                    *
 *********************************************************/
debug_uart: &uart7 {
    status = "okay";
    current-speed = <115200>;
    pinctrl-0 = <&uart7_rx_pb3 &uart7_tx_pb4>;
    pinctrl-names = "default";
};

/*********************************************************
 *                    SPI CONTROLLERS                    *
 *********************************************************/
&spi4 {
    status = "okay";
    pinctrl-0 = <&spi4_sck_pe12 &spi4_miso_pe13 &spi4_mosi_pe14>;
    pinctrl-names = "default";
    cs-gpios = <&gpiod 11 GPIO_ACTIVE_LOW>;

    lora0: lora@0 {
        status = "okay";
        compatible = "semtech,sx1272";
        reg = <0>;
        spi-max-frequency = <3000000>;
        reset-gpios = <&gpioe 15 GPIO_ACTIVE_HIGH>;
        dio-gpios = <&gpiob 12 GPIO_ACTIVE_HIGH>,
                    <&gpiod 8 GPIO_ACTIVE_HIGH>,
                    <&gpiod 9 GPIO_ACTIVE_HIGH>,
                    <&gpioe 10 GPIO_ACTIVE_HIGH>,
                    <&gpioe 8 GPIO_ACTIVE_HIGH>;
        power-amplifier-output = "pa-boost";
    };
};

/*********************************************************
 *                  FLASH PARTITIONS                     *
 *********************************************************/
&flash0 {
    partitions {
        compatible = "fixed-partitions";
        #address-cells = <1>;
        #size-cells = <1>;

        /* 64KB for bootloader */
        boot_partition: partition@0 {
            label = "mcuboot";
            reg = <0x00000000 0x00010000>;
            read-only;
        };

        /* storage: 32KB for settings */
        storage_partition: partition@10000 {
            label = "storage";
            reg = <0x00010000 0x00008000>;
        };

        /* application image slot: 912KB */
        slot0_partition: partition@18000 {
            label = "image-0";
            reg = <0x00018000 0x000E4000>;
        };

        /* backup slot: 912KB */
        slot1_partition: partition@FC000 {
            label = "image-1";
            reg = <0x000FC000 0x000E4000>;
        };

        /* swap slot: 128KB */
        scratch_partition: partition@1E0000 {
            label = "image-scratch";
            reg = <0x001E0000 0x00020000>;
        };
    };
};
Laczen commented 1 year ago

@tmilkovic51, probably something else is wrong, nvs should never do unaligned writes (when the write-block-size has been set correctly).

tmilkovic51 commented 1 year ago

@tmilkovic51, probably something else is wrong, nvs should never do unaligned writes (when the write-block-size has been set correctly).

Destination addresses for flash writes generated by NVS are fine. The issue here is, it seems, with misaligned reads of source data buffer (data which is to be written to flash). In this case, source data buffer is a misaligned Settings key value (string) located somewhere in .rodata section, which is mapped to MCU flash memory. So, the setting key (unaligned string, in my case it was "lorawan/nvm/MacGroup1") is being read from flash (.rodata section) by dereferencing an uint64_t* pointer, which generates an unaligned memory read from flash (I'm not an STM32/ARM expert, but maybe unaligned reads from RAM are fine, but it's unaligned reads from flash that generate the exception).

EDIT: Maybe the title of the issue was a bit misleading. I added to the title that read may be an issue

Laczen commented 1 year ago

@tmilkovic51, the problem is IMO in the flash driver: https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/flash/flash_stm32h7x.c#L374, this supposes that void *data is 8 byte aligned, which is not guaranteed.

@erwango fyi.

erwango commented 1 year ago

@tmilkovic51 I've followed @Laczen opinion and tried to deal with this case, please have a test: https://github.com/zephyrproject-rtos/zephyr/pull/59098

erwango commented 1 year ago

@tmilkovic51 I've followed @Laczen opinion and tried to deal with this case, please have a test: https://github.com/zephyrproject-rtos/zephyr/pull/59098

Not fully ready yet. See my comments in PR

martincalsyn commented 1 year ago

I think this is a broader problem than just the stm32h7 driver. I am encountering the same on stm32u585xx and it's probably not just limited to those. Do we need to change each driver or is there a place we can make a broader change?

tmilkovic51 commented 1 year ago

@martincalsyn I've took a quick look at STM32 series flash drivers and this is their status (I didn't test it on actual hardware, just inspected the code):

Series in bold could potentially have this issue. Affected files are: flash_stm32h7x.c, flash_stm32l5x.c and flash_stm32g4x.c. After unaligned read access is resolved in all of these, no series will be left affected.

martincalsyn commented 1 year ago

Wow. All over the map. The UNALIGNED_GET seems like the least memory-expensive way to do it (though most cpu expensive). I have devkits here for all but the G4 series to test on. Do you plan to offer a revised PR or should I jump in in some fashion?

tmilkovic51 commented 1 year ago

There are some solutions proposed in #59098, so take a look there. It's still TBD on what is the best way to resolve this, but good thing is that the same approach could be applied to all affected series if required.

github-actions[bot] commented 11 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.