zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.68k stars 6.53k forks source link

STM32WL LoRa SoC stuck at initialization due to SPI transmit buffer not emptying #47323

Closed HelloWorldTeraByte closed 2 years ago

HelloWorldTeraByte commented 2 years ago

Describe the bug The MCU gets stuck at initialization when the LoRa radio in the STM32WL SoC is enabled. It seems to be stuck trying to send data through the SPI to the intergrate LoRa module of the STM32WL SoC. The MCU is in an endless loop waiting for the SPI transmit buffer to empty (waiting for LL_SPI_IsActiveFlag_TXE to be set).

I am using Seeed Studio LoRa-E5 mini board with STM32WLE5JC SoC. I have attached a backtrace when it is in a stuck state from a openocd gdb server below.

To Reproduce Steps to reproduce the behavior:

  1. Use the sample at samples/drivers/lora/send
  2. Build and flash for board lora_e5_dev_board: west build -b lora_e5_dev_board && west flash --runner openocd

Expected behavior The MCU should not be stuck at initialization and data should be sent through the SPI.

Impact Showstopper as the MCU does not even boot when the LoRa module is enabled.

Logs and console output The backtrace from a gdb server is shown as below

(gdb) bt
#0  0x08002c02 in LL_SPI_IsActiveFlag_TXE (SPIx=0x58010000)
    at /home/hwtb/zephyrproject/modules/hal/stm32/stm32cube/stm32wlxx/drivers/include/stm32wlxx_ll_spi.h:919
#1  ll_func_tx_is_empty (spi=0x58010000)
    at /home/hwtb/zephyrproject/zephyr/drivers/spi/spi_ll_stm32.h:104
#2  spi_stm32_shift_m (data=<optimized out>, spi=<optimized out>)
    at /home/hwtb/zephyrproject/zephyr/drivers/spi/spi_ll_stm32.c:258
#3  spi_stm32_shift_frames (data=0x2000008c <spi_stm32_dev_data_1>, spi=<optimized out>)
    at /home/hwtb/zephyrproject/zephyr/drivers/spi/spi_ll_stm32.c:359
#4  transceive (asynchronous=false, signal=0x0, rx_bufs=<optimized out>, tx_bufs=<optimized out>,
    config=<optimized out>, dev=<optimized out>)
    at /home/hwtb/zephyrproject/zephyr/drivers/spi/spi_ll_stm32.c:660
#5  spi_stm32_transceive (dev=<optimized out>, config=<optimized out>, tx_bufs=<optimized out>,
    rx_bufs=<optimized out>) at /home/hwtb/zephyrproject/zephyr/drivers/spi/spi_ll_stm32.c:831
#6  0x080037b4 in z_impl_spi_transceive (config=0x8008530 <dev_config+4>, rx_bufs=0x0,
    tx_bufs=0x200013c0 <z_main_stack+960>, dev=<optimized out>)
    at /home/hwtb/zephyrproject/zephyr/include/zephyr/drivers/spi.h:542
#7  spi_transceive (config=0x8008530 <dev_config+4>, rx_bufs=0x0, tx_bufs=0x200013c0 <z_main_stack+960>,
    dev=<optimized out>)
    at /home/hwtb/projects/lora_bridge2/build/zephyr/include/generated/syscalls/spi.h:36
#8  spi_write (config=0x8008530 <dev_config+4>, tx_bufs=0x200013c0 <z_main_stack+960>,
    dev=<optimized out>) at /home/hwtb/zephyrproject/zephyr/include/zephyr/drivers/spi.h:628
#9  spi_write_dt (spec=0x800852c <dev_config>, tx_bufs=0x200013c0 <z_main_stack+960>)
    at /home/hwtb/zephyrproject/zephyr/include/zephyr/drivers/spi.h:646
#10 SX126xWakeup () at /home/hwtb/zephyrproject/zephyr/drivers/lora/sx126x.c:391
#11 0x08003a82 in SX126xInit (dioIrq=dioIrq@entry=0x8003e09 <RadioOnDioIrq>)
    at /home/hwtb/zephyrproject/modules/lib/loramac-node/src/radio/sx126x/sx126x.c:119
#12 0x08004624 in RadioInit (events=<optimized out>)
    at /home/hwtb/zephyrproject/modules/lib/loramac-node/src/radio/sx126x/radio.c:524
#13 0x080034e2 in sx12xx_init (dev=dev@entry=0x8008054 <__device_dts_ord_174>)
    at /home/hwtb/zephyrproject/zephyr/drivers/lora/sx12xx_common.c:352
#14 0x0800360c in sx126x_lora_init (dev=0x8008054 <__device_dts_ord_174>)
    at /home/hwtb/zephyrproject/zephyr/drivers/lora/sx126x.c:457
#15 0x08004fa4 in z_sys_init_run_level (level=level@entry=2)
    at /home/hwtb/zephyrproject/zephyr/kernel/device.c:70
#16 0x08005102 in bg_thread_main (unused1=<optimized out>, unused2=<optimized out>,
    unused3=<optimized out>) at /home/hwtb/zephyrproject/zephyr/kernel/init.c:213
#17 0x0800644c in z_thread_entry (entry=0x80050f5 <bg_thread_main>, p1=<optimized out>,
    p2=<optimized out>, p3=<optimized out>) at /home/hwtb/zephyrproject/zephyr/lib/os/thread_entry.c:36
#18 0x08001c48 in arch_switch_to_main_thread (main_thread=<optimized out>,
    stack_ptr=0x20001440 <z_idle_stacks> "`", _main=0x80050f5 <bg_thread_main>)
    at /home/hwtb/zephyrproject/zephyr/arch/arm/core/aarch32/thread.c:576
#19 0x00000000 in ?? ()
(gdb)

Environment (please complete the following information):

erwango commented 2 years ago

@str4t0m Would you have time to have a look ? (as I think you ported the board)

ballardr commented 2 years ago

I think I may be getting the same issue but with a nordic nrf52840 connected to a SX1276 mbed shield (SX1276MB1LAS). It was working on Zephyr 2.7.99 but I updated last night and I'm now failing at boot. When I used print statements to back track the problem, I'm failing at Radio.Init(&dev_data.events); (line 354) in ${zephyr}/drivers/lora/sx12xx_common.c. Following that, Its failing in RxChainCalibration() in loramac-node/src/radio/sx1276/sx1276.c during the while loop that is reading the SPI bus

    while( ( SX1276Read( REG_IMAGECAL ) & RF_IMAGECAL_IMAGECAL_RUNNING ) == RF_IMAGECAL_IMAGECAL_RUNNING )
    {
    }

So may be worth seeing if your failing in similar spots (I see the STM32WL uses a SX126x internally - ref https://www.st.com/en/microcontrollers-microprocessors/stm32wl-series.html).

I'll keep digging but thought may be useful to mention that it may be a bigger issue

aurel32 commented 2 years ago

So may be worth seeing if your failing in similar spots (I see the STM32WL uses a SX126x internally - ref https://www.st.com/en/microcontrollers-microprocessors/stm32wl-series.html).

I have just tested that a Lora E5 module (based on the STM32WL) still works with the current master for sending data.

HelloWorldTeraByte commented 2 years ago

@aurel32 That is good to hear. Did you use the lora_e5_dev_board board files?. If not do you mind sharing the .dts files used?. Also was it with the sample application under samples/drivers/lora/send?

I will try upgrading to the latest master branch later today and see if it fixes the issue.

aurel32 commented 2 years ago

@aurel32 That is good to hear. Did you use the lora_e5_dev_board board files?. If not do you mind sharing the .dts files used?. Also was it with the sample application under samples/drivers/lora/send?

No that was initially my own application with my board device tree. That said, I have just tried running the samples/drivers/lora/send sample, and it works just fine for me (but I am running the latest master). I do not get any console output given my board is not wired for that, but I correctly receive the messages sent by the sample on the receiver.

HelloWorldTeraByte commented 2 years ago

I tried upgrading to the latest master branch and the problem still persists. It is still stuck in the same place as the original bug report.

I have a feeling the problem could be with the clock setup.

@aurel32 Do you mind either testing the sample application with the lora_e5_dev_board? or posting the board files?. I believe the board config is the only difference between your working setup and mine.

aurel32 commented 2 years ago

@aurel32 Do you mind either testing the sample application with the lora_e5_dev_board? or posting the board files?. I believe the board config is the only difference between your working setup and mine.

My last test was with the lora_e5_dev_board. The only change I made to the sample was the radio frequency to match the one of my receiver. Note that I uses the SDK version 0.14.2. I don't know if it makes any difference.

ballardr commented 2 years ago

Hi @aurel32, have you got your code working yet?

For me, I fixed my first bug (which was device tree related - error were misleading), however, now I'm joining (according to the LoRaWAN server) but Zephyr's lorawan_join() is failing and crashing. Seems to be crashing at line 316 - k_sem_take(&mlme_confirm_sem, K_FOREVER); in subsys/lorawan/lorawan.c. I put print statements above and below to check. Funnily enough however, even if I change the wait to K_SECONDS(15), I still fail/crash at this line.

No idea if the above will help you or not, but thought I might update you incase it's the same issue.

aurel32 commented 2 years ago

Hi @ballardr. I guess you mean @HelloWorldTeraByte instead. My code has been working fine for a long time, although it is limited to LoRa and not LoRaWAN.

HelloWorldTeraByte commented 2 years ago

Hi @aurel32, have you got your code working yet?

For me, I fixed my first bug (which was device tree related - error were misleading), however, now I'm joining (according to the LoRaWAN server) but Zephyr's lorawan_join() is failing and crashing. Seems to be crashing at line 316 - k_sem_take(&mlme_confirm_sem, K_FOREVER); in subsys/lorawan/lorawan.c. I put print statements above and below to check. Funnily enough however, even if I change the wait to K_SECONDS(15), I still fail/crash at this line.

No idea if the above will help you or not, but thought I might update you incase it's the same issue.

I unfortunately have not gotten the LoRa module to work yet.

I attached a gdb server and probed around the registers of interest. Interestingly none of the SPI registers responsible for the LoRa are resetting to the values given in the reference manual. They are all zero even after manually enabling the SPI with the RCC register. Zephyr gets stuck waiting for the TXE flag to be set, but the entire register is 0s but it should have reset to 0x2 after a clock enable.

I am starting to think the LoRa module I have is broken and it is not an issue with Zephyr. More specifically only the LoRa radio section of the SoC is broken as everything else works.

I also believe I might have broken the SoC while I was trying to unlock the chip (removing the RDP). My LoRa module came from the factory with both read out protection (RDP) and write protection. But the Seeed wiki only mentions removing the RDP.

Has anyone had both RDP and write protection enabled off of the factory?. Does anyone know the proper method to removing both the RDP and the write protection without damaging the SoC?

str4t0m commented 2 years ago

I am starting to think the LoRa module I have is broken and it is not an issue with Zephyr. More specifically only the LoRa radio section of the SoC is broken as everything else works.

I don't think your module is broken, nor that it is an issue with zephyr. From your description it looks like you just have not set SUBGHSPISD in the option bytes to disable SPI3 security. That option does not bring any benefit for the single core variants, other than having the same set of option bytes available.But will still block communication with the peripheral on CPU1(the CM4).

HelloWorldTeraByte commented 2 years ago

Interestingly enough I cannot even see the menu for Security Configuration Option bytes in the STM32CubeProgramer GUI. image

Or the STM32CubeProgrammer CLI.

      -------------------------------------------------------------------
                        STM32CubeProgrammer v2.11.0
      -------------------------------------------------------------------

ST-LINK SN  : 0026001C5553500F20393256
ST-LINK FW  : V3J7M2
Board       : STLINK-V3MINI
Voltage     : 3.29V
SWD freq    : 8000 KHz
Connect mode: Normal
Reset mode  : Software reset
Device ID   : 0x497
Revision ID : Rev Z
Device name : STM32WLxx
Flash size  : 256 KBytes
Device type : MCU
Device CPU  : Cortex-M4
BL Version  : 0xc3

UPLOADING OPTION BYTES DATA ...

  Bank          : 0x00
  Address       : 0x58004020
  Size          : 96 Bytes

[==================================================] 100%

  Bank          : 0x01
  Address       : 0x58004080
  Size          : 8 Bytes

[==================================================] 100%

OPTION BYTES BANK: 0

   Read Out Protection:

     RDP          : 0xAA (Level 0, no protection)

   BOR Level:

     BOR_LEV      : 0x0 (BOR Level 0 reset level threshold is around 1.7 V)

   User Configuration:

     nBOOT0       : 0x1 (nBOOT0=1)
     nBOOT1       : 0x1 ()
     nSWBOOT0     : 0x1 (BOOT0 taken from PH3/BOOT0 pin)
     SRAM_RST     : 0x1 (SRAM1 and SRAM2 are not erased when a system reset occurs)
     SRAM2_PE     : 0x1 (SRAM2 parity check disable)
     nRST_STOP    : 0x0 (Reset generated when entering the Stop mode)
     nRST_STDBY   : 0x0 (Reset generated when entering the Standby mode)
     nRST_SHDW    : 0x0 (Reset generated when entering the Shutdown mode)
     WWDG_SW      : 0x1 (Software window watchdog)
     IWGD_STDBY   : 0x1 (Independent watchdog counter running in Standby mode)
     IWDG_STOP    : 0x1 (Independent watchdog counter running in Stop mode)
     IWDG_SW      : 0x1 (Software independent watchdog)
     BOOT_LOCK    : 0x0 (CPU1 CM4 Boot lock disabled)

   Security Configuration Option bytes ESE:

     ESE          : 0x0 (Security disabled)

   PCROP Protection:

     PCROP1A_STRT : 0xFF  (0x803FC00)
     PCROP1A_END  : 0x0  (0x8000000)
     PCROP_RDP    : 0x0 (PCROP zone is kept when RDP is decreased)
     PCROP1B_STRT : 0xFF  (0x803FC00)
     PCROP1B_END  : 0x0  (0x8000000)

   Write Protection:

     WRP1A_STRT   : 0xF  (0x8007800)
     WRP1A_END    : 0x0  (0x8000000)
     WRP1B_STRT   : 0xF  (0x8007800)
     WRP1B_END    : 0x0  (0x8000000)
str4t0m commented 2 years ago

I see ST does no longer display these option bytes for the WLE5 chips in the cube programmer, so they expect these security options to be disabled and your problem may be caused by something else. But you could still give it a try and read/write these either using an old cube programmer version(2.8) or any other preferred tool.

erwango commented 2 years ago

@HelloWorldTeraByte Did you manage to make some progress ?

HelloWorldTeraByte commented 2 years ago

@HelloWorldTeraByte Did you manage to make some progress ?

I unfortunately have not. However, I am convinced that it is not an issue with Zephyr. It is most likely the SoC is damaged and it was most likely damaged when I was flashing the option bytes to unlock the chip. I have contacted Seeed for this issue but have not received any reply. Actually, would some mind sharing the procedure they used to unlock the SoC. My replacement boards have arrived and I will try again with the new boards.

erwango commented 2 years ago

@HelloWorldTeraByte Thanks for the update. I'm closing the point then.

hubert-tt commented 1 year ago

Hey, we've encountered a similar problem when using Lora-E5 Dev Kit - device was stuck on SPI initialization not only when we was using LoRa, but even when we add CONFIG_SPI=y in prj.conf file in blank project. Solution we've come up with was kinda described on ST Community, here. Seting SUBGHSPISD register to 1, e.g. by using STM32CubeProgrammer helps, now LoRa sent example from Zephyr is working fine :)