zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.14k stars 6.23k forks source link

thread/flash: can't erase flash memory when using k_msgq_get() #66626

Open ClovisCorde opened 7 months ago

ClovisCorde commented 7 months ago

Hello guys :smile:

Describe the bug I'm trying to erase the flash memory from a thread (with priority 1). But when in another thread (priority 7) I have a k_msgq_get with a timeout of K_MSEC(100) I have a -5 error when trying to erase. However, when I remove the k_msgq_get from my thread and put a k_sleep(K_MSEC(100)) it works great. (see the detailed code below).

I am working on a STM32 nucleoG0B1RE. The erase function is called from a command shell.

To Reproduce

This is my full source code :

/*
 * Copyright (c) 2021 Nordic Semiconductor ASA
 * SPDX-License-Identifier: Apache-2.0
 */

#include <zephyr/kernel.h>

#include <zephyr/shell/shell.h>
#include <zephyr/storage/flash_map.h>

#include "lora_protocol.h"

#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(main, CONFIG_APP_LOG_LEVEL);

//--------------------//
//  Thread LORA  //
//--------------------//
#define LORA_PROTOCOL_THREAD_STACK_SIZE 4096
#define LORA_PROTOCOL_THREAD_PRIORITY 7

/**
 * @brief lora_protocol message queue
 */
static struct k_msgq lora_protocol_messageq;

/**
 * @brief application thread
 *
 */
void lora_protocol_thread(void);

K_THREAD_DEFINE(lora_protocol_tid, LORA_PROTOCOL_THREAD_STACK_SIZE,
                lora_protocol_thread, NULL, NULL, NULL,
                LORA_PROTOCOL_THREAD_PRIORITY, 0, 0);

void lora_protocol_thread(void) {
  struct lora_msg_t msg;
  int ret;

  while (1) {
    /* check if a message has been put to queue */
    ret = k_msgq_get(&lora_protocol_messageq, &msg, K_MSEC(100));
    LOG_INF("lora_protocol_thread, ret = %d", ret);
  }
}

//--------------------//
//  Thread SHELL  //
//--------------------//
#define DELTA_SHELL_THREAD_STACK_SIZE 4096
#define DELTA_SHELL_THREAD_PRIORITY 1

static struct k_work work_delta_test;
static struct k_work work_delta_erase;

struct k_work_q wq_delta;
static K_THREAD_STACK_DEFINE(wq_delta_stack, CONFIG_MAIN_STACK_SIZE);

/**
 * @brief application thread
 *
 */
static void init_wq(void);

static void delta_test_cb(struct k_work *item);
static void delta_erase_cb(struct k_work *item);

K_THREAD_DEFINE(delta_shell, DELTA_SHELL_THREAD_STACK_SIZE, init_wq, NULL, NULL,
                NULL, DELTA_SHELL_THREAD_PRIORITY, 0, 0);

static void init_wq(void) {
  k_work_queue_start(&wq_delta, wq_delta_stack,
                     K_THREAD_STACK_SIZEOF(wq_delta_stack),
                     DELTA_SHELL_THREAD_PRIORITY, NULL);
  k_work_init(&work_delta_test, delta_test_cb);
  k_work_init(&work_delta_erase, delta_erase_cb);
  LOG_INF("Workqueue initialized");
}

static void delta_test_cb(struct k_work *item) {
  ARG_UNUSED(item);

  LOG_INF("Delta test callback");
}

static void delta_erase_cb(struct k_work *item) {
  ARG_UNUSED(item);

  LOG_INF("Delta erase callback");
  int ret;
  const struct flash_area *flash_area;

  ret = flash_area_open(FIXED_PARTITION_ID(slot1_partition), &flash_area);
  if (ret != 0) {
    LOG_ERR("Can not open the flash area for %d",
            FIXED_PARTITION_ID(slot1_partition));
    // return -ENODEV;
  }

  ret = flash_area_erase(flash_area, 0, FIXED_PARTITION_SIZE(slot1_partition));
  if (ret != 0) {
    LOG_ERR("Can not erase the flash area for slot1 partition, ret = %d\n",
            ret);
    // return -EINVAL;
  }
  flash_area_close(flash_area);
}

//--------------------//
//  DEFINE SHELL  //
//--------------------//
static int handle_test(const struct shell *shell, size_t argc, char **argv) {
  ARG_UNUSED(argc);
  k_work_submit_to_queue(&wq_delta, &work_delta_test);
  return 0;
}

static int handle_erase(const struct shell *shell, size_t argc, char **argv) {
  ARG_UNUSED(argc);
  k_work_submit_to_queue(&wq_delta, &work_delta_erase);
  return 0;
}

/* Define all the shell commands */
SHELL_STATIC_SUBCMD_SET_CREATE(
    delta_transfer,
    SHELL_CMD(test, NULL, "Test the shell is working", handle_test),
    SHELL_CMD(erase, NULL, "Erase the slot 1 partition", handle_erase),
    SHELL_SUBCMD_SET_END);

SHELL_CMD_REGISTER(delta_shell, &delta_transfer, "Delta transfer commands",
                   NULL);

int main(void) {
  LOG_INF("Hello world!");

  return 0;
}

This is my prj.conf :

# Copyright (c) 2021 Nordic Semiconductor ASA
# SPDX-License-Identifier: Apache-2.0
#
# This file contains selected Kconfig options for the application.

# Enable reboot
CONFIG_REBOOT=y

CONFIG_FLASH=y
CONFIG_FLASH_MAP=y

# Logging
CONFIG_LOG=y
CONFIG_APP_LOG_LEVEL_DBG=y
CONFIG_CONSOLE=y
CONFIG_RTT_CONSOLE=y
CONFIG_USE_SEGGER_RTT=y
CONFIG_LOG_BACKEND_RTT=y
# Use RTT buffer 2 for log, keep 0 for shell
CONFIG_LOG_BACKEND_RTT_BUFFER=1
CONFIG_LOG_BACKEND_RTT_BUFFER_SIZE=2048
CONFIG_SEGGER_RTT_BUFFER_SIZE_DOWN=128
CONFIG_SHELL_BACKEND_RTT=y
CONFIG_SHELL_LOG_BACKEND=n

# Shell
CONFIG_SHELL=y

My error happens when I have this code in the thread :

void lora_protocol_thread(void) {
  struct lora_msg_t msg;
  int ret;

  while (1) {
    /* check if a message has been put to queue */
    ret = k_msgq_get(&lora_protocol_messageq, &msg, K_MSEC(100));
    LOG_INF("lora_protocol_thread, ret = %d", ret);
  }
}

I have then in logs :

*** Booting Zephyr OS build zephyr-v3.5.0-3261-gecce23532270 ***
[00:00:00.002,000] <inf> main: Hello world!
[00:00:00.002,000] <inf> main: Workqueue initialized
[00:00:00.102,000] <inf> main: lora_protocol_thread, ret = -11
...
[00:00:18.821,000] <inf> main: lora_protocol_thread, ret = -11
(we activate the erase command shell)
[00:00:18.851,000] <inf> main: Delta erase callback
[00:00:18.921,000] <err> main: Can not erase the flash area for slot1 partition, ret = -5
[00:00:18.921,000] <inf> main: lora_protocol_thread, ret = -11

But it works the moment I remove the k_msgq_get :

void lora_protocol_thread(void) {
  struct lora_msg_t msg;
  int ret;

  while (1) {
    /* check if a message has been put to queue */
    // ret = k_msgq_get(&lora_protocol_messageq, &msg, K_MSEC(100));
    LOG_INF("lora_protocol_thread, ret = %d", ret);
    k_sleep(K_MSEC(100));
  }
}

And this is the logs (and the flash is successfully erased) :

*** Booting Zephyr OS build zephyr-v3.5.0-3261-gecce23532270 ***
[00:00:00.002,000] <inf> main: Hello world!
[00:00:00.002,000] <inf> main: Workqueue initialized
[00:00:00.102,000] <inf> main: lora_protocol_thread, ret = -11
...
[00:00:16.819,000] <inf> main: lora_protocol_thread, ret = 0
(we activate the erase command shell)
[00:00:16.821,000] <inf> main: Delta erase callback
[00:00:18.980,000] <inf> main: lora_protocol_thread, ret = 0

Expected behavior I want to be able to erase the flash memory by keeping the k_msgq_get function in my thread.

Impact I can't have my both threads working together.

Environment (please complete the following information):

Thanks in advance for your help!

kartben commented 7 months ago

@ClovisCorde Have you tried enabling the debug logs for the stm32 flash driver? There are several possibilities for why it would return -EIO so knowing which one it is could be helpful.

ClovisCorde commented 7 months ago

Hi, thanks for your reply. Is there a config to enable the debug logs for stm32 flash driver? I don't know how to do so I tried to trace my code's execution to have more details on the problem. I'll try to explain the process as clearly as possible.

Here's what's happening in my code: When the function flash_area_erase(flash_area, 0, FIXED_PARTITION_SIZE(slot1_partition)) is called, it triggers the flash_stm32_erase function in the flash_stm32.c file. Then in this function it calls flash_stm32_block_erase_loop from flash_stm32g0x.c which then calls erase_page. The erase_page function then calls flash_stm32_wait_flash_idle from flash_stm32.c file. Within flash_stm32_wait_flash_idle there is a call to flash_stm32_check_status.

Then it's the function flash_stm32_check_status that returns the -EIO error :

static int flash_stm32_check_status(const struct device *dev)
{

    if (FLASH_STM32_REGS(dev)->FLASH_STM32_SR & FLASH_STM32_SR_ERRORS) {
        LOG_DBG("Status: 0x%08lx",
            FLASH_STM32_REGS(dev)->FLASH_STM32_SR &
                            FLASH_STM32_SR_ERRORS);
        /* Clear errors to unblock usage of the flash */
        FLASH_STM32_REGS(dev)->FLASH_STM32_SR = FLASH_STM32_REGS(dev)->FLASH_STM32_SR &
                            FLASH_STM32_SR_ERRORS;
        return -EIO;
    }

    return 0;
}

I hope this information is helpful. If you could advise on how to enable logs for the STM32 flash driver, it might make diagnosing the problem easier.

erwango commented 7 months ago

If you could advise on how to enable logs for the STM32 flash driver, it might make diagnosing the problem easier.

CONFIG_FLASH_LOG_LEVEL_DBG=y

Anyway, the link between k_msgq_get and flash status errors is not obvious. I would be curious to know if problem is specific to this target or could be reproduced more widely.

ClovisCorde commented 7 months ago

Here are the logs. I will try to find another board to reproduce the problem. image

ClovisCorde commented 7 months ago

Update, I tried on nucleo_l4azg and same problem, just the Status is different : 0x00000080 for l4azg and 0x000000a0 for g0b1re image

ClovisCorde commented 5 months ago

Hi guys, any updates on this problem?

erwango commented 4 months ago

Sorry we had no time to have a look recently. We'll try to give it a shot in coming month.

github-actions[bot] commented 2 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

FRASTM commented 1 day ago

Update, I tried on nucleo_l4azg and same problem, just the Status is different : 0x00000080 for l4azg and 0x000000a0 for g0b1re image

Unfortunately, I cannot find an application causing (see the log neither) the error, on stm32g0b1 nucleo. The samples/drivers/flash_shell does not. According to the status error : PGAERR and PGSERR are reported. The sequence for erasing a page of the stm32g0b1 is described in the refman RM0444 3.3.7FLASH Main memory erase sequences