zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.51k stars 6.44k forks source link

WiFi Initialization Causes Assert in shell_vfprintf() due to recent change which reintroduced #10617 #68793

Closed EricNRS closed 6 months ago

EricNRS commented 7 months ago

Describe the bug Seen on ESP32-S3 target, but this will happen on any target with WiFi that has CONFIG_NET_L2_WIFI_SHELL and CONFIG_ASSERT enabled and a network connection manager event arrives.

Change https://github.com/zephyrproject-rtos/zephyr/commit/055ac615427fd55fa347cd1744280734e730a42a removed a check to see if the sh pointer is NULL and instead calls shell_fprintf() directly resulting in a failed assert in shell_vfprintf() (which was originally reported and fixed as issue https://github.com/zephyrproject-rtos/zephyr/issues/10617).

image

The new PR macros just pass the NULL sh pointer directly to shell_fprintf() as shown below. https://github.com/zephyrproject-rtos/zephyr/blob/40b53d5d774705d27183aaa1e42f2f322250da10/subsys/net/lib/shell/net_shell_private.h#L11-L12

which then triggers the assert on line 1510 . https://github.com/zephyrproject-rtos/zephyr/blob/40b53d5d774705d27183aaa1e42f2f322250da10/subsys/shell/shell.c#L1507-L1517

To Reproduce The issue can be reproduced using the samples/net/wifi app, but you need to add code to automatically connect to the network, otherwise when you use "wifi connect ..." it will initialise the shell pointer.

west build -b esp32s3_luatos_core_usb samples/net/wifi

EricNRS commented 7 months ago

@ndrs-pst you are probably in the best position to know the best way to fix this since you have been working on the code cleanup.

ndrs-pst commented 7 months ago

@EricNRS, thank you for tagging me. Yes, I am aware of such macros, and if you look at this PR https://github.com/zephyrproject-rtos/zephyr/pull/68185, you'll see that I also asked the maintainer. Now, I've found that this issue has been occurring since 2018.

Regarding whether to just bring back the previous fix, I'm not sure what the better option is to tackle this. Ping @jukkar, @krish2718. 👀

EricNRS commented 7 months ago

@ndrs-pst - it may be better to update the PR() macros to add the check for sh == NULL similar to the old print() macro that was just in wifi_shell.c. Should be a quick and easy fix.

ndrs-pst commented 7 months ago

Another way that comes to my mind would be to limit the shell backend to just SERIAL and RTT, and to obtain the shell backend pointer during initialization.

In this way, we can guarantee the shell's initialization and cover most use cases, which normally involve just SERIAL/RTT.

#include <strings.h>
#include <zephyr/shell/shell.h>
#include <zephyr/shell/shell_uart.h>
#include <zephyr/shell/shell_rtt.h>
#include <zephyr/sys/printk.h>
#include <zephyr/init.h>
static int wifi_shell_init(void)
{

#if IS_ENABLED(CONFIG_SHELL_BACKEND_SERIAL)
    context.sh = shell_backend_uart_get_ptr();
#elif IS_ENABLED(CONFIG_LOG_BACKEND_RTT)
    context.sh = shell_backend_rtt_get_ptr();
#else
    BUILD_ASSERT(0, "Unsupported shell backend");
#endif
    context.all = 0U;
    context.scan_result = 0U;

    net_mgmt_init_event_callback(&wifi_shell_mgmt_cb,
                     wifi_mgmt_event_handler,
                     WIFI_SHELL_MGMT_EVENTS);

    net_mgmt_add_event_callback(&wifi_shell_mgmt_cb);

    return 0;
}
jukkar commented 7 months ago

It may be better to update the PR() macros to add the check for sh == NULL similar to the old print() macro that was just in wifi_shell.c. Should be a quick and easy fix.

I agree, this is the most simplest fix for this.

ndrs-pst commented 7 months ago

It may be better to update the PR() macros to add the check for sh == NULL similar to the old print() macro that was just in wifi_shell.c. Should be a quick and easy fix.

I agree, this is the most simplest fix for this.

Since the PR() macros were borrowed from "net_shell_private.h", I would prefer not to update them and add unnecessary NULL checks to other modules. My proposal would be to bring back print(sh, ...) inside wifi_shell.c and have a mix of both PR and print macros.

jukkar commented 7 months ago

My proposal would be to bring back print(sh, ...) inside wifi_shell.c and have a mix of both PR and print macros.

No, please do not mix yet another PR/print macros. There should be no harm if we add null checks to PR macros.

EricNRS commented 7 months ago

Most compilers are very good at optimising out unnecessary checks, soextra checks should not result in any substantial code bloat.

ndrs-pst commented 7 months ago

@jukkar, then which option should we choose? For PR, PR_SHELL, ... PR_WARNING macros modifications in "net_shell_private.h"

/* OPTION 1 */
#define PR(fmt, ...)                    \
    do {                        \
        if (sh) {               \
            shell_fprintf(sh, SHELL_NORMAL, fmt, ##__VA_ARGS__); \
        }                   \
    } while (false)
/* OPTION 2 */
#define PR(fmt, ...)                    \
    do {                        \
        if (sh) {               \
            shell_fprintf(sh, SHELL_NORMAL, fmt, ##__VA_ARGS__); \
        }                   \
        else {                  \
            printk(fmt, ##__VA_ARGS__); \
        }                   \
    } while (false)
EricNRS commented 7 months ago

Option 1 Option 2 is required for the WiFi "connected" and "disconnected" notifications.

Updated: should be option 2 for WiFi

jukkar commented 7 months ago

Lets start with option 1. for now.

EricNRS commented 7 months ago

Lets start with option 1. for now.

Sorry, I updated the comment, should be option 2 is required to print the WiFi status.