zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.93k stars 6.65k forks source link

ZBUS subscribers only receive the first message with CONFIG_ASSERT=n #77536

Open jpowen898 opened 3 months ago

jpowen898 commented 3 months ago

Describe the bug When Subscribing to a message using ZBUS if you don't have CONFIG_ASSERT turned on then you only get the first message published to the channel.

To Reproduce Here is a hello world example application I was using to test this. This test code sets up 1 listener and 1 subscriber and the test message gets published from the main loop once a second.

#include <ThriveComs.h>
#include <zephyr/kernel.h>
#include <zephyr/zbus/zbus.h>

struct hello_msg{
    char str[32];
    int i;
};

ZBUS_CHAN_DEFINE(hello_world_chan, struct hello_msg, NULL, NULL, ZBUS_OBSERVERS_EMPTY, {});
ZBUS_MSG_SUBSCRIBER_DEFINE(hello_world_sub);
void sub_thread(){
    const struct zbus_channel *chan;
    struct hello_msg msg;

    while(1){
        zbus_sub_wait_msg(&hello_world_sub, &chan, &msg, K_FOREVER);
        printk("SUB: %s %d\n", msg.str, msg.i);

    }
}
K_THREAD_DEFINE(hello_world_sub_thread, 1024, sub_thread, NULL, NULL, NULL, 3, 0, 0);

static void hello_listener(const struct zbus_channel *chan)
{
    const struct hello_msg *msg = (hello_msg *)zbus_chan_const_msg(chan);
    printk("LIS: %s %d\n", msg->str, msg->i);
}
ZBUS_LISTENER_DEFINE(hello_world_lis, hello_listener);

int main(void){
    struct hello_msg msg;
    strcpy(msg.str, "hello world");
    zbus_chan_add_obs(&hello_world_chan, &hello_world_lis, K_MSEC(200));
    zbus_chan_add_obs(&hello_world_chan, &hello_world_sub, K_MSEC(200));

    int i = 0;
    while(1){
        msg.i = ++i;
        printk("\nPUB: %s %d\n", msg.str, msg.i);
        zbus_chan_pub(&hello_world_chan, &msg, K_SECONDS(1));
        k_msleep(1000);
    }
    return 0;
}

As well as the prj.conf file I was using

CONFIG_ZBUS=y
CONFIG_ZBUS_MSG_SUBSCRIBER=y
CONFIG_ZBUS_RUNTIME_OBSERVERS=y
CONFIG_HEAP_MEM_POOL_SIZE=1024
CONFIG_ZBUS_LOG_LEVEL_DBG=y
CONFIG_ZBUS_CHANNEL_NAME=y
CONFIG_ZBUS_MSG_SUBSCRIBER_BUF_ALLOC_DYNAMIC=y

Then I ran the application using the native_sim board. west build -b native_sim -t run

The output I was getting was as follows.

*** Booting Zephyr OS build v3.6.0 ***

PUB: hello world 1
LIS: hello world 1
SUB: hello world 1

PUB: hello world 2
LIS: hello world 2

PUB: hello world 3
LIS: hello world 3

PUB: hello world 4
LIS: hello world 4

As you can see there should be a SUB: displayed for each time the message is published but it only occurs on the first message publish event. I tinkered for a long time before I tried adding CONFIG_ASSERT=y to my prj.conf. After adding this I got the expected output.

*** Booting Zephyr OS build v3.6.0 ***

PUB: hello world 1
LIS: hello world 1
SUB: hello world 1

PUB: hello world 2
LIS: hello world 2
SUB: hello world 2

PUB: hello world 3
LIS: hello world 3
SUB: hello world 3

PUB: hello world 4
LIS: hello world 4
SUB: hello world 4
faxe1008 commented 3 months ago

@Wanderx13 this looks suspicious, what exactly is this installer needed for?

faxe1008 commented 3 months ago

Deleted message which was not issue related and most likely malware: image

rodrigopex commented 2 months ago

@jpowen898, could you please test that with Zephyr 3.7.0?

I saw you are using v3.6.0, which does not have LTS support. I could not reproduce that locally on v3.6.0. I am on a Mac and cannot run the native_sim. On the Qemu, everything seems to be okay even when disabling CONFIG_ASSERT=n.

I have noticed an issue in the code you pasted here regarding the listener code, which is very unlikely related to the issue. Is this the actual version you are running?

-   const struct hello_msg *msg = (hello_msg *)zbus_chan_const_msg(chan);
+   const struct hello_msg *msg = (struct hello_msg *)zbus_chan_const_msg(chan);
jpowen898 commented 2 months ago

@rodrigopex I just updated to v3.7.0 and tested on qemu_x86 and did not see the issue. However, I also tested it on qemu_cortex_m0, qemu_cortex_m3, qemu_cortex_a53, qemu_cortex_a9, qemu_cortex_r5, and native_sim and they did still have the issue. Also it looked like on a few of those targets the k_msleep() function wasn't working.

Here is the updated code and prj.conf.

#include <ThriveComs.h>
#include <zephyr/kernel.h>
#include <zephyr/zbus/zbus.h>

struct hello_msg{
    char str[32];
    int i;
};

ZBUS_CHAN_DEFINE(hello_world_chan, struct hello_msg, NULL, NULL, ZBUS_OBSERVERS_EMPTY, {});
ZBUS_MSG_SUBSCRIBER_DEFINE(hello_world_sub);
void sub_thread(){
    const struct zbus_channel *chan;
    struct hello_msg msg;

    while(1){
        zbus_sub_wait_msg(&hello_world_sub, &chan, &msg, K_FOREVER);
        printk("SUB: %s %d\n", msg.str, msg.i);

    }
}
K_THREAD_DEFINE(hello_world_sub_thread, 1024, sub_thread, NULL, NULL, NULL, 3, 0, 0);

static void hello_listener(const struct zbus_channel *chan)
{
    const struct hello_msg *msg = (hello_msg *)zbus_chan_const_msg(chan);
    printk("LIS: %s %d\n", msg->str, msg->i);
}
ZBUS_LISTENER_DEFINE(hello_world_lis, hello_listener);

int main(void){
    struct hello_msg msg;
    strcpy(msg.str, "hello world");
    zbus_chan_add_obs(&hello_world_chan, &hello_world_lis, K_MSEC(200));
    zbus_chan_add_obs(&hello_world_chan, &hello_world_sub, K_MSEC(200));

    int i = 0;
    while(1){
        msg.i = ++i;
        printk("\nPUB: %s %d\n", msg.str, msg.i);
        zbus_chan_pub(&hello_world_chan, &msg, K_SECONDS(1));
        k_msleep(1000);
    }
    return 0;
}
CONFIG_ASSERT=n
CONFIG_ZBUS=y
CONFIG_ZBUS_MSG_SUBSCRIBER=y
CONFIG_ZBUS_RUNTIME_OBSERVERS=y
CONFIG_ZBUS_PRIORITY_BOOST=n
CONFIG_HEAP_MEM_POOL_SIZE=1024
CONFIG_ZBUS_LOG_LEVEL_DBG=y
CONFIG_ZBUS_CHANNEL_NAME=y
CONFIG_ZBUS_MSG_SUBSCRIBER_BUF_ALLOC_DYNAMIC=y
rodrigopex commented 2 months ago

@jpowen898, Could you tell me if the code here is exactly what you are running? There is an issue with the hello_msg on line 27 when I disable the line 1. I don't have your ThriveComs.h.

rodrigopex commented 2 months ago

@jpowen898

This is still running for the architectures you mentioned.

For instance, the qemu_cortex_m0 stdout:

❯ west build -t run
-- west build: running target run
[0/1] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-m0
*** Booting Zephyr OS build v3.7.0-1-ga50ef5b7eb55 ***

PUB: hello world 1
LIS: hello world 1
SUB: hello world 1

PUB: hello world 2
LIS: hello world 2
SUB: hello world 2

PUB: hello world 3
LIS: hello world 3
SUB: hello world 3

PUB: hello world 4
LIS: hello world 4
SUB: hello world 4

PUB: hello world 5
LIS: hello world 5
SUB: hello world 5

PUB: hello world 6
LIS: hello world 6
SUB: hello world 6

PUB: hello world 7
LIS: hello world 7
SUB: hello world 7
qemu-system-arm: terminating on signal 2 from pid 1310 (<unknown process>)
ninja: build stopped: interrupted by user.

Also it looked like on a few of those targets the k_msleep() function wasn't working.

You need to disable the CONFIG_QEMU_ICOUNT to make the Qemu slower.

For instance, I am using this command to compile:

west build -p -b qemu_riscv32 samples/zbus_issue_msub -- -DCONFIG_QEMU_ICOUNT=n

What SDK version are you using? Have you run west update when switching to v3.7.0?

The code I am using is here in this link: https://github.com/rodrigopex/zephyr/tree/zbus_issue_77536/samples/zbus_issue_msub

jpowen898 commented 2 months ago

Sorry I was testing some other code there, but you can remove the #include <ThriveComs.h>. Thanks for the tip about the CONFIG_QEMU_ICOUNT that fixed the sleep issue.

I was on sdk version 0.16.5-1, but I just updated to sdk 0.16.8. And yes I have run west update since updating to zephyr 3.7.0 and I ran it again after updating the sdk. I am still seeing the issue when I disable CONFIG_ASSERT maybe it is just an issue when running on linux. I am running all of this in an ubuntu 22.04 docker container.

Here is a more extensive console output if that helps.

thrive ~ $ west build zbus_hello_world -b qemu_cortex_m0 -t run -p 
-- west build: making build dir /home/thrive/build/qemu_cortex_m0/zbus_hello_world pristine
-- west build: generating a build system
Loading Zephyr default modules (Zephyr base).
-- Application: /home/thrive/zbus_hello_world
-- CMake version: 3.22.1
-- Found Python3: /usr/bin/python3 (found suitable version "3.10.12", minimum required is "3.8") found components: Interpreter 
-- Cache files will be written to: /home/thrive/.cache/zephyr
-- Zephyr version: 3.7.0 (/home/thrive/zephyr)
-- Found west (found suitable version "1.2.0", minimum required is "0.14.0")
-- Board: qemu_cortex_m0, qualifiers: nrf51822
-- ZEPHYR_TOOLCHAIN_VARIANT not set, trying to locate Zephyr SDK
-- Found host-tools: zephyr 0.16.8 (/home/thrive/zephyr-sdk-0.16.8)
-- Found toolchain: zephyr 0.16.8 (/home/thrive/zephyr-sdk-0.16.8)
-- Found Dtc: /home/thrive/zephyr-sdk-0.16.8/sysroots/x86_64-pokysdk-linux/usr/bin/dtc (found suitable version "1.6.0", minimum required is "1.4.6") 
-- Found BOARD.dts: /home/thrive/zephyr/boards/qemu/cortex_m0/qemu_cortex_m0.dts
-- Generated zephyr.dts: /home/thrive/build/qemu_cortex_m0/zbus_hello_world/zephyr/zephyr.dts
-- Generated devicetree_generated.h: /home/thrive/build/qemu_cortex_m0/zbus_hello_world/zephyr/include/generated/zephyr/devicetree_generated.h
-- Including generated dts.cmake file: /home/thrive/build/qemu_cortex_m0/zbus_hello_world/zephyr/dts.cmake

warning: QEMU_ICOUNT_SHIFT (defined at boards/Kconfig:73) was assigned the value '6' but got the
value ''. Check these unsatisfied dependencies: QEMU_ICOUNT (=n). See
http://docs.zephyrproject.org/latest/kconfig.html#CONFIG_QEMU_ICOUNT_SHIFT and/or look up
QEMU_ICOUNT_SHIFT in the menuconfig/guiconfig interface. The Application Development Primer, Setting
Configuration Values, and Kconfig - Tips and Best Practices sections of the manual might be helpful
too.

warning: The choice symbol ZBUS_LOG_LEVEL_DBG (defined at
subsys/logging/Kconfig.template.log_config:20) was selected (set =y), but no symbol ended up as the
choice selection. See http://docs.zephyrproject.org/latest/kconfig.html#CONFIG_ZBUS_LOG_LEVEL_DBG
and/or look up ZBUS_LOG_LEVEL_DBG in the menuconfig/guiconfig interface. The Application Development
Primer, Setting Configuration Values, and Kconfig - Tips and Best Practices sections of the manual
might be helpful too.

Parsing /home/thrive/zephyr/Kconfig
Loaded configuration '/home/thrive/zephyr/boards/qemu/cortex_m0/qemu_cortex_m0_defconfig'
Merged configuration '/home/thrive/zbus_hello_world/prj.conf'
Configuration saved to '/home/thrive/build/qemu_cortex_m0/zbus_hello_world/zephyr/.config'
Kconfig header saved to '/home/thrive/build/qemu_cortex_m0/zbus_hello_world/zephyr/include/generated/zephyr/autoconf.h'
-- Found GnuLd: /home/thrive/zephyr-sdk-0.16.8/arm-zephyr-eabi/arm-zephyr-eabi/bin/ld.bfd (found version "2.38") 
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- The ASM compiler identification is GNU
-- Found assembler: /home/thrive/zephyr-sdk-0.16.8/arm-zephyr-eabi/bin/arm-zephyr-eabi-gcc
-- Configuring done
-- Generating done
-- Build files have been written to: /home/thrive/build/qemu_cortex_m0/zbus_hello_world
-- west build: running target run
[1/135] Preparing syscall dependency handling

[2/135] Generating include/generated/zephyr/version.h
-- Zephyr version: 3.7.0 (/home/thrive/zephyr), build: v3.7.0
[134/135] Linking CXX executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
           FLASH:       18900 B       256 KB      7.21%
             RAM:        7960 B        16 KB     48.58%
        IDT_LIST:          0 GB        32 KB      0.00%
Generating files from /home/thrive/build/qemu_cortex_m0/zbus_hello_world/zephyr/zephyr.elf for board: qemu_cortex_m0
[134/135] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-m0
*** Booting Zephyr OS build v3.7.0 ***

PUB: hello world 1
LIS: hello world 1
SUB: hello world 1

PUB: hello world 2
LIS: hello world 2

PUB: hello world 3
LIS: hello world 3

PUB: hello world 4
LIS: hello world 4

PUB: hello world 5
LIS: hello world 5

PUB: hello world 6
LIS: hello world 6
qemu-system-arm: terminating on signal 2
ninja: build stopped: interrupted by user.
github-actions[bot] commented 3 weeks ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.