zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Best effort message broadcast system with concise binary object representation #45967

Open jfischer-no opened 2 years ago

jfischer-no commented 2 years ago

Introduction

Implement a best effort message broadcast system with concise binary object representation.

Problem description

Zephyr OS apparently lacks a message broadcast system like D-Bus.

Proposed change

Implement a message broadcast system that uses the (system) workqueue and distributes messages on a best-effort basis. Messages shall be self-describing (CBOR). The message source (and user) should not be forced to define or use any broadcaster-specific header files or cumbersome macros. A listener should be able to subscribe to and unsubscribe from a broadcaster at runtime.

stephanosio commented 2 years ago

FYI @yperess

carlescufi commented 2 years ago

Relevant similar issues and PRs:

MaureenHelm commented 1 year ago

@jfischer-no can this be closed?

jfischer-no commented 1 year ago

> @jfischer-no can this be closed?

It is not resolved, and I still think it could be useful.

rodrigopex commented 1 year ago

@jfischer-no Now, you can use a zbus channel with, for example, the following message:

struct cbor_msg {
        uint8_t payload[CAPACITY]; /* encoded CBOR bytes */
        size_t size;               /* number of valid bytes in payload */
};

In this way, you can broadcast CBOR data to several threads simultaneously.
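As a rough illustration of what "self-describing" means for such a message, the sketch below fills a `struct cbor_msg` with a one-entry CBOR map by hand. It is only a sketch under assumptions: the value of `CAPACITY` and the helpers `put_byte()` and `encode_reading()` are hypothetical, and only short keys and values up to 255 are handled. It does not touch the zbus API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAPACITY 32 /* assumed size; the thread does not give a value */

struct cbor_msg {
        uint8_t payload[CAPACITY]; /* encoded CBOR bytes */
        size_t size;               /* number of valid bytes in payload */
};

/* Hypothetical helper: append one byte, -1 if the buffer is full. */
static int put_byte(struct cbor_msg *m, uint8_t b)
{
        if (m->size >= CAPACITY) {
                return -1;
        }
        m->payload[m->size++] = b;
        return 0;
}

/*
 * Hypothetical helper: encode the one-entry map {key: value}.
 * Only keys shorter than 24 bytes are supported, because longer
 * strings need an extra length byte that this sketch omits.
 */
static int encode_reading(struct cbor_msg *m, const char *key, uint8_t value)
{
        size_t klen = strlen(key);

        m->size = 0;
        if (klen > 23) {
                return -1;
        }
        /* 0xa1: map with one pair; 0x60 | len: text string header */
        if (put_byte(m, 0xa1) != 0 ||
            put_byte(m, (uint8_t)(0x60 | klen)) != 0) {
                return -1;
        }
        for (size_t i = 0; i < klen; i++) {
                if (put_byte(m, (uint8_t)key[i]) != 0) {
                        return -1;
                }
        }
        /* Unsigned ints 0..23 fit in one byte; 24..255 need a 0x18 prefix. */
        if (value >= 24 && put_byte(m, 0x18) != 0) {
                return -1;
        }
        return put_byte(m, value);
}
```

A receiver can decode the payload without sharing a struct definition with the sender, because the map, the key string and the integer all carry their own type tags. In a real Zephyr application an existing encoder (for example the zcbor module) would likely replace the hand-written helpers.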

mkschreder commented 1 year ago

I think a mailbox with callback hooks grouped into a dedicated section is the right solution for this. The mailbox already provides all the building blocks for proper synchronous and asynchronous message passing, and a callback can be used as a preliminary filter before sending the message through the mailbox.

mkschreder commented 1 year ago

As for CDDL, you really should pass it through the C preprocessor to get all the powerful definition functionality as well as includes, just like we do with the devicetree.

mkschreder commented 1 year ago

Using the system work queue for distribution is a bad idea. It means that distribution is locked to the work queue's priority. Ideally, distribution happens in the publisher's context so that it runs at the publisher's priority. The mailbox already implements this kind of prioritization implicitly because its wait queue is sorted by priority.

rodrigopex commented 1 year ago

@mkschreder, are mailboxes able to send messages in multicast (one-to-many)?

The mailbox documentation says: "Each message may be received by only one thread (i.e. point-to-multipoint and broadcast messaging is not supported)."

I may be biased in saying this, but the better solution in that case is using ZBus. Many-to-many is possible, with publication in the publisher's priority context and priority inheritance. :grimacing:

mkschreder commented 1 year ago

> @mkschreder, are mailboxes able to send messages in multicast (one-to-many)?

> The mailbox documentation says: "Each message may be received by only one thread (i.e. point-to-multipoint and broadcast messaging is not supported)."

> I may be biased in saying this, but the better solution in that case is using ZBus. Many-to-many is possible, with publication in the publisher's priority context and priority inheritance. 😬

No. Just like with zbus, you use callbacks to deliver the notification (the zbus listener approach), but you only do the actual IPC call and context switch from sender to receiver over the mailbox in the callback handler if you are interested in the message (i.e. you send it to your own thread ID in the callback, but the callback runs in the caller's thread context). Essentially, you get multicast using the same mechanism as zbus, a dedicated ELF section with callbacks, but with proper guaranteed inter-thread delivery, where you can wake up a worker with a message and then let it use the CPU in bursts while you interleave it with other work.

Now, zbus is not a proper data transfer mechanism. Calling it a "bus" is a misnomer and creates almost unlimited confusion in people. Zbus is a shared data store with change notifications. Thus, implementing request/response over zbus requires you to either use a listener or live with the fact that you only get the latest update and not a series of updates. That is fine, and is the right behaviour, if one understands that it is a shared data store and not an actual message passing mechanism.
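The "latest update only" behaviour described here can be pictured with a toy model in plain C. This is a sketch of the shared-data-store idea, not the zbus API: `struct channel`, `chan_pub()` and `chan_read()` are hypothetical names, and locking is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a shared data store with a change notification flag. */
struct channel {
        int32_t value; /* the single shared slot */
        bool changed;  /* set on publish, cleared on read */
};

/* Publishing overwrites the slot; earlier unread values are lost. */
static void chan_pub(struct channel *c, int32_t v)
{
        c->value = v;
        c->changed = true;
}

/* Returns true if the value changed since the previous read. */
static bool chan_read(struct channel *c, int32_t *out)
{
        bool had_update = c->changed;

        *out = c->value;
        c->changed = false;
        return had_update;
}
```

If a publisher calls `chan_pub()` twice before the reader runs, the reader is notified once and sees only the second value. That is the right result for a shared variable and the wrong one for a request/response stream.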

Locally, all multicast must use a list of callbacks. The only question is how to deliver the message to a thread, and here Zephyr provides two main delivery mechanisms: rendezvous without buffering (mailbox) and buffered (message queue).
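A minimal sketch of such a callback list, with runtime subscribe and unsubscribe, might look like the following. All names (`struct broadcaster`, `bc_subscribe()`, `bc_unsubscribe()`, `bc_publish()`, `count_topic1()`) are hypothetical and the code is plain C rather than Zephyr code. Callbacks run in the publisher's context and act only as a filter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LISTENERS 4 /* assumed fixed table size */

typedef void (*listener_fn)(int topic, const void *data, void *user);

struct listener {
        listener_fn fn; /* NULL marks a free slot */
        void *user;
};

struct broadcaster {
        struct listener slots[MAX_LISTENERS];
};

/* Register a callback at runtime; -1 if the table is full. */
static int bc_subscribe(struct broadcaster *b, listener_fn fn, void *user)
{
        for (size_t i = 0; i < MAX_LISTENERS; i++) {
                if (b->slots[i].fn == NULL) {
                        b->slots[i].fn = fn;
                        b->slots[i].user = user;
                        return 0;
                }
        }
        return -1;
}

/* Remove a previously registered callback; -1 if not found. */
static int bc_unsubscribe(struct broadcaster *b, listener_fn fn, void *user)
{
        for (size_t i = 0; i < MAX_LISTENERS; i++) {
                if (b->slots[i].fn == fn && b->slots[i].user == user) {
                        b->slots[i].fn = NULL;
                        b->slots[i].user = NULL;
                        return 0;
                }
        }
        return -1;
}

/* Call every listener in the caller's context; returns the call count. */
static int bc_publish(struct broadcaster *b, int topic, const void *data)
{
        int called = 0;

        for (size_t i = 0; i < MAX_LISTENERS; i++) {
                if (b->slots[i].fn != NULL) {
                        b->slots[i].fn(topic, data, b->slots[i].user);
                        called++;
                }
        }
        return called;
}

/* Example filter: only topic 1 is of interest to this listener. */
static void count_topic1(int topic, const void *data, void *user)
{
        (void)data;
        if (topic == 1) {
                (*(int *)user)++; /* a real handler would forward to its thread here */
        }
}
```

In a Zephyr build, the marked line in the filter is where the handler would hand the message to its own thread, for example with `k_mbox_put()` for rendezvous delivery or `k_msgq_put()` for buffered delivery.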

We could add socketpair to it, but then we would be taking things to the next level. Linux moved away from mailboxes and message queues towards Unix domain sockets as its main IPC mechanism because the socket abstraction scales immensely well. However, the socket abstraction is considerably more complex and is overkill for clean inter-thread message passing. So for threads you just decide whether you want buffering or not, based on whether you want the calls to be rendezvous (synchronous), where the caller always waits for the receiver to express interest, or whether you want the caller to store the message and continue. Buffering introduces the problem that the caller doesn't know whether the receiver has accepted the message before continuing.

With inter-thread IPC you normally don't need buffering, especially in the send/receive use case where you want to make a procedure call to another thread and get a response. Buffering in such a scenario would require adding request/response matching, which the mailbox elegantly implements in place without buffering, while still allowing other requests to remain queued, since it is the requesting thread that is queued and not the data.

rodrigopex commented 1 year ago

@mkschreder, I see. Take a look at the PR: https://github.com/zephyrproject-rtos/zephyr/pull/62236. There, I am adding what is missing on ZBus. The msg_subscriber will receive a copy of the message during the publication. I hope it will solve your needs.

> Calling it a "bus" is a misnomer and creates almost unlimited confusion in people. Zbus is a shared data store with change notifications.

What do you think is a good name for ZBus to avoid "unlimited" confusion in people?

I was trying to find a popular definition online that tells me ZBus is not a bus. I could not find it. Please point out a reference to help me understand a software bus definition.

The basic naive reference Wikipedia tells us: "A software bus is a software architecture model where a shared communication channel facilitates connections and communication between software modules."

Two main ways to exchange data between threads are shared memory and message-passing. Up to now, ZBus has only implemented shared memory. The PR is adding the message-passing mechanism to it. Do you consider ZBus to become a bus after the PR merge?
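The difference between the two models can be sketched in plain C: in the message-passing variant every subscriber owns a small queue and receives its own copy at publication time. This is an illustration only, not the code from the PR. `struct sub_queue`, `q_put()`, `q_get()`, `publish_copy()` and the queue depth are all assumptions, and a full queue simply drops the message (best effort).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 2 /* assumed per-subscriber buffer depth */

/* Hypothetical per-subscriber FIFO holding copies of messages. */
struct sub_queue {
        int32_t items[QUEUE_DEPTH];
        size_t head;  /* index of the oldest item */
        size_t count; /* number of stored items */
};

/* Store a copy; returns false (message dropped) if the queue is full. */
static bool q_put(struct sub_queue *q, int32_t v)
{
        if (q->count == QUEUE_DEPTH) {
                return false;
        }
        q->items[(q->head + q->count) % QUEUE_DEPTH] = v;
        q->count++;
        return true;
}

/* Take the oldest copy; returns false if the queue is empty. */
static bool q_get(struct sub_queue *q, int32_t *out)
{
        if (q->count == 0) {
                return false;
        }
        *out = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        return true;
}

/* Copy the message to every subscriber; returns successful deliveries. */
static size_t publish_copy(struct sub_queue *subs[], size_t n, int32_t v)
{
        size_t delivered = 0;

        for (size_t i = 0; i < n; i++) {
                if (q_put(subs[i], v)) {
                        delivered++;
                }
        }
        return delivered;
}
```

Unlike a single shared slot, each subscriber here sees the series of updates in order, up to the depth of its queue.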

mkschreder commented 1 year ago

> @mkschreder, I see. Take a look at the PR: #62236. There, I am adding what is missing on ZBus. The msg_subscriber will receive a copy of the message during the publication. I hope it will solve your needs.

> > Calling it a "bus" is a misnomer and creates almost unlimited confusion in people. Zbus is a shared data store with change notifications.

> What do you think is a good name for ZBus to avoid "unlimited" confusion in people?

> I was trying to find a popular definition online that tells me ZBus is not a bus. I could not find it. Please point out a reference to help me understand a software bus definition.

> The basic naive reference Wikipedia tells us: "A software bus is a software architecture model where a shared communication channel facilitates connections and communication between software modules."

> Two main ways to exchange data between threads are shared memory and message-passing. Up to now, ZBus has only implemented shared memory. The PR is adding the message-passing mechanism to it. Do you consider ZBus to become a bus after the PR merge?

A good name for zbus would be "datastore", or "ds" for short. Semantically it does act very much like a datastore. I think the biggest issue with it right now is making it play well with the rest of the system. For example, when file-descriptor-based IPC mechanisms are used, it is trivial to sleep on an array of file descriptors until any one of them is ready (k_poll), which lets a process that uses any abstraction built on file descriptors (network, event, socketpair and whatnot) sleep in one place waiting for multiple kinds of messages to arrive from many sources.

This file descriptor concept does not, however, allow a "matrix" arrangement without a master server (most RPC/IPC/broker systems need a master to avoid wasting resources, i.e. for 8 clients talking to each other you only need 8 socket pairs and the master does the routing). The bus matrix abstraction is instead implemented beautifully by the mailbox implementation in Zephyr. It allows multiple threads to send messages to each other without a master and without any unnecessary queuing, while guaranteeing that a message is not consumed until the sender has released control of it (the thread is made runnable only after the data has been copied).

I think that zbus as it is done today is great as a shared data store with notification support. The problem that occurs in large projects that attempt to use zbus as a method of communication instead of a data store is that it creates chaos in terms of synchronization, especially if listeners are used, because they always run in the caller's context and so all synchronization must be done in the callbacks. As long as the zbus channel is always thought of as a shared variable, everything is fine. However, when a new programmer starts and sees "channel" and "bus", they start using it in ways that make the system very unstable.

mkschreder commented 1 year ago

Maybe we can redo zbus so that it can be used as a robust ipc mechanism in zephyr?

rodrigopex commented 1 year ago

@mkschreder, I suggest you open an RFC proposing a change to ZBus. Or send a PR with the “proper” way to implement it. Thank you for your contribution.