alexpaschoaletto opened 2 months ago
Took a very quick look: FWIW this seems more like "lib" code than "kernel" code from a code organization perspective. It's built on existing kernel primitives (e.g. k_timer/msgq as I gather from above), not really a new primitive of its own. See for example the p4wq queue which speaks to a vaguely similar problem area and lives happily in ./lib/os/.
As far as the code itself, can you clarify exactly what the semantics are? I'm gathering that it's a work queue that implements EDF-like semantics, but it doesn't really spell out how: like, what happens when someone blows a deadline? What happens when the queue thread gets preempted (I mean, it probably fails, but how, and does/can it detect that situation)? Can you suspend the thread in a worker callback (surely not, but if you do what happens)?
Broadly I don't see any reason to reject code like this. Tools are tools and we already have lots of options, no harm in a few more. But it's hard to give a firm +1 without code; my vote is to submit and get the arguments started earlier rather than later.
Also: do look at lib/os/p4wq.c to see if that addresses some or all of the problems you're working with. It's possible that we can augment an existing tool rather than add a new one from scratch.
Hi, Andy! Thanks a lot for your comment.
Took a very quick look: FWIW this seems more like "lib" code than "kernel" code from a code organization perspective. It's built on existing kernel primitives (e.g. k_timer/msgq as I gather from above), not really a new primitive of its own. See for example the p4wq queue which speaks to a vaguely similar problem area and lives happily in ./lib/os/.
I agree that it looks more like "lib" code, and that's why I implemented it as a subsystem rather than directly in the kernel files. From inspecting the codebase, I assumed a subsystem would be the best place, as my code needs "privileges" similar to those of the tracing code, about which you've seen me open previous issues (my code does not use the tracing API, though, just to clarify).
As far as the code itself, can you clarify exactly what the semantics are? I'm gathering that it's a work queue that implements EDF-like semantics, but it doesn't really spell out how: like, what happens when someone blows a deadline?
Yes, you can put it like that: a work queue with a deadline and a "period" of its own (quotes because it is not really periodic; the period is actually used to calculate the new deadline when needed. I don't like this naming convention, to be honest).
As for the semantics, the behavior on deadline misses is usually left undefined as far as I know; it is up to each particular application to decide what to do if/when that occurs. I don't think Zephyr has any protocol on this subject either. The CBS is only concerned with updating its deadline on certain events (e.g. jobs being pushed to an empty queue) and leaving the EDF scheduler to do its trick.
What happens when the queue thread gets preempted (I mean, it probably fails, but how, and does/can it detect that situation)? Can you suspend the thread in a worker callback (surely not, but if you do what happens)?
I'm not sure I understand your question, but preemption can happen just as it would with any other thread, and critical sections of the CBS are protected with sched locks. The CBS thread is not meant to be accessed directly, just as _current is not meant to be referenced in user code. That can still happen, but it is bad practice and considered a problem of the user implementation. Thus, if a worker callback suspends the CBS thread or loops forever inside it, the CBS simply stops working as intended (deadlines are still updated correctly, as their updates are tied to a timer, not to the thread). In fact, the documentation written so far already gives some advice about it:
[...]
Jobs are regular C functions executed within a thread context that
should **not** loop indefinitely (as that would prevent further jobs
on the queue from ever being reached), return nothing and accept
one void pointer as the argument.
[...]
one can call :c:func:`k_cbs_push_job` from an ISR or a thread context,
but ISRs should **not** attempt to wait trying to push a job on the CBS
queue. Moreover, jobs are executed within a system thread context but
should **not** attempt to directly interact with the CBS thread (e.g.
suspend it or change its static priority on the fly), as it would likely
lead to unexpected behavior.
I hope that answers your questions. I will take a look at the p4wq, but I would like to stick with the idea of a new tool for now.
Just bikeshedding here: instead of calling it a "server", which to me carries the wrong kind of connotation, could it be called a "service" instead?
Just bikeshedding here: instead of calling it a "server", which to me carries the wrong kind of connotation, could it be called a "service" instead?
I agree the naming convention is somewhat confusing. However, I don't think I should change it, because that is how it was originally proposed by the algorithm's author and how it has been implemented in other operating systems such as RTEMS and Linux.
Swinging back by: FWIW, what I'm really asking for is a clean description of what the algorithm actually does, what promises it makes, and what demands it makes on the application. Like a bullet list of statements like:
Basically most people don't know what this is, don't want to read a paper describing it, and just want an executive summary of why they should use this.
Swinging back by: FWIW, what I'm really asking for is a clean description of what the algorithm actually does, what promises it makes, and what demands it makes on the application. [...] Basically most people don't know what this is, don't want to read a paper describing it, and just want an executive summary of why they should use this.
Fair enough, sir. I myself am also resistant to the idea of having to read whole papers to grasp a new concept. Let me try to explain it the easiest way I can, then:
Regular EDF tasks have a period in which they regularly activate and execute. At every new period, they invoke k_thread_deadline_set, do their job and sleep until the next period comes. Then repeat. And repeat. And repeat.
void edf_thread(void)
{
	while (true) {
		k_thread_deadline_set(this_thread, deadline); /* renew deadline */
		do_my_work();                                 /* pre-defined job */
		sleep_until(next_period);                     /* wait for next activation */
	}
}
The CBS is essentially the same thing, with two major differences: 1) there is no pre-defined work it must do, as all the work is assigned to it at runtime, and 2) the call to k_thread_deadline_set is triggered by specific events, NOT by the next period.
void cbs_thread(void)
{
	while (true) {
		wait_for_new_job(); /* CBS remains inactive while it has no jobs to execute */
		k_timer_start();    /* budget starts being consumed */
		do_the_job();
		k_timer_stop();     /* job finished before the budget ran out */
	}
}
The CBS only calls k_thread_deadline_set at two moments: 1) when a job is pushed to it and 2) when the timer expires.
void k_cbs_push_job(void *job, cbs_t *cbs)
{
	...
	if (condition_is_met(cbs)) {     /* when a job is pushed, a condition is checked */
		deadline = now + period; /* if it holds, the deadline is updated */
		k_thread_deadline_set(cbs, deadline);
	}
}
void cbs_timer_expiry_function(void)
{
	...
	deadline += period;
	k_thread_deadline_set(cbs, deadline); /* the deadline is always updated on expiry */
}
The timer of cbs_thread is therefore a protective mechanism that keeps updating the CBS deadline when a job takes too long to finish. The successively postponed deadlines mean the CBS progressively lowers its priority for the EDF scheduler, which guarantees that the other tasks in the EDF set will eventually be able to take the CPU back.
The timer duration is known as the CBS budget. The value used to calculate the CBS deadline is known as the CBS period. This is admittedly confusing, since the CBS is not periodic (it sleeps when there are no jobs to execute); I'd rather call it the relative deadline instead.
The CBS is a wrapper for jobs that have unknown execution times and no deadlines but still need to be executed within an EDF task set. Since EDF requires every task to have a deadline, the CBS does the favor of holding one for these jobs.
Because the CBS guarantees that no hard real-time thread will ever miss a deadline just because some random function took too long to execute, while at the same time guaranteeing that these random functions get as much execution time as they can.
- After the work item is added, it will run when $CONDITION1 until $CONDITION2
- The deadline is specified in units of $UNIT and measured by $MECHANISM
- The CBS mechanism guarantees $SOME_GOOD_BEHAVIOR_APPS_WANT, which the core scheduler does not, because $REASON
- The work item $WILL_or_WILL_NOT be preempted by the CBS code if it exceeds its deadline
- If a work item runs too long $FAILURE1 will occur
- The CBS server requires that it not be preempted by excessive ISR or high-priority thread usage; if it is, then $FAILURE2 will occur
As for your requested format, here it goes:
[...] k_sched_lock and k_sched_unlock. It is up to the application developer to design a system with no deadlocks or excessive ISRs, as usual.
Hi, any developments here? Should I explain anything else? Is it good to be submitted as a PR?
It is with great pleasure that I come before you on behalf of my research laboratory (SoftCPS - ISEP) to submit our first contribution to this amazing open-source RTOS: the Constant Bandwidth Server, or CBS.
Introduction
The CBS is a kernel feature meant to allow aperiodic, non-real-time tasks (i.e. jobs) to be executed within a periodic real-time task set running under a dynamic scheduling environment such as EDF. It is called a server because its sole purpose is to serve (execute) jobs, effectively acting as a wrapper for them.
Problem description
A key step in the analysis of hard real-time systems is knowing whether or not a given set of tasks, each with its estimated execution time and period, is guaranteed to run without missing any deadlines. This analysis depends on the scheduling algorithm and the aforementioned task properties.
However, in actual real-world systems it is quite common to find tasks which are event-driven instead of just time-driven. They have no predictable activation instants or periods, and often no hard deadlines either, but must be served anyway among the periodic tasks. So how do we make sure that the system will not miss any critical task deadline when this level of uncertainty is put on the table?
Well, there are several ways to do that, and the CBS is one of them. What it does is hold its own deadline and period values so the aperiodic jobs given to it don't have to. It then stays inactive until a job arrives; when one does, it enters the run queue for execution, updating its own deadline when certain conditions are met.
Proposed change
We propose to bring this feature to the Zephyr kernel. As far as we know, Zephyr already has a feature called workqueue whose methods and goals are similar to those of the CBS. However, workqueues seem to support only static priorities, and there is no guarantee on their execution "boundaries" other than the configurable thread time slices and the yield behavior when a new item is submitted. Moreover, a workqueue is typically assigned a single handler function (i.e. a single job) upon k_work_init and is expected only to receive new arguments to execute with it via k_work_submit. The CBS, on the other hand, has been designed to accept different jobs and arguments altogether at each new submission with k_cbs_push_job. Thus, the CBS would pose as the perfect complement to the workqueue rather than its substitute: it is meant to work alongside the EDF scheduler instead of static priorities, and to support a more volatile set of job functions instead of defining a single function from the start and only changing the arguments passed to it.
Detailed RFC
The CBS has already been implemented and extensively tested on various targets (including physical and emulated ARM, Xtensa, and RISC-V) with no problems detected so far, but I'm doing an RFC before submitting a pull request 1) to follow the guidelines and 2) to allow the community to inspect the work done so far and give tips on changes. The changes made to Zephyr can be found here, on branch cbs-as-subsystem.
Proposed change (Detailed)
The work conducted on this feature was additive; it does not modify or remove any existing functionality and is therefore unlikely to break any current features. The public API consists of just two elements, really: a K_CBS_DEFINE macro and a k_cbs_push_job function.
The proposed implementation also offers a logging feature to allow users to keep track of CBS events. When enabled, the CBS will use the logging API to register key points of its execution, such as a job being pushed to the queue, the budget being replenished, or the CBS thread entering the CPU to serve the jobs given to it. Logs will show up on screen like this:
If looking for more specific details, feel free to ask or inspect the aforementioned fork with the proposed implementation.
Dependencies
The CBS implementation was developed with the goal of reusing as many existing kernel features as possible so as to minimize doing things from scratch. Therefore, each CBS is essentially composed of one thread, one message queue and one timer. The CBS structure holds a reference to those and a few extra pieces of metadata used internally:
I don't know what else I should include here, so if anything is missing please let me know. But as I said before, nearly nothing of Zephyr's current implementation is excluded or modified by the proposed code, so I don't think anything is likely to break or show unexpected behavior. In my rounds of testing, everything seems to be working fine.
Concerns and Unresolved Questions
However, there is one thing that should be kept in mind. To work as expected, the CBS needs to know when its thread enters/leaves the CPU for the execution of the jobs given to it. Therefore it directly makes use of the z_thread_mark_switched_in and z_thread_mark_switched_out kernel functions, which are apparently not working properly on all devices. Work is being done in this regard and more details can be seen in issue #76057.