Add benchmarks for task queues

Overview

We have two different task queue implementations:

ctl::ControlLoop (based on ctl::ControlTaskQueue)
pipeline::PipelineLoop

The first is intended for low-priority tasks scheduled on a dedicated control thread; the second is intended for pipeline manipulation tasks executed on the soft-real-time pipeline processing thread.

See more details here: https://roc-streaming.org/toolkit/docs/internals/threads.html

Both queues implement lock-free task scheduling. ControlLoop is more feature-rich: it can schedule or re-schedule a task for a specific time in the future, and it can cancel tasks. PipelineLoop has fewer features, but it has a specialized scheduling algorithm that executes tasks in dedicated intervals between frame processing, to ensure that tasks don't affect real-time processing.

Algorithms are described in more detail in doxygen comments: 1, 2.

We have several benchmarks (1, 2, 3) for both queues. They model various extreme cases (high contention, peak load, etc.) and check how the queues behave under these conditions.

What we're missing are benchmarks that measure queue throughput and latency under normal conditions, i.e. how many tasks per second the queues can process and what the delay is between scheduling a task and executing it.

Task

Add benchmarks for ctl::ControlTaskQueue and pipeline::PipelineLoop that measure two parameters:

Task throughput, i.e. how many tasks per second can the queue process? For ctl::ControlTaskQueue, we should create separate benchmarks for schedule() and schedule_at(). For pipeline::PipelineLoop, we should create benchmarks for different frame sizes and processing times.
Task latency, i.e. what is the typical delay between scheduling a task and actually processing it? We need the 95th percentile, i.e. the maximum delay for 95% of tasks. We already have percentile calculation in benchmarks, grep for "p95()".
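
For reference, here is a minimal sketch of what a 95th percentile over collected latency samples means. It uses the nearest-rank method and a hypothetical percentile() helper; it is not the existing p95() code in the benchmarks, which should be reused in the actual implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not the benchmarks' p95()).
// Returns the smallest sample such that at least `percent` percent of all
// samples are less than or equal to it (nearest-rank method).
double percentile(std::vector<double> samples, double percent) {
    if (samples.empty()) {
        return 0.0;
    }
    std::sort(samples.begin(), samples.end());
    // 1-based nearest rank: ceil(percent * N / 100).
    const double rank = std::ceil(percent * (double)samples.size() / 100.0);
    std::size_t index = rank < 1.0 ? 0 : (std::size_t)rank - 1;
    if (index >= samples.size()) {
        index = samples.size() - 1;
    }
    return samples[index];
}
```

With latencies recorded as (execution time - scheduling time) for each task, percentile(latencies, 95.0) gives the delay that 95% of tasks did not exceed.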

In benchmarks for pipeline::PipelineLoop, in addition to the thread(s) that actually schedule tasks, we should run one thread that reads frames, as FrameWriter does in bench_pipeline_loop_peak_load.cpp. The goal of PipelineLoop is to schedule task execution between frames, so to simulate normal conditions we should process some frames.

Actual frame processing should be simulated with a busy loop that takes a given amount of time. (It's important not to use sleep here: otherwise the Linux scheduler will treat us as an I/O thread and our measurements will be incorrect, since real frame processing doesn't sleep.)
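
A busy loop of this kind could look roughly like the sketch below. It uses only the standard library; busy_wait() is a hypothetical name, and the real benchmark may prefer the toolkit's own time functions.

```cpp
#include <chrono>

// Hypothetical helper: spin on a monotonic clock until `duration` has elapsed.
// Unlike sleeping, the thread stays runnable and keeps consuming CPU,
// which is closer to what real frame processing does.
void busy_wait(std::chrono::nanoseconds duration) {
    const auto deadline = std::chrono::steady_clock::now() + duration;
    while (std::chrono::steady_clock::now() < deadline) {
        // Spin: no sleep, no yield.
    }
}
```

For example, busy_wait(std::chrono::microseconds(50)) would simulate 5% load on a 1ms frame.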

Since both frame length and processing time affect task scheduling, we should create benchmarks for several combinations of these parameters, e.g. small, medium, and large frames (1ms, 5ms, 20ms) with cheap and heavy processing (5% and 80% of the frame playback time, e.g. 5% of 1ms and 80% of 1ms).
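
To make the arithmetic concrete, the sketch below enumerates the combinations suggested above and derives the simulated processing time for each. BenchCase and make_cases() are hypothetical names used only for illustration, and the exact set of values is up to the implementer.

```cpp
#include <chrono>
#include <vector>

// Hypothetical description of one benchmark configuration.
struct BenchCase {
    std::chrono::microseconds frame_length;    // frame playback time
    std::chrono::microseconds processing_time; // simulated busy-loop time
};

// Builds all combinations of frame length (1ms, 5ms, 20ms)
// and processing load (5% and 80% of the frame length).
std::vector<BenchCase> make_cases() {
    const int frame_ms[] = {1, 5, 20};
    const int load_percent[] = {5, 80};

    std::vector<BenchCase> cases;
    for (int ms : frame_ms) {
        const std::chrono::microseconds frame = std::chrono::milliseconds(ms);
        for (int pct : load_percent) {
            cases.push_back({frame, frame * pct / 100});
        }
    }
    return cases;
}
```

This yields six cases, with processing times of 50us and 800us for 1ms frames, 250us and 4ms for 5ms frames, and 1ms and 16ms for 20ms frames.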

For information about running benchmarks, see the developer cookbook.

roc-streaming / roc-toolkit

Add benchmarks for task queues #644
