roc-streaming / roc-toolkit

Real-time audio streaming over the network.
https://roc-streaming.org
Mozilla Public License 2.0

Add benchmarks for task queues #644

Open gavv opened 7 months ago

gavv commented 7 months ago

Overview

We have two different task queue implementations: ctl::ControlTaskQueue and pipeline::PipelineLoop.

The first one is intended for low-priority tasks scheduled on a dedicated control thread, and the second one is intended for pipeline manipulation tasks to be executed on the soft real-time pipeline processing thread.

See more details here: https://roc-streaming.org/toolkit/docs/internals/threads.html

Both queues implement lock-free task scheduling. ControlLoop is more feature-rich: it allows scheduling or re-scheduling a task at a specific time in the future, and cancelling tasks. PipelineLoop has fewer features, but has a specialized scheduling algorithm that executes tasks in dedicated intervals between frame processing, to ensure that tasks don't affect real-time processing.

Algorithms are described in more detail in doxygen comments: 1, 2.

We have several benchmarks (1, 2, 3) for both queues that model various extreme cases (high contention, peak load, etc.) and check how the queues behave under these conditions.

What we're missing are benchmarks that measure queue throughput and latency under normal conditions, i.e. how many tasks the queues can process per second and what the delay is between scheduling and executing a task.
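To make the two metrics concrete, here is a minimal sketch of how they could be derived from per-task timestamps. It does not use the roc-toolkit API: `TaskSample`, `QueueStats`, and `summarize` are hypothetical names introduced for illustration, and it assumes each task records the time it was scheduled and the time it actually ran.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical record of one task: when it was scheduled and when it ran.
struct TaskSample {
    Clock::time_point scheduled;
    Clock::time_point executed;
};

// Hypothetical summary of one benchmark run.
struct QueueStats {
    double tasks_per_second; // throughput over the whole run
    double mean_latency_us;  // average schedule-to-execute delay
    double max_latency_us;   // worst schedule-to-execute delay
};

// Computes throughput and latency from the collected samples.
// The run is assumed to span from the earliest schedule call
// to the latest task execution.
QueueStats summarize(const std::vector<TaskSample>& samples) {
    assert(!samples.empty());

    typedef std::chrono::duration<double, std::micro> MicrosF;

    Clock::time_point first = samples.front().scheduled;
    Clock::time_point last = samples.front().executed;
    double sum_us = 0.0;
    double max_us = 0.0;

    for (const TaskSample& s : samples) {
        assert(s.executed >= s.scheduled);
        const double latency_us = MicrosF(s.executed - s.scheduled).count();
        sum_us += latency_us;
        if (latency_us > max_us) max_us = latency_us;
        if (s.scheduled < first) first = s.scheduled;
        if (s.executed > last) last = s.executed;
    }

    const double elapsed_s =
        std::chrono::duration<double>(last - first).count();
    const double count = static_cast<double>(samples.size());

    QueueStats stats;
    stats.tasks_per_second = elapsed_s > 0.0 ? count / elapsed_s : 0.0;
    stats.mean_latency_us = sum_us / count;
    stats.max_latency_us = max_us;
    return stats;
}
```

Reporting the maximum alongside the mean is a design choice of this sketch: for a soft real-time thread, occasional long delays may matter as much as the average.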

Task

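Add benchmarks for ctl::ControlTaskQueue and pipeline::PipelineLoop that measure two parameters: throughput (how many tasks are processed per second) and latency (the delay between scheduling and executing a task).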

In benchmarks for pipeline::PipelineLoop, in addition to the thread(s) that actually schedule tasks, we should run one thread that reads frames, as we do in FrameWriter in bench_pipeline_loop_peak_load.cpp. The goal of PipelineLoop is to schedule task execution between frames, so to simulate normal conditions, we should process some frames.

Actual frame processing should be simulated with a busy loop taking the given amount of time. (It's important not to use sleep here: otherwise the Linux scheduler will treat us as an I/O thread and our measurements will be incorrect, since real frame processing doesn't sleep.)
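A busy loop of this kind could look roughly like the sketch below. `busy_wait` is a hypothetical helper, not an existing roc-toolkit function; it spins on `std::chrono::steady_clock` until the requested time has passed and never yields or sleeps.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: burns CPU for at least `duration` without sleeping,
// so the thread keeps looking CPU-bound to the scheduler.
// Returns the time actually spent, which may slightly exceed `duration`.
inline std::chrono::nanoseconds busy_wait(std::chrono::nanoseconds duration) {
    assert(duration.count() >= 0);

    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    const Clock::time_point deadline = start + duration;

    Clock::time_point now = start;
    while (now < deadline) {
        // Polling the clock is the "work"; there is no sleep or yield here.
        now = Clock::now();
    }

    return std::chrono::duration_cast<std::chrono::nanoseconds>(now - start);
}
```

A frame-reading thread in the benchmark would then call something like `busy_wait(processing_time)` once per simulated frame.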

Since both frame length and processing time affect task scheduling, we should create benchmarks for several combinations of these parameters, e.g. small, medium, and large frames (1ms, 5ms, 20ms), and cheap and heavy processing (e.g. 5% and 80% of the frame playback time; for a 1ms frame, that is 0.05ms and 0.8ms).
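As an illustration, the parameter matrix suggested above can be generated like this. `FrameParams` and `make_frame_params` are hypothetical names; the frame lengths and load percentages are the example values from the text, not fixed requirements.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical description of one benchmark configuration.
struct FrameParams {
    std::chrono::microseconds frame_length;    // frame playback time
    std::chrono::microseconds processing_time; // simulated CPU time per frame
};

// Builds all combinations of frame length and processing load.
// Processing time is expressed as a percentage of frame playback time.
std::vector<FrameParams> make_frame_params() {
    const std::chrono::microseconds frame_lengths[] = {
        std::chrono::milliseconds(1),
        std::chrono::milliseconds(5),
        std::chrono::milliseconds(20),
    };
    const int load_percents[] = { 5, 80 }; // cheap and heavy processing

    std::vector<FrameParams> result;
    for (const std::chrono::microseconds& length : frame_lengths) {
        for (int percent : load_percents) {
            assert(percent > 0 && percent < 100);
            FrameParams params;
            params.frame_length = length;
            params.processing_time = length * percent / 100;
            result.push_back(params);
        }
    }
    return result;
}
```

This yields six configurations, from a 1ms frame with 50us of processing up to a 20ms frame with 16ms of processing. Each one could be registered as a separate benchmark case.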

For information about running benchmarks, see developer cookbook.