We have unit tests for ctl::ControlTaskQueue and pipeline::PipelineLoop, but unit tests can't detect all possible races. Since the lock-free operations are tricky to implement (especially in ctl::ControlTaskQueue), it's important to write good stress tests that are able to detect races, and to run them periodically on the supported architectures (at least x86_64, arm32, and arm64).
We need two stress tests: one for ctl::ControlTaskQueue and one for pipeline::PipelineLoop. The first will be more complicated, since ControlTaskQueue provides more task operations than PipelineLoop.
A stress test should repeatedly schedule, re-schedule, cancel, and wait for tasks from multiple threads, at random times, and with random task deadlines. Task processing should also take a random amount of time.
The random delays should be chosen so that we periodically hit both contended and uncontended cases. The test should have enough randomness to cover the following cases:
the task is in ready queue, sleeping queue, being processed, being finished, being waited, being cancelled, being re-scheduled from another thread
the event loop thread is working or is sleeping when a task operation is invoked
there are one or many concurrent operations with the queue, with the same or different tasks
the number of concurrent operations is sometimes smaller than the number of CPUs, and sometimes larger
the task has or doesn't have a completion handler (such tasks are handled a bit differently)
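One possible way to get both contended and uncontended periods is to draw delays from a mixed distribution: mostly zero or very short delays, and occasionally a delay long enough for the event loop thread to go to sleep. The sketch below only illustrates that idea. DelayPicker, its probabilities, and its delay ranges are hypothetical and not part of the code base; the real values would need tuning on each target architecture.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical test helper (not an existing class): picks random delays
// and thread counts so that a run alternates between contended bursts
// and quiet periods.
class DelayPicker {
public:
    explicit DelayPicker(uint64_t seed) : rng_(seed) {}

    // 50%: no delay (maximum contention).
    // 40%: short delay of 1..500 us (operations likely overlap).
    // 10%: long delay of 5..20 ms (event loop likely goes to sleep).
    // The proportions and ranges are assumptions chosen for illustration.
    std::chrono::microseconds next() {
        const int kind = std::uniform_int_distribution<int>(0, 9)(rng_);
        if (kind < 5) {
            return std::chrono::microseconds(0);
        }
        if (kind < 9) {
            return std::chrono::microseconds(
                std::uniform_int_distribution<int>(1, 500)(rng_));
        }
        return std::chrono::microseconds(
            std::uniform_int_distribution<int>(5000, 20000)(rng_));
    }

    // Number of threads for the next round: anywhere from 1 to twice the
    // number of CPUs, so it is sometimes smaller and sometimes larger.
    unsigned num_threads(unsigned num_cpus) {
        assert(num_cpus > 0);
        return std::uniform_int_distribution<unsigned>(1, num_cpus * 2)(rng_);
    }

private:
    std::mt19937_64 rng_;
};
```

Seeding the generator explicitly and logging the seed makes a failing run easier to reproduce, although thread interleaving will still differ between runs.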
The test should ensure that the following invariants are always met:
any operation with the queue completes eventually (no hangs)
any scheduled and not cancelled task is processed eventually
for any scheduled or cancelled task, the completion handler is called eventually
any pending wait() completes eventually
if the task was not rescheduled, but only scheduled and possibly cancelled, the handler is invoked exactly once
the same is true when scheduling the task again after waiting until it is fully finished; the handler should be invoked exactly one more time in this case
if the task was rescheduled while it was pending, the processing and the handler are allowed to be called twice (one call for the previous schedule if its deadline had already expired, and one call for the new schedule)
the task is processed and the handler is called no earlier than the task deadline
task state reported via pending(), success(), and cancelled() should correspond to the expected state
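The handler-count invariants above could be checked with a small per-task counter block that the test updates from its worker threads and from the completion handler. This is a minimal sketch under assumptions: TaskStats and handler_count_ok are hypothetical names, and the bounds generalize the rules above to n schedules and r re-schedules of a pending task (at least n handler calls, at most n + r). The check is only meaningful once the task is fully finished and no operations on it are in flight.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-task bookkeeping kept by the stress test itself.
struct TaskStats {
    // Times the task was scheduled while it was not pending
    // (first schedule, or schedule after it was fully finished).
    std::atomic<int> schedules{0};
    // Times the task was re-scheduled while it was still pending.
    std::atomic<int> reschedules{0};
    // Times the completion handler was invoked.
    std::atomic<int> handler_calls{0};
};

// Returns true if the number of handler calls is within the allowed range:
// exactly one call per plain schedule, plus at most one extra call per
// re-schedule of a pending task.
inline bool handler_count_ok(const TaskStats& stats) {
    const int n = stats.schedules.load();
    const int r = stats.reschedules.load();
    const int h = stats.handler_calls.load();
    return h >= n && h <= n + r;
}

// Intended to be called after wait() returned and all worker threads
// stopped touching the task.
inline void check_finished_task(const TaskStats& stats) {
    assert(handler_count_ok(stats));
}
```

A test would increment schedules or reschedules right before invoking the corresponding queue operation, and increment handler_calls inside the completion handler.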
See #644 for an introduction to task queues.