Set the FFmpeg thread count to 0 for single-video benchmarks. With 0, FFmpeg chooses the number of decoding threads itself, which should saturate the system.
Set the FFmpeg thread count to 1 for concurrent benchmarks. This should still saturate the system because the concurrency comes from the layer above the decoder.
Name the concurrent benchmarks "concurrent" instead of "dataloader", since they don't actually use the PyTorch DataLoader.
Print the name of each benchmark just before it runs. This adds only about 10 lines of output and makes it clear which benchmarks take a long time to finish.
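
A rough sketch of how these settings might fit together, not the actual benchmark code. It assumes the concurrent benchmarks fan work out over a thread pool. `decode_video`, `run_benchmark`, and `benchmark_configs` are hypothetical names, and `decode_video` is a stub standing in for the real decoder call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def decode_video(path, num_ffmpeg_threads):
    """Hypothetical stand-in for the real decoder call."""
    # A real decoder would receive num_ffmpeg_threads here:
    # 0 lets FFmpeg pick the thread count, 1 forces single-threaded decoding.
    return (path, num_ffmpeg_threads)


def benchmark_configs(paths, num_workers):
    """Return (name, paths, num_ffmpeg_threads, num_workers) per benchmark."""
    return [
        # One video, one caller: let FFmpeg parallelize internally.
        ("single_video", paths[:1], 0, 1),
        # Many videos, many callers: one FFmpeg thread per decode,
        # with parallelism supplied by the worker pool (assumed to be threads).
        ("concurrent", paths, 1, num_workers),
    ]


def run_benchmark(name, paths, num_ffmpeg_threads, num_workers):
    # Announce the benchmark before it starts so slow runs are easy to spot.
    print(f"Running benchmark: {name}", flush=True)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(
            pool.map(lambda p: decode_video(p, num_ffmpeg_threads), paths)
        )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    videos = ["a.mp4", "b.mp4", "c.mp4"]  # placeholder paths
    for config in benchmark_configs(videos, num_workers=2):
        run_benchmark(*config)
```

Flushing the print means the benchmark name appears before a long run starts, not after it finishes.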