pytorch / torchcodec

PyTorch video decoding
BSD 3-Clause "New" or "Revised" License

Add the ability to benchmark throughput using multiple threads #359

Closed: ahmadsharif1 closed this 1 week ago

ahmadsharif1 commented 2 weeks ago

The new batch mode measures throughput by running 40 copies of each decode experiment concurrently across 8 threads.
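
Below is a minimal sketch, not the PR's implementation, of how such batch-mode throughput measurement can be structured: submit a fixed number of copies of a decode task to a thread pool and time the whole batch. `decode_one_video` and the two constants are hypothetical placeholders standing in for whatever experiment is being measured (e.g. "10 next()" on a decoder); the 40/8 values follow the description above.

```python
from concurrent.futures import ThreadPoolExecutor
import time

NUM_COPIES = 40   # copies of the decode task per batch (per the description above)
NUM_THREADS = 8   # worker threads sharing the batch

def decode_one_video(path):
    # Placeholder: construct a decoder for `path` and run the experiment
    # being measured (e.g. decode the next 10 frames).
    ...

def benchmark_batch(path):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        futures = [pool.submit(decode_one_video, path) for _ in range(NUM_COPIES)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{NUM_COPIES} copies on {NUM_THREADS} threads: {elapsed_ms:.1f} ms")

benchmark_batch("test/resources/nasa_13013.mp4")
```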

Tested:

video=/home/ahmads/personal/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 (h264, 480x270, 13.013 s, 29.97 fps), 1 thread:

| Decoder | uniform 10 seek()+next() | batch uniform 10 seek()+next() | random 10 seek()+next() | batch random 10 seek()+next() | 1 next() | batch 1 next() | 10 next() | batch 10 next() | 100 next() | batch 100 next() | create()+next() |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TorchCodecPublic | 67.0 | 841.9 | 60.5 | 743.9 | 21.4 | 219.4 | 24.1 | 276.5 | 69.9 | 812.5 | |
| TorchCodecCore | | | | | | | | | | | 18.5 |

Times are in milliseconds (ms).
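
To read the batch columns against the single-task columns (assuming each batch cell times 40 copies on 8 threads, as described above): "uniform 10 seek()+next()" takes 67.0 ms for one copy, so 40 copies would take roughly 2680 ms serially, while the batch finishes in 841.9 ms, about a 3.2x throughput gain from multithreading.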
scotts commented 1 week ago

I assume the removal of the other decoders is temporary while you're getting everything working?

For the chart generated by generate_readme_*.py, I think we should be selective about what we add to it: no more than four experiments per row. This is in contrast to the output from benchmark_decoders.py, which can include many experiments. I see benchmark_decoders.py as a perf development tool, and generate_readme_*.py as our external showcase.