GPU-accelerated video loading with optimizations for reading at specific timestamps or time intervals

🚀 The feature

Optimized video decoding and frame reading, via timestamp seeking or sampling at regular time intervals, for both CPU and GPU-accelerated video preprocessing.

Motivation, pitch

There are many use cases in ML where it is beneficial to read specific frames from a video rather than every frame. For example, when running an image classification model over a video, processing one frame every x seconds is far more efficient than processing every frame. Currently, acquiring and resizing frames from a video at regular time intervals is very slow with every CPU and GPU option offered by PyTorch or torchvision.

I've tried using the NVIDIA NVDEC decoder, as described here and in other examples: https://pytorch.org/audio/main/tutorials/nvdec_tutorial.html However, it is only optimized for reading consecutive frames, and for interval or timestamp reads it is slower than decoding on the CPU with a more optimized open-source library: https://github.com/dmlc/decord Running on CPU alone, decord is significantly faster than any of the CPU or GPU options offered by PyTorch, NVIDIA, or torchvision. However, decord has not been updated in over 3 years, and its GPU support no longer works. The community desperately needs efficient video decoding that is optimized for timestamp seeking and regular-interval frame reads; right now the only fast option is a CPU-only library that is 3 years old and no longer supported.
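To make the "one frame every x seconds" access pattern concrete, here is a minimal sketch of the index arithmetic, assuming a constant frame rate. `interval_frame_indices` is a hypothetical helper, not part of torchvision or decord; it only computes which frames to fetch. The expensive part this issue is about (seeking to and decoding just those frames on CPU or GPU) is left to the decoder.

```python
from typing import List


def interval_frame_indices(num_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return one frame index per multiple of interval_s seconds.

    Hypothetical helper. Assumes a constant frame rate, so timestamp t maps
    to the nearest frame, round(t * fps). Variable-frame-rate videos would
    need the container's per-frame presentation timestamps instead.
    """
    if num_frames <= 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("num_frames, fps and interval_s must be positive")

    duration_s = num_frames / fps
    indices: List[int] = []
    k = 0
    while True:
        t = k * interval_s  # k-th sampling timestamp in seconds
        if t >= duration_s:
            break
        # Clamp so rounding near the end never points past the last frame.
        idx = min(round(t * fps), num_frames - 1)
        # Skip duplicates when interval_s is shorter than one frame period.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled once per second.
    print(interval_frame_indices(num_frames=300, fps=30.0, interval_s=1.0))
```

With decord, for instance, the frame count and average frame rate can be taken from `len(vr)` and `vr.get_avg_fps()` on a `VideoReader`, and the resulting indices passed to `vr.get_batch(...)`. A decoder optimized for this pattern would seek near each target frame instead of decoding every frame in between.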

Alternatives

An old, no-longer-maintained library (decord) that everyone is using, which doesn't work with modern GPUs.

Additional context

No response

pytorch / vision