pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License
15.98k stars 6.92k forks

GPU accelerated video loading with optimizations for reading at specific timestamps or time intervals #8599

Open skier233 opened 3 weeks ago

skier233 commented 3 weeks ago

🚀 The feature

Optimized video decoding for CPU- and GPU-accelerated video preprocessing, specifically for reading frames at regular time intervals or seeking to specific timestamps.

Motivation, pitch

There are many use cases in ML where it is beneficial to read specific frames from a video rather than every frame. For example, when running an image classification model on a video, processing one frame every x seconds is far more efficient than processing every frame. Currently, acquiring and resizing frames from a video at regular time intervals is incredibly slow with all of the CPU and GPU options offered by PyTorch and torchvision.

I've tried using the NVIDIA NVDEC decoder, as described here and in other examples: https://pytorch.org/audio/main/tutorials/nvdec_tutorial.html. However, it is only optimized for consecutive frame reads, and it is slower than CPU decoding with a more optimized open-source library: https://github.com/dmlc/decord. Decord is significantly faster on CPU than any of the CPU or GPU options offered by PyTorch, NVIDIA, or torchvision. However, it has not been updated in over 3 years, and its GPU support has not kept up with modern hardware.

The community badly needs efficient video decoding options that are optimized for timestamp-based or regular-interval frame reading. Right now, the only fast option is a CPU-only library that is 3 years old and no longer supported.
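To make the requested access pattern concrete: sampling one frame every x seconds reduces to computing a short list of target timestamps and mapping each one to a frame index, so a decoder could seek to those frames instead of decoding the whole stream. The sketch below is pure Python and assumes a constant frame rate; the helper names (`interval_timestamps`, `timestamps_to_frame_indices`, `sample_frame_indices`) are hypothetical and not part of torchvision, torchcodec, or decord.

```python
import math
from typing import List


def interval_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Timestamps (seconds) at 0, interval_s, 2*interval_s, ... before duration_s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


def timestamps_to_frame_indices(
    timestamps: List[float], fps: float, num_frames: int
) -> List[int]:
    """Map each timestamp to the nearest frame index, assuming a constant fps.

    Indices are clamped to the last frame, and duplicates (which occur when the
    interval is shorter than one frame period) are dropped, preserving order.
    """
    indices = [min(int(round(t * fps)), num_frames - 1) for t in timestamps]
    return list(dict.fromkeys(indices))


def sample_frame_indices(num_frames: int, fps: float, interval_s: float) -> List[int]:
    """Frame indices for reading one frame every interval_s seconds."""
    duration_s = num_frames / fps
    timestamps = interval_timestamps(duration_s, interval_s)
    return timestamps_to_frame_indices(timestamps, fps, num_frames)


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(num_frames=300, fps=30.0, interval_s=2.0))
```

With decord, for example, the resulting indices could be passed to `VideoReader.get_batch`, using `len(vr)` and `vr.get_avg_fps()` for `num_frames` and `fps`. Variable-frame-rate files would need per-frame timestamps from the container instead of the constant-fps assumption used here.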

Alternatives

An old, no-longer-maintained library that everyone is using, which doesn't work with modern GPUs.

Additional context

No response

NicolasHug commented 3 weeks ago

Hi @skier233, thanks for opening this issue. We have recently started https://github.com/pytorch/torchcodec, which is where we want to consolidate the video decoding capabilities of PyTorch. The library is currently in Beta, so there may still be rough edges, but once it's more mature we'll start deprecating the video decoders in torchvision in favor of torchcodec. For now, torchcodec only supports CPU decoding on Linux. GPU support and macOS binaries should come soon. If you try it, we'd love to hear any feedback you may have!