[Feature Request] Use Videos directly instead of converting to frames

gsig / PyVideoResearch

A repository of common methods, datasets, and tasks for video research

GNU General Public License v3.0

[Feature Request] Use Videos directly instead of converting to frames #16

TheShadow29 closed this issue 5 years ago

TheShadow29 commented 5 years ago

Hi. Thanks for this amazing repository.

I was wondering if you had any plans to load videos directly instead of first converting them to images. Storing every frame as an image is very expensive from a storage point of view. One way to load the videos directly would be to use DALI (https://github.com/NVIDIA/DALI).

gsig commented 5 years ago

Actually, we already do! The dataloaders for Kinetics, AVA, and Something-Something all read videos directly. It would be easy to swap our ffmpeg-based loader for DALI in your own code, but DALI brings extra dependencies and a build step, which is why we stick with the lightweight ffmpeg-based approach. Charades and ActivityNet still use frames, partly for historical reasons and partly because their videos are rather large while the networks only need a few consecutive frames, but that is easy to change.

See:
https://github.com/gsig/PyVideoResearch/blob/c74321e2ffaf72dd959c3b63e5b122b7e2f1fb21/datasets/kinetics_mp4.py#L9
https://github.com/gsig/PyVideoResearch/blob/c74321e2ffaf72dd959c3b63e5b122b7e2f1fb21/datasets/utils.py#L38-L61

Best, Gunnar

TheShadow29 commented 5 years ago

This is great. I had looked at ActivityNet but didn't check Kinetics. Do you have any speed comparisons between ffmpeg and DALI? If not, I could run a few tests and report back.

Again, thanks for the great repository. It is extremely useful for starting out and building upon.
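
For reference, a comparison like this could be run with a small timing harness along the following lines. This is only a sketch: `time_loader` and `compare_loaders` are hypothetical helper names, and each loader is assumed to be a plain callable that takes a video path and decodes the whole file. It measures wall-clock decode time only and ignores threading and GPU transfer.

```python
import statistics
import time


def time_loader(loader, paths, repeats=3):
    """Return the median wall-clock seconds per video for `loader`.

    `loader` is assumed to be any callable taking a single path.
    """
    per_video = []
    for _ in range(repeats):
        start = time.perf_counter()
        for path in paths:
            loader(path)  # decode one video; the result is discarded
        elapsed = time.perf_counter() - start
        per_video.append(elapsed / len(paths))
    return statistics.median(per_video)


def compare_loaders(loaders, paths, repeats=3):
    """Time each named loader on the same paths and sort fastest first."""
    timings = {
        name: time_loader(fn, paths, repeats) for name, fn in loaders.items()
    }
    return sorted(timings.items(), key=lambda item: item[1])
```

Calling `compare_loaders({"ffmpeg": ffmpeg_fn, "dali": dali_fn}, paths)` with two such callables would return a list of `(name, seconds_per_video)` pairs, where `ffmpeg_fn` and `dali_fn` stand in for whatever wrappers are being tested.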

gsig commented 5 years ago

I haven't done speed comparisons with DALI, but I would expect DALI to be much faster, since our video loader just pipes the output of an ffmpeg subprocess into Python. However, it has been fast enough for us (the data loaders run in separate threads, so our models are typically limited only by GPU speed), and it doesn't need any additional dependencies or builds, which makes it a nice and simple starting point. Since ffmpeg_video_loader(path) just returns the video as a numpy array, it can easily be replaced with DALI if needed.

Definitely report back what you find after a few tests!

Best, Gunnar
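
For readers who want to see the general shape of the pipe-based approach described in this thread, here is a minimal sketch. It is not the repository's `ffmpeg_video_loader`; the function names `build_ffmpeg_command`, `frames_from_raw`, and `load_video` are hypothetical, and the fixed 224x224 resize is an assumption made so that the size of each frame is known in advance. It requires `ffmpeg` on the PATH and numpy installed.

```python
import subprocess

import numpy as np


def build_ffmpeg_command(path, width, height):
    """Build an ffmpeg command that writes raw RGB24 frames to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",              # keep stderr quiet
        "-i", path,                        # input video file
        "-vf", f"scale={width}:{height}",  # resize so the frame size is known
        "-f", "rawvideo",                  # no container, just pixel data
        "-pix_fmt", "rgb24",               # 3 bytes per pixel
        "pipe:1",                          # write to stdout
    ]


def frames_from_raw(buffer, width, height):
    """Reshape a raw RGB24 byte buffer into a (T, H, W, 3) uint8 array."""
    frame_bytes = width * height * 3
    n_frames = len(buffer) // frame_bytes  # drop any trailing partial frame
    data = np.frombuffer(buffer[: n_frames * frame_bytes], dtype=np.uint8)
    return data.reshape(n_frames, height, width, 3)


def load_video(path, width=224, height=224):
    """Decode a whole video into memory by reading ffmpeg's stdout."""
    result = subprocess.run(
        build_ffmpeg_command(path, width, height),
        stdout=subprocess.PIPE,
        check=True,  # raise if ffmpeg exits with an error
    )
    return frames_from_raw(result.stdout, width, height)
```

Because `load_video(path)` returns a plain numpy array, a dataset class built on it does not need to know which decoder produced the frames, which is what makes swapping in a different backend straightforward. Note that this sketch decodes the full video into memory, so it suits short clips better than long recordings.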