TheShadow29 closed this issue 5 years ago
Actually, we already do! The dataloaders for Kinetics, AVA, and Something Something all use videos directly. It would be easy to replace our ffmpeg-based loader with DALI in your code, but that requires extra dependencies, a build step, etc., which is why we go with the lightweight ffmpeg-based approach. Charades and ActivityNet still use frames for historical reasons, and because they have rather large videos whereas networks only need a few consecutive frames, but that is easy to change.
See: https://github.com/gsig/PyVideoResearch/blob/c74321e2ffaf72dd959c3b63e5b122b7e2f1fb21/datasets/kinetics_mp4.py#L9 https://github.com/gsig/PyVideoResearch/blob/c74321e2ffaf72dd959c3b63e5b122b7e2f1fb21/datasets/utils.py#L38-L61
Best, Gunnar
This is great. I had looked at ActivityNet but didn't check Kinetics. Do you have any speed comparisons between ffmpeg and DALI? If not, I could run a few tests and report back.
Again, thanks for the great repository. It is extremely useful for starting out and building upon.
I haven't done speed comparisons with DALI, but I would expect DALI to be much faster, since our video loader just pipes the output of an ffmpeg subprocess to Python. However, it has been fast enough for us (since the data loaders run in separate threads, our models are typically only limited by GPU speed), and it doesn't need any additional dependencies, builds, etc., making it a nice and simple starting point. Since ffmpeg_video_loader(path) just returns the video as a numpy array, it can easily be replaced with DALI if needed.
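For anyone reading along, here is a minimal sketch of what "piping the output of an ffmpeg subprocess to Python" can look like. This is not the repository's actual ffmpeg_video_loader (see the linked utils.py for that); the function names, the fixed 224x224 output size, and the optional fps argument are all assumptions made for illustration.

```python
import subprocess

import numpy as np


def build_ffmpeg_command(path, width, height, fps=None):
    """Build an ffmpeg command that writes raw RGB frames to stdout.

    Hypothetical helper: the output is forced to a fixed size so the
    byte stream can be reshaped without probing the video first.
    """
    cmd = ["ffmpeg", "-loglevel", "error", "-i", path]
    if fps is not None:
        cmd += ["-r", str(fps)]  # resample to a fixed output frame rate
    cmd += ["-s", f"{width}x{height}",  # output frame size
            "-f", "rawvideo", "-pix_fmt", "rgb24", "pipe:1"]
    return cmd


def frames_from_bytes(raw, width, height):
    """Reshape a raw rgb24 byte stream into a (T, H, W, 3) uint8 array."""
    frame_size = width * height * 3
    n_frames = len(raw) // frame_size  # ignore any trailing partial frame
    data = np.frombuffer(raw[: n_frames * frame_size], dtype=np.uint8)
    return data.reshape(n_frames, height, width, 3)


def load_video(path, width=224, height=224, fps=None):
    """Decode a whole video into memory (assumes ffmpeg is on PATH)."""
    cmd = build_ffmpeg_command(path, width, height, fps)
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return frames_from_bytes(result.stdout, width, height)
```

Because the loader only has to return a numpy array of frames, a DALI-based function with the same return type could be dropped in without touching the rest of the dataset code. Note that this sketch reads the entire video into memory, which is fine for short clips but wasteful for long videos.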
Definitely report back what you find after a few tests!
Best, Gunnar
Hi. Thanks for this amazing repository.
I was wondering if you had any plans to load videos directly instead of first converting them to images. Storing them as images is very expensive from a storage point of view. One way to load the videos directly would be to use DALI (https://github.com/NVIDIA/DALI).