holistic-video-understanding / HVU-Downloader

HVU Downloader tool
https://holistic-video-understanding.github.io
MIT License

Efficient video download and segmentation #2

Closed NikhilBartwal closed 4 years ago

NikhilBartwal commented 4 years ago

In regard to Issue : https://github.com/holistic-video-understanding/HVU-Dataset/issues/3

The Parallel class of the joblib library uses the 'loky' backend by default, which is multi-process and therefore does not allow the worker processes to access a common resource; this leads to the issue mentioned above. The solution was to pass require = 'sharedmem' along with n_jobs to Parallel. However, using joblib.Parallel for both downloading and video segmentation takes a lot of time and is thus inefficient.
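The fix can be sketched as follows (a minimal example, not the downloader's actual code; `results` stands in for whatever shared structure the workers write to):

```python
from joblib import Parallel, delayed

# Shared list that every worker appends to. With the default 'loky'
# (multi-process) backend each process would get its own copy, so the
# appends would be lost; require='sharedmem' forces a thread-based
# backend in which all workers see the same object.
results = []

def record(i):
    results.append(i * i)

Parallel(n_jobs=2, require='sharedmem')(delayed(record)(i) for i in range(8))
print(sorted(results))  # squares of 0..7
```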

After experimenting with different methods and libraries, the fastest and most efficient way is to download all the videos in parallel using all the CPUs available (i.e. n_jobs = -1) and then use the moviepy library to trim the videos with two parallel processes.
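The two-stage pipeline described above can be sketched as follows. This is a structural sketch only: `download_video` and `trim_video` are hypothetical placeholders for the real youtube-dl and moviepy (e.g. `ffmpeg_extract_subclip`) calls, and a thread pool stands in for whichever parallel backend the downloader actually uses.

```python
from multiprocessing.dummy import Pool  # thread pool; the real code may use processes

def download_video(video_id):
    # Hypothetical: the real downloader invokes youtube-dl here.
    return f"{video_id}.mp4"

def trim_video(path):
    # Hypothetical: the real code calls moviepy's ffmpeg_extract_subclip here.
    return path.replace(".mp4", "_trimmed.mp4")

video_ids = ["a1", "b2", "c3", "d4"]

# Stage 1: download everything with maximum parallelism
# (the issue uses joblib with n_jobs=-1, i.e. all CPUs).
with Pool() as downloaders:
    files = downloaders.map(download_video, video_ids)

# Stage 2: trim the downloaded files with exactly two parallel workers,
# since segmentation is CPU-bound and two workers were found sufficient.
with Pool(2) as trimmers:
    trimmed = trimmers.map(trim_video, files)
```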

In addition to this, YouTube seems to rate-limit IPv6 addresses that make too many requests at a time, which was resolved by passing --force-ipv4 to the youtube-dl command.
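A sketch of how the flag fits into the invocation (the output template and video id are illustrative, not the downloader's actual values):

```python
def build_command(video_id):
    # Assemble a youtube-dl invocation that forces IPv4 so requests
    # are not rate-limited on IPv6 addresses.
    return [
        "youtube-dl",
        "--force-ipv4",               # the fix described above
        "-o", f"{video_id}.%(ext)s",  # hypothetical output template
        f"https://www.youtube.com/watch?v={video_id}",
    ]

cmd = build_command("abc123")
# In the real pipeline the command would be run, e.g. with subprocess.run(cmd).
```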

The mentioned method gives a speed-up of more than 50%, which keeps growing as the number of videos to be processed increases.

bhack commented 4 years ago

I think that the parallel downloads are network-bound processes and moviepy is CPU-bound. If you can saturate all the CPUs, it means you are downloading with enough parallelism to keep ffmpeg loaded on the CPU.

NikhilBartwal commented 4 years ago

@bhack Yes, and using 2 parallel processes after the downloads ensures quick and efficient segmentation of the videos, which is not much affected by increasing the number of processes.

bhack commented 4 years ago

If two videos are always available in the segmenting queue (without idle time) and you can saturate the CPUs (check top), OK. But I suppose that 2 encoding processes are somewhat tied to the number of available (N) cores.

NikhilBartwal commented 4 years ago

@bhack I just tried saturating the CPUs with n_jobs = -1 and it does give a slight boost over the mentioned method. Do you think that would be better?

bhack commented 4 years ago

If you have 2 cores it is the same :wink:

NikhilBartwal commented 4 years ago

@bhack Haha. Guess I was lucky I had 4. XD