NikhilBartwal closed this issue 4 years ago
I think the parallel downloads are network-bound processes and moviepy
is CPU-bound. If you can saturate all the CPUs, it means you are downloading with enough parallelism to keep ffmpeg loaded.
@bhack Yes, and using 2 parallel processes after the downloads ensures quick and efficient segmentation of the videos, which is not much affected by increasing the number of processes.
If two videos are always available in the segmenting queue (without idle time) and you can saturate the CPUs (check with `top`), OK.
But I suppose that 2 encoding processes are somewhat tied to the number of available (N) cores.
@bhack I just tried saturating the CPUs with `n_jobs = -1`, and it does give a slight boost over the mentioned method. Do you think that would be better?
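For reference, a minimal sketch of the `n_jobs = -1` call being discussed; the `square` function here is just a hypothetical stand-in for the CPU-bound per-video work:

```python
from joblib import Parallel, delayed

def square(x):
    # stand-in for a CPU-bound task such as re-encoding one video
    return x * x

# n_jobs=-1 spawns one worker per available CPU core,
# so throughput scales with the core count of the machine
results = Parallel(n_jobs=-1)(delayed(square)(i) for i in range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

joblib preserves input order in the returned list, so the results line up with the submitted tasks regardless of which worker finished first.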
If you have 2 cores, it is the same :wink:
@bhack Haha. Guess I was lucky I had 4. XD
In regard to issue: https://github.com/holistic-video-understanding/HVU-Dataset/issues/3

The `Parallel` method of the joblib library uses the `loky` backend by default, which is a multi-process system and therefore does not let the various worker processes access a common resource; this is what causes the issue mentioned above. The solution was to pass the `require='sharedmem'` argument along with `n_jobs` to `Parallel`.

Using `joblib.Parallel` for both downloading and video segmentation takes a lot of time and is thus inefficient. After experimenting with different methods and libraries, the fastest and most efficient way is to download all the videos in parallel using all available CPUs (i.e. `n_jobs = -1`) and then use the `moviepy` library to trim the videos with two parallel processes.

In addition, YouTube appears to block IPv6 addresses that make too many requests at a time; this was resolved by passing `--force-ipv4` in the `youtube-dl` command.

The mentioned method gives a speed-up of more than 50%, which keeps increasing as the number of videos to be processed grows.
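The `require='sharedmem'` fix can be sketched as below. The `download` function and the `failed` list are hypothetical stand-ins (the real code would shell out to `youtube-dl --force-ipv4`); the point is that all workers append to one shared list:

```python
from joblib import Parallel, delayed

# Shared list every worker appends to; with the default 'loky' backend
# each process would get its own copy and 'failed' would stay empty.
failed = []

def download(video_id):
    # hypothetical stand-in for a call such as:
    #   youtube-dl --force-ipv4 -o '%(id)s.mp4' <video_id>
    if video_id.startswith("bad"):
        failed.append(video_id)  # visible to the caller only with sharedmem
    return video_id

ids = ["vid1", "bad2", "vid3", "bad4"]

# require='sharedmem' forces a thread-based backend, so every worker
# sees the same 'failed' list (fine here: downloads are network-bound)
Parallel(n_jobs=-1, require="sharedmem")(delayed(download)(v) for v in ids)
print(sorted(failed))  # ['bad2', 'bad4']
```

Threads are the right trade-off for the download stage because it is network-bound; the CPU-bound trimming stage is the part that benefits from separate processes.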