yondonfu opened this issue 4 years ago
For this task it is important to bear in mind that the currently used features rely mainly on computing the DCT and a Gaussian filter of the frames; these are the main computational bottlenecks. A number of optimizations have already been implemented:
The DCT implementation is OpenCV's (https://docs.opencv.org/2.4/modules/core/doc/operations_on_arrays.html#dct), and the Gaussian filter is skimage's (https://scikit-image.org/docs/dev/api/skimage.filters.html#skimage.filters.gaussian).
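For reference, a minimal sketch of these two operations (assuming single-channel float32 frames with even dimensions, which `cv2.dct` requires; the frame size and sigma below are illustrative):

```python
import numpy as np
import cv2
from skimage.filters import gaussian

# Placeholder grayscale frame; real frames come from the decoded video.
frame = np.random.rand(480, 640).astype(np.float32)

dct_coeffs = cv2.dct(frame)           # 2D DCT of the frame (OpenCV)
blurred = gaussian(frame, sigma=2.0)  # Gaussian filter of the frame (skimage)
```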
I attach here a small benchmark of the individual operations performed during feature extraction.
Some options for optimization could be to:

- Use the magnitude of the DFT instead of the DCT. The computation seems about 10% faster (from 1.53 ms to 1.34 ms), but this would require retraining the model with the new feature (see the first sketch after this list).
- Use CuPy, which is essentially NumPy accelerated with CUDA; this is only viable if GPUs are available, of course. It implements the FFT (a fast implementation of the DFT) as well as Gaussian filters (see the second sketch after this list).
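A rough sketch of how the DCT could be compared against the DFT magnitude as a feature, assuming the same kind of frame as above (this only illustrates the comparison; the numbers quoted above come from the attached benchmark):

```python
import timeit
import numpy as np
import cv2

frame = np.random.rand(480, 640).astype(np.float32)  # placeholder grayscale frame

def dct_feature(f):
    # Current feature: 2D DCT of the frame.
    return cv2.dct(f)

def dft_magnitude_feature(f):
    # Candidate feature: magnitude of the 2D DFT.
    return np.abs(np.fft.fft2(f))

print("DCT  :", timeit.timeit(lambda: dct_feature(frame), number=100))
print("|DFT|:", timeit.timeit(lambda: dft_magnitude_feature(frame), number=100))
```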
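And a sketch of what moving the two hot operations to the GPU with CuPy could look like (assumes a CUDA-capable GPU, a CuPy version that ships `cupyx.scipy.ndimage.gaussian_filter`, and an illustrative frame size and sigma):

```python
import numpy as np
import cupy as cp
from cupyx.scipy.ndimage import gaussian_filter

frame = np.random.rand(480, 640).astype(np.float32)  # placeholder grayscale frame

gpu_frame = cp.asarray(frame)                    # host -> device copy
fft_mag = cp.abs(cp.fft.fft2(gpu_frame))         # FFT magnitude computed on the GPU
blurred = gaussian_filter(gpu_frame, sigma=2.0)  # Gaussian filter computed on the GPU

# Copy results back to the host only when they are actually needed.
fft_mag_host = cp.asnumpy(fft_mag)
blurred_host = cp.asnumpy(blurred)
```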
Feature extraction is currently the most expensive step in verification (as noted here). We can investigate whether any optimizations are possible here, e.g. algorithmic or hardware-based [1].
[1] GPU acceleration should help with many of these calculations. The argument against GPU acceleration is: why would someone with a GPU be outsourcing transcoding if they already have access to GPUs? This is a fair point. But if verification on a GPU requires fewer resources than transcoding on a GPU (either due to lower GPU utilization, or because a single GPU can verify workloads that require multiple GPUs to transcode), then this might still make sense as an option for users that do have access to a GPU.