yondonfu opened this issue 4 years ago
For this task it is important to bear in mind that the currently used features rely mainly on computing the DCT and a Gaussian filter of the frames; these are the main computational bottlenecks. A number of optimizations have already been implemented:
The DCT implementation is OpenCV's (https://docs.opencv.org/2.4/modules/core/doc/operations_on_arrays.html#dct), and the Gaussian filter is skimage's (https://scikit-image.org/docs/dev/api/skimage.filters.html#skimage.filters.gaussian).
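For reference, a minimal sketch of these two operations (assuming single-channel float32 frames with even dimensions, which `cv2.dct` requires; the frame size and sigma below are illustrative):

```python
import numpy as np
import cv2
from skimage.filters import gaussian

# Placeholder grayscale frame; real frames come from the decoded video.
frame = np.random.rand(480, 640).astype(np.float32)

dct_coeffs = cv2.dct(frame)           # 2D DCT of the frame (OpenCV)
blurred = gaussian(frame, sigma=2.0)  # Gaussian filter of the frame (skimage)
```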
I attach here a small benchmark of the individual operations performed during feature extraction.
Some options for optimization could be to:

- Use the magnitude of the DFT instead of the DCT. The computation seems about 10% faster (from 1.53 ms to 1.34 ms), but this would require retraining the model with the new feature (see the first sketch after this list).
- Use CuPy, which is essentially NumPy accelerated with CUDA; this is only viable if GPUs are available, of course. It implements the FFT (a fast implementation of the DFT) as well as Gaussian filters (see the second sketch after this list).
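A rough sketch of how the DCT could be compared against the DFT magnitude as a feature, assuming the same kind of frame as above (this only illustrates the comparison; the numbers quoted above come from the attached benchmark):

```python
import timeit
import numpy as np
import cv2

frame = np.random.rand(480, 640).astype(np.float32)  # placeholder grayscale frame

def dct_feature(f):
    # Current feature: 2D DCT of the frame.
    return cv2.dct(f)

def dft_magnitude_feature(f):
    # Candidate feature: magnitude of the 2D DFT.
    return np.abs(np.fft.fft2(f))

print("DCT  :", timeit.timeit(lambda: dct_feature(frame), number=100))
print("|DFT|:", timeit.timeit(lambda: dft_magnitude_feature(frame), number=100))
```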
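And a sketch of what moving the two hot operations to the GPU with CuPy could look like (assumes a CUDA-capable GPU, a CuPy version that ships `cupyx.scipy.ndimage.gaussian_filter`, and an illustrative frame size and sigma):

```python
import numpy as np
import cupy as cp
from cupyx.scipy.ndimage import gaussian_filter

frame = np.random.rand(480, 640).astype(np.float32)  # placeholder grayscale frame

gpu_frame = cp.asarray(frame)                    # host -> device copy
fft_mag = cp.abs(cp.fft.fft2(gpu_frame))         # FFT magnitude computed on the GPU
blurred = gaussian_filter(gpu_frame, sigma=2.0)  # Gaussian filter computed on the GPU

# Copy results back to the host only when they are actually needed.
fft_mag_host = cp.asnumpy(fft_mag)
blurred_host = cp.asnumpy(blurred)
```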
Feature extraction is currently the most expensive step in verification (as noted here). We can investigate whether any optimizations are possible here, e.g. algorithmic or hardware-based [1].
[1] GPU acceleration should help with many of these calculations. The argument against GPU acceleration is: why would someone with a GPU be outsourcing transcoding if they already have access to GPUs? This is a fair point. But if verification on a GPU requires fewer resources than transcoding on a GPU (either due to lower GPU utilization, or because a single GPU can verify workloads that require multiple GPUs to transcode), then this might still make sense as an option for users that do have access to a GPU.