Open luctrudeau opened 6 years ago
The current code is vectorized, but it makes no assumptions about the block dimensions.
Specialized paths for just 8x8, 16x16 and such should give the additional speedup we'd like to have, since the code won't have to use vec_ste then.
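To make the idea concrete, here is a minimal scalar C sketch of a width-specialized path next to a generic one. It is not the VSX code from libVPX: the function names, the simplified signature (no sub-pixel stepping, `src` already pointing at the first tap) and the `FILTER_BITS` value of 7 are assumptions made for illustration. The 8-byte `memcpy` stands in for a single full-width store, which a real VSX path could use in place of per-element `vec_ste` stores once the width is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAPS 8        /* 8-tap filter */
#define FILTER_BITS 7 /* assumption: the taps sum to 1 << 7 */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One output pixel; src points at the first of the 8 input samples.
 * Assumes >> on a negative int is an arithmetic shift (true on common
 * compilers); negative results are clipped to 0 either way. */
static uint8_t filter_px(const uint8_t *src, const int16_t *filter) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k] * filter[k];
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Generic path: any width, one element stored at a time. */
void convolve_horiz_generic(const uint8_t *src, ptrdiff_t src_stride,
                            uint8_t *dst, ptrdiff_t dst_stride,
                            const int16_t *filter, int w, int h) {
  assert(w > 0 && h > 0);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) dst[x] = filter_px(src + x, filter);
    src += src_stride;
    dst += dst_stride;
  }
}

/* Hypothetical specialized path: the width is fixed at 8, so a whole
 * row is computed and written with one 8-byte store. */
void convolve_horiz_8xh(const uint8_t *src, ptrdiff_t src_stride,
                        uint8_t *dst, ptrdiff_t dst_stride,
                        const int16_t *filter, int h) {
  assert(h > 0);
  for (int y = 0; y < h; ++y) {
    uint8_t row[8];
    for (int x = 0; x < 8; ++x) row[x] = filter_px(src + x, filter);
    memcpy(dst, row, sizeof row);
    src += src_stride;
    dst += dst_stride;
  }
}

/* Dispatcher: take the specialized path when the width allows it.
 * Each source row must have w + TAPS - 1 readable samples. */
void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
                    uint8_t *dst, ptrdiff_t dst_stride,
                    const int16_t *filter, int w, int h) {
  if (w == 8) {
    convolve_horiz_8xh(src, src_stride, dst, dst_stride, filter, h);
  } else {
    convolve_horiz_generic(src, src_stride, dst, dst_stride, filter, w, h);
  }
}
```

Both paths must produce identical output for the same input; the specialization only changes how the results are computed and stored, so a bit-exact comparison against the generic path is a natural test.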
The vpx_convolve8_vsx function is the most time-consuming function of libVPX on POWER. On POWER8, 24% of the runtime is spent in vpx_convolve8_vsx, while on POWER9 that value increases to 30%. Taking the time to optimize this function further will have a considerable impact on the libVPX encoding speed on POWER.
This makes it the best place to optimize libVPX on POWER in order to maximize results. Doubling the speed of vpx_convolve8_vsx will reduce encoding time by 10 to 15%.
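The 10 to 15% estimate can be checked with a short Amdahl's-law calculation. This is a sketch that assumes the profiled runtime shares quoted above (24% on POWER8, 30% on POWER9) apply to the encoding workload and that nothing else changes; the symbols $f$ (fraction of runtime in the function) and $s$ (speedup of the function) are introduced here for illustration.

```latex
\[
  T_{\text{new}} = T\left(1 - f + \frac{f}{s}\right)
  \quad\Longrightarrow\quad
  \frac{T - T_{\text{new}}}{T} = f\left(1 - \frac{1}{s}\right)
\]
\[
  \text{POWER8: } f = 0.24,\ s = 2 \;\Rightarrow\; 0.24 \times 0.5 = 0.12
  \qquad
  \text{POWER9: } f = 0.30,\ s = 2 \;\Rightarrow\; 0.30 \times 0.5 = 0.15
\]
```

Under these assumptions the reduction is about 12% on POWER8 and 15% on POWER9, in line with the quoted range.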
This includes the following functions:
Testing: