lu-zero / libvpx

Local libvpx changes (POWER8 Altivec/VSX support)
BSD 3-Clause "New" or "Revised" License

Speed up VSX convolution code #25

Open luctrudeau opened 6 years ago

luctrudeau commented 6 years ago

The vpx_convolve8_vsx function is the most time-consuming function in libVPX on POWER. On POWER8, 24% of the runtime is spent in vpx_convolve8_vsx, while on POWER9 that value rises to 30%. Taking the time to optimize this function further will have a considerable impact on libVPX encoding speed on POWER.

This makes it the most profitable place to optimize libVPX on POWER. Doubling the speed of vpx_convolve8_vsx would reduce encoding time by 10 to 15%.
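For reference, the estimate follows from Amdahl's law: if a function takes a fraction p of the runtime and becomes s times faster, the total runtime shrinks by p * (1 - 1/s). A minimal sketch of that arithmetic, assuming the 24% and 30% profile shares quoted above (the helper name `runtime_saved` is made up for illustration and is not part of libvpx):

```c
#include <assert.h>

/* Fraction of total runtime saved when a region that accounts for
 * `fraction` of the runtime is made `speedup` times faster (Amdahl's law).
 * Hypothetical helper, for illustration only. */
double runtime_saved(double fraction, double speedup) {
  assert(fraction >= 0.0 && fraction <= 1.0);
  assert(speedup > 0.0);
  return fraction * (1.0 - 1.0 / speedup);
}
```

With s = 2 this gives roughly 0.12 for the POWER8 share (p = 0.24) and 0.15 for the POWER9 share (p = 0.30), which is in line with the 10 to 15% range stated above.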

This includes the following functions:

Testing:

lu-zero commented 6 years ago

The current code is vectorized, but it makes no assumptions about the block dimensions.

Specialized paths for just 8x8, 16x16 and such should give the additional speedup we'd like to have, since the code would then no longer have to use vec_ste.
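To make the idea concrete, here is a rough sketch of size-based dispatch. It is written in portable scalar C rather than VSX intrinsics, so it only shows the structure: a generic path that handles any width and height, plus fixed-size paths whose loop bounds are compile-time constants. All function names here are hypothetical and not the actual libvpx entry points. The 8-tap filter with 7-bit rounding is assumed to match the usual vpx_convolve8 convention. In a real VSX version, the fixed-size paths are where full-width vector stores could replace the per-element vec_ste tail handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 8        /* 8-tap filter, as in vpx_convolve8 */
#define FILTER_BITS 7 /* assumption: taps sum to 128 (1 << 7) */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One output pixel: 8-tap dot product, rounded and clipped to 8 bits. */
static uint8_t filter_px(const uint8_t *src, const int16_t *filter) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k] * filter[k];
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Generic path: works for any w x h, so a vector version of it cannot
 * assume that a row fills a whole vector. Each source row must have
 * w + TAPS - 1 readable pixels. */
void convolve_horiz_generic(const uint8_t *src, ptrdiff_t src_stride,
                            uint8_t *dst, ptrdiff_t dst_stride,
                            const int16_t *filter, int w, int h) {
  assert(w > 0 && h > 0);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) dst[x] = filter_px(src + x, filter);
    src += src_stride;
    dst += dst_stride;
  }
}

/* Fixed-size paths: constant bounds, no tail handling needed. */
void convolve_horiz_8x8(const uint8_t *src, ptrdiff_t src_stride,
                        uint8_t *dst, ptrdiff_t dst_stride,
                        const int16_t *filter) {
  for (int y = 0; y < 8; ++y) {
    for (int x = 0; x < 8; ++x) dst[x] = filter_px(src + x, filter);
    src += src_stride;
    dst += dst_stride;
  }
}

void convolve_horiz_16x16(const uint8_t *src, ptrdiff_t src_stride,
                          uint8_t *dst, ptrdiff_t dst_stride,
                          const int16_t *filter) {
  for (int y = 0; y < 16; ++y) {
    for (int x = 0; x < 16; ++x) dst[x] = filter_px(src + x, filter);
    src += src_stride;
    dst += dst_stride;
  }
}

/* Dispatcher: use a specialized path when the block size matches,
 * otherwise fall back to the generic one. */
void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
                    ptrdiff_t dst_stride, const int16_t *filter, int w,
                    int h) {
  if (w == 8 && h == 8) {
    convolve_horiz_8x8(src, src_stride, dst, dst_stride, filter);
  } else if (w == 16 && h == 16) {
    convolve_horiz_16x16(src, src_stride, dst, dst_stride, filter);
  } else {
    convolve_horiz_generic(src, src_stride, dst, dst_stride, filter, w, h);
  }
}
```

The specialized paths must produce exactly the same output as the generic one, so any such change can be checked by running both on the same input and comparing the results.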