lu-zero / libvpx

Local libvpx changes (POWER8 Altivec/VSX support)
BSD 3-Clause "New" or "Revised" License

Speed Up SADNxNx4D #26

Open luctrudeau opened 6 years ago

luctrudeau commented 6 years ago

More than 15% of libvpx's encoding time on POWER is spent in the SADNxNx4D functions:

%       Function
10.63%  vpx_sad16x16x4d_vsx
 3.60%  vpx_sad32x32x4d_vsx
 3.22%  vpx_sad64x64x4d_vsx
 1.12%  vpx_sad8x8x4d_c

The current VSX SAD implementations leave room for considerable optimization. Doubling the speed of the SADNxNx4D functions would reduce encoding time by 5 to 8%.
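
For readers unfamiliar with these functions, here is a minimal scalar sketch of what an NxNx4D SAD computes: the sum of absolute differences between one source block and four candidate reference blocks in a single call. The name sad_nxn_x4d and its runtime n parameter are hypothetical and chosen for illustration; this is not the libvpx code, which provides one fixed-size function per block size. The sketch loads each source row once and compares it against all four references, which is the kind of reuse a vectorized version would likely want to exploit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helper (not part of libvpx): computes the SAD of
 * one n x n source block against four reference blocks.
 * sad[k] receives the sum of |src - ref[k]| over the whole block. */
static void sad_nxn_x4d(const uint8_t *src, int src_stride,
                        const uint8_t *const ref[4], int ref_stride,
                        int n, uint32_t sad[4]) {
  for (int k = 0; k < 4; ++k) sad[k] = 0;

  for (int r = 0; r < n; ++r) {
    /* The source row is shared by all four comparisons. */
    const uint8_t *s = src + r * src_stride;
    for (int k = 0; k < 4; ++k) {
      const uint8_t *p = ref[k] + r * ref_stride;
      uint32_t acc = 0;
      for (int c = 0; c < n; ++c) {
        acc += (uint32_t)abs((int)s[c] - (int)p[c]);
      }
      sad[k] += acc;
    }
  }
}
```

A scalar version like this can also serve as a baseline when checking the output of an optimized implementation on the same inputs.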

This includes the following functions:

Testing: