More than 15% of the encoding time of libVPX on POWER is spent in the SADNxNx4D functions.
Encoding time   Function
10.63%          vpx_sad16x16x4d_vsx
3.60%           vpx_sad32x32x4d_vsx
3.22%           vpx_sad64x64x4d_vsx
1.12%           vpx_sad8x8x4d_c
The current VSX SAD implementations can be optimized further for considerable performance gains. Doubling the speed of the SADNxNx4D functions would reduce encoding time by 5 to 8%.
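For context, an x4d SAD computes the sum of absolute differences between one N×N source block and four candidate reference blocks in a single call. This lets a SIMD version load each source row once and reuse it for all four references. The sketch below is a plain-C model of that computation, intended as a baseline to check a VSX version against. The name `sad_nxn_x4d_ref` and the runtime block size `n` are hypothetical: libvpx instead defines one function per block size (e.g. `vpx_sad16x16x4d_c`), so this is not the libvpx code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of an NxN "x4d" SAD (not the libvpx source).
 * One source block is compared against four reference blocks that share a
 * stride, and the four sums of absolute differences are written to sad[]. */
static void sad_nxn_x4d_ref(const uint8_t *src, int src_stride,
                            const uint8_t *const ref[4], int ref_stride,
                            int n, uint32_t sad[4]) {
  for (int i = 0; i < 4; ++i) {
    uint32_t sum = 0;
    const uint8_t *s = src;
    const uint8_t *r = ref[i];
    for (int y = 0; y < n; ++y) {
      /* Accumulate |src - ref| across one row of the block. */
      for (int x = 0; x < n; ++x) {
        sum += (uint32_t)abs(s[x] - r[x]);
      }
      s += src_stride; /* advance to the next row of each block */
      r += ref_stride;
    }
    sad[i] = sum;
  }
}
```

A VSX implementation can then be validated by comparing its four outputs with this model on random blocks, in the same spirit as the SADx4Test suite.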
The work covers the following functions and macros:
[ ] vpx_sad16x16x4d_vsx
[ ] vpx_sad32x32x4d_vsx
[ ] vpx_sad64x64x4d_vsx
[ ] vpx_sad8x8x4d_vsx
[ ] PROCESS16_4D
[ ] SAD8_4D
[ ] SAD16_4D
[ ] SAD32_4D
[ ] SAD64_4D
Testing:
[ ] Must pass the SADx4Test suite
[ ] Refactor SADx4Test to use the AbstractBench
[ ] Report performance in commit msg (compared to C version)