Closed by luctrudeau 6 years ago
The speed tests for the SADTest suite have landed. https://chromium.googlesource.com/webm/libvpx/+/f950248b9b357b21e974e3ace94359d7ee8c7b29
The sad8x8 implementation is currently in review.
By changing how the sum of absolute differences is computed, we can stay in 8-bit lanes. This optimization was also applied to all other SAD block sizes and is currently in review, so expect speedups for all block sizes.
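To illustrate why the absolute difference can stay in 8 bits, here is a minimal scalar sketch in portable C. It assumes the trick is the common max-minus-min formulation, which never leaves the range 0 to 255; the actual VSX patch may compute it differently. The names `abs_diff_u8` and `sad8xh_sketch` are hypothetical and do not come from libvpx.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute difference of two unsigned bytes without widening.
 * max(a, b) - min(a, b) is always in [0, 255], so the result fits
 * in the same 8-bit lane as the inputs. (Assumed formulation, not
 * necessarily what the upstream patch does.) */
static uint8_t abs_diff_u8(uint8_t a, uint8_t b) {
  const uint8_t hi = a > b ? a : b;
  const uint8_t lo = a > b ? b : a;
  return (uint8_t)(hi - lo);
}

/* Hypothetical scalar reference for an 8-wide SAD of arbitrary height.
 * Only the accumulator is wider than 8 bits; each per-pixel
 * difference stays in a byte. */
static unsigned int sad8xh_sketch(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  int height) {
  unsigned int sad = 0;
  assert(height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < 8; ++x) {
      sad += abs_diff_u8(src[x], ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

In a vector version, the same idea means 16 byte differences fit in one 128-bit register instead of 8 widened 16-bit differences, with widening deferred to the accumulation step.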
The VSX version of SAD8xN is now upstream: https://chromium.googlesource.com/webm/libvpx/+/e3ce12cfc1c2d2cc245e1a6d49eaf3ff18538547
Speedups compared to C are as follows:
8x4: C time = 68.7 ms (±0.3 ms), VSX time = 31.8 ms (±0.1 ms) [2.2x]
8x8: C time = 55.6 ms (±0.3 ms), VSX time = 18.3 ms (±0.1 ms) [3.0x]
8x16: C time = 46.5 ms (±0.1 ms), VSX time = 15.6 ms (±0.1 ms) [3.0x]
The PROCESS16 macro now uses 8-bit lanes instead of 16-bit lanes. https://chromium.googlesource.com/webm/libvpx/+/f9dc411d89eed99d7def7de1e9dddba782c1212c
This results in speedups for all other block sizes compared to the previous VSX code:
16x8: old VSX time = 16.7 ms, new VSX time = 9.1 ms [1.8x]
16x16: old VSX time = 15.7 ms, new VSX time = 7.9 ms [2.0x]
16x32: old VSX time = 14.4 ms, new VSX time = 7.2 ms [2.0x]
32x16: old VSX time = 14.0 ms, new VSX time = 7.4 ms [1.9x]
32x32: old VSX time = 13.4 ms, new VSX time = 6.5 ms [2.0x]
32x64: old VSX time = 12.7 ms, new VSX time = 6.3 ms [2.0x]
64x32: old VSX time = 12.6 ms, new VSX time = 6.3 ms [2.0x]
64x64: old VSX time = 12.7 ms, new VSX time = 6.2 ms [2.0x]
Implement a VSX version of vpx_sad8x8
Each function must: