Open lu-zero opened 8 years ago
start?
Probably only worth it to implement vpx_lpf_horizontal_16_dual_c and vpx_lpf_vertical_16_dual_c
This is the percentage of time libvpx spends in these functions when encoding a 1080p video:
0.35% vpx_lpf_horizontal_16_dual_c
0.35% vpx_lpf_vertical_16_dual_c
0.00% vpx_lpf_vertical_8_c
0.00% vpx_lpf_horizontal_8_c
Agreed, even if it is suspiciously low in the list.
Just looking at the C code, I don't think vpx_lpf_horizontal_16_dual_c or vpx_lpf_vertical_16_dual_c will get any faster with VSX, as VSX lacks a vector-gather instruction. These loads are all over the place:
    const int8_t flat2 =
        flat_mask5(1, s[-8 * p], s[-7 * p], s[-6 * p], s[-5 * p], p0, q0,
                   s[4 * p], s[5 * p], s[6 * p], s[7 * p]);
    filter16(mask, *thresh, flat, flat2, s - 8 * p, s - 7 * p, s - 6 * p,
             s - 5 * p, s - 4 * p, s - 3 * p, s - 2 * p, s - 1 * p, s,
             s + 1 * p, s + 2 * p, s + 3 * p, s + 4 * p, s + 5 * p, s + 6 * p,
             s + 7 * p);
I ran into the same issue vectorizing log/logf for glibc: https://sourceware.org/ml/libc-alpha/2019-05/msg00192.html
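To make the access pattern in the snippet above concrete, here is a minimal sketch in plain C of what collecting the taps for a single pixel position looks like. The helper name `gather_taps` and the constant `NUM_TAPS` are made up for illustration and are not part of libvpx; the sketch only assumes, as in the snippet, that `s` points at the current pixel and `p` is the stride between taps. Each tap sits `p` bytes away from the previous one, so without a gather instruction they have to be picked up one at a time (or the block has to be loaded and rearranged some other way).

```c
#include <stddef.h>
#include <stdint.h>

enum { NUM_TAPS = 16 }; /* taps s[-8 * p] .. s[7 * p], as in the snippet */

/* Hypothetical helper, not libvpx code: copy the 16 strided taps around
 * pixel s into a contiguous buffer. Consecutive taps are p bytes apart,
 * so this is a scalar stand-in for a vector gather. */
static void gather_taps(const uint8_t *s, ptrdiff_t p,
                        uint8_t out[NUM_TAPS]) {
  for (int i = 0; i < NUM_TAPS; ++i) {
    out[i] = s[(i - 8) * p]; /* i = 0 reads s[-8 * p], i = 15 reads s[7 * p] */
  }
}
```

Once the taps are contiguous in `out`, a single 16-byte vector load could pick them up; whether the cost of collecting them this way outweighs the gain is exactly the concern raised above.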
(TBD: prepare the list of functions)