Implement the loopfilter support

lu-zero commented 8 years ago

(TBD: prepare the list of functions)

sukrosono commented 7 years ago

start?

luctrudeau commented 6 years ago

Probably only worth implementing vpx_lpf_horizontal_16_dual_c and vpx_lpf_vertical_16_dual_c.

This is the percentage of time libvpx spends in these when encoding a 1080p video:

    0.35%  vpx_lpf_horizontal_16_dual_c
    0.35%  vpx_lpf_vertical_16_dual_c
    0.00%  vpx_lpf_vertical_8_c
    0.00%  vpx_lpf_horizontal_8_c

lu-zero commented 6 years ago

Agreed, even if it is suspiciously low in the list.

shawnl commented 5 years ago

Just looking at the C code, I don't think vpx_lpf_horizontal_16_dual_c or vpx_lpf_vertical_16_dual_c will get any faster with VSX, as VSX lacks a vector-gather instruction. These loads are all over the place:

    const int8_t flat2 =
        flat_mask5(1, s[-8 * p], s[-7 * p], s[-6 * p], s[-5 * p], p0, q0,
                   s[4 * p], s[5 * p], s[6 * p], s[7 * p]);

    filter16(mask, *thresh, flat, flat2, s - 8 * p, s - 7 * p, s - 6 * p,
             s - 5 * p, s - 4 * p, s - 3 * p, s - 2 * p, s - 1 * p, s,
             s + 1 * p, s + 2 * p, s + 3 * p, s + 4 * p, s + 5 * p, s + 6 * p,
             s + 7 * p);

I ran into the same issue vectorizing log/logf for glibc. https://sourceware.org/ml/libc-alpha/2019-05/msg00192.html

lu-zero / libvpx

Implement the loopfilter support #2