xiph / rav1e

The fastest and safest AV1 encoder.
BSD 2-Clause "Simplified" License

Improve CDEF parameter selection #845

Open tmatth opened 5 years ago

tmatth commented 5 years ago

Paraphrasing @tterribe:

For a given frame, we can have up to 8 sets of CDEF parameters for superblocks to choose from (note: it is better to have fewer than 8 for lower bitrates, to reduce the cost of coding them per SB). Currently these are hard coded.

We could do a feedforward approach, where for the last frame of a given type, you greedily search for better CDEF parameters, then select those for the next frame of the same type (inter/intra, pyramid level, etc.?).
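
A minimal sketch of the feedforward idea, assuming the parameter sets are cached per frame class. The names `FrameClass`, `CdefParams` and `CdefFeedforward` are hypothetical and do not exist in rav1e; the search itself is left out.

```rust
use std::collections::HashMap;

/// Hypothetical frame classification; a real encoder may key on more state.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum FrameClass {
    Intra,
    Inter { pyramid_level: u8 },
}

/// One CDEF parameter set: a luma and a chroma strength (illustrative layout).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct CdefParams {
    pub y_strength: u8,
    pub uv_strength: u8,
}

/// Remembers the sets found for the most recent frame of each class.
#[derive(Default)]
pub struct CdefFeedforward {
    last: HashMap<FrameClass, Vec<CdefParams>>,
}

impl CdefFeedforward {
    /// Sets to signal for the next frame of `class`. Falls back to `defaults`
    /// (e.g. the current hard-coded list) until a frame of that class was searched.
    pub fn params_for(&self, class: FrameClass, defaults: &[CdefParams]) -> Vec<CdefParams> {
        self.last.get(&class).cloned().unwrap_or_else(|| defaults.to_vec())
    }

    /// Record the result of the greedy search on a finished frame.
    /// A frame can carry at most 8 sets, so extra entries are dropped.
    pub fn update(&mut self, class: FrameClass, mut searched: Vec<CdefParams>) {
        searched.truncate(8);
        self.last.insert(class, searched);
    }
}
```

The search result for frame N only takes effect on the next frame of the same class, so no frame buffering is needed.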

We could also look at stats and try and find decent values to hardcode for lower bitrates.

Once we can buffer a frame in rav1e, that would allow us to do something closer to what libaom does, where we do multiple passes over one frame and greedily search for better sets, i.e. swapping in parameters that perform better.
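
A rough sketch of such a greedy swap loop, not libaom's actual search. The `cost` callback is a hypothetical stand-in for "filter the buffered frame with these sets and measure the result"; strengths are plain `u8` values here for brevity.

```rust
/// Greedy refinement: for each slot in `sets`, try every candidate strength and
/// keep a swap only when it lowers `cost`. Repeats until a full pass over all
/// slots changes nothing, then returns the best cost found.
pub fn greedy_refine<F>(sets: &mut [u8], candidates: &[u8], mut cost: F) -> u64
where
    F: FnMut(&[u8]) -> u64,
{
    let mut best = cost(&*sets);
    loop {
        let mut improved = false;
        for slot in 0..sets.len() {
            // Start from the current value so a slot is left alone if nothing beats it.
            let mut best_here = sets[slot];
            for &cand in candidates {
                sets[slot] = cand;
                let c = cost(&*sets);
                if c < best {
                    best = c;
                    best_here = cand;
                    improved = true;
                }
            }
            sets[slot] = best_here;
        }
        if !improved {
            return best;
        }
    }
}
```

Each accepted swap strictly lowers the cost, so the loop terminates, but like any greedy search it can stop in a local minimum.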

tmatth commented 5 years ago

@tterribe suggested, as a preliminary step, simply varying the weights depending on the quantizer. It may be worth dumping whatever parameters libaom uses for different QPs.

tmatth commented 5 years ago

via @KyleSiefring: https://aomedia.googlesource.com/aom/+/805c7d26fca822b33e22ee489b6de7ad287b6086

tmatth commented 5 years ago

From running this script on subset1

tmatth@hydra ~ $ cat hack_metrics.sh 
#!/bin/sh

set -e
set -u

SEQ=${1:-/mnt/raid/Videos/subset1-y4m/Air_Force_Academy_Chapel,_Colorado_Springs,_CO_04090u_original.y4m}
BASENAME_SEQ=$(basename "${SEQ}")

AOMDIR=aom-master/aom_build
OUTDIR=~/out
mkdir -p "${OUTDIR}"

cd "${AOMDIR}"

# Encode the clip once per --cq-level value.
for x in 20 32 43 55 63; do
    echo "$x"
    OUTPUT=${BASENAME_SEQ}.$x.ivf
    ./aomenc "${SEQ}" --ivf --tile-rows=2 --tile-columns=2 --passes=1 --quiet --rt --cpu-used=8 --end-usage=q --cq-level="$x" -o "${OUTDIR}/${OUTPUT}"
done

I see that aomenc uses these CDEF parameters per QP:

QP=20 strengths[0]:0 uv_strengths[0]:0 q:22 pri_damping:3 sec_damping:3
QP=32 strengths[0]:0 uv_strengths[0]:4 q:44 pri_damping:3 sec_damping:3
QP=43 strengths[0]:4 uv_strengths[0]:4 q:95 pri_damping:4 sec_damping:4
QP=55 strengths[0]:9 uv_strengths[0]:8 q:235 pri_damping:5 sec_damping:5
QP=63 strengths[0]:22 uv_strengths[0]:13 q:465 pri_damping:5 sec_damping:5
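
As a sketch of how these numbers could seed a "pick from Q" table: the function below linearly interpolates between the five measured points. The data comes from one clip at one aomenc speed setting, interpolating linearly in q is an assumption, and `strengths_from_q` is a hypothetical name, so treat this as illustration rather than a tuned model.

```rust
/// (q, strengths[0], uv_strengths[0]) exactly as printed above.
const MEASURED: [(i32, i32, i32); 5] = [
    (22, 0, 0),
    (44, 0, 4),
    (95, 4, 4),
    (235, 9, 8),
    (465, 22, 13),
];

/// Interpolate (luma, chroma) strength between the measured points, clamping
/// outside the measured range. Integer division truncates toward zero.
pub fn strengths_from_q(q: i32) -> (i32, i32) {
    let (first, last) = (MEASURED[0], MEASURED[MEASURED.len() - 1]);
    if q <= first.0 {
        return (first.1, first.2);
    }
    if q >= last.0 {
        return (last.1, last.2);
    }
    for w in MEASURED.windows(2) {
        let (lo, hi) = (w[0], w[1]);
        if q <= hi.0 {
            let lerp = |a: i32, b: i32| a + (b - a) * (q - lo.0) / (hi.0 - lo.0);
            return (lerp(lo.1, hi.1), lerp(lo.2, hi.2));
        }
    }
    (last.1, last.2)
}
```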

tmatth commented 5 years ago

@tterribe suggests making the CDEF parameters depend on the quantizer in terms of log_target_q instead of ac_q(fi.base_q_idx, 0, bd) as i32, e.g. (from rate.rs):

let quantizer = bexp64(log_target_q + scale);
let quantizer_u = bexp64(log_target_q + offset_u + scale);
let quantizer_v = bexp64(log_target_q + offset_v + scale);

tmatth commented 5 years ago

@tterribe suggests that for inter frames, we may want to search (as master is doing) but between 2 choices: strength dependent on qp vs. disabling CDEF entirely.
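
A minimal sketch of that two-way choice. `pick_inter_strength` and the `cost_of` callback are hypothetical; `cost_of` stands in for whatever distortion/rate measurement the existing search uses.

```rust
/// Choose between the QP-derived strength and disabling CDEF (strength 0)
/// by comparing a caller-supplied cost for each option.
pub fn pick_inter_strength<F: Fn(u8) -> u64>(from_qp: u8, cost_of: F) -> u8 {
    if from_qp == 0 {
        // Both choices coincide, so there is nothing to search.
        return 0;
    }
    // Ties go to "off", which is also the cheaper option to run.
    if cost_of(from_qp) < cost_of(0) {
        from_qp
    } else {
        0
    }
}
```

This needs at most two cost evaluations per decision, versus one per candidate strength in a wider search.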

I also want to compare only forcing strength based on QP for keyframes (and leaving inter frames with the existing search) to see if inter frames are where objective fast is regressing.

tmatth commented 5 years ago

> I also want to compare only forcing strength based on QP for keyframes (and leaving inter frames with the existing search) to see if inter frames are where objective fast is regressing.

CDEF strength from QP for intra frames only gives a -0.39% improvement across metrics, but no real encoder speed improvement on objective-fast-1: https://beta.arewecompressedyet.com/?job=master-d3992e510b9c4e67ad99f8ceaa59943dc34534f7&job=pick-cdef-from-q-intra-only%402019-06-27T03%3A09%3A08.928Z

Compared to always selecting CDEF strength from QP (for inter and intra): https://beta.arewecompressedyet.com/?job=pick-cdef-from-q-intra-only%402019-06-27T03%3A09%3A08.928Z&job=pick-cdef-from-q-always%402019-06-27T03%3A09%3A40.343Z Here the speed savings are significant at low QP.

tmatth commented 5 years ago

So commit 5625ee37c0d95f1887c20ff2e492e89653a6072d is pretty restrictive in terms of CDEF search (effectively disabling it). I think the next step would be to put the CDEF-from-QP mode behind a speed setting for low QP.

tmatth commented 4 years ago

Basically this is about implementing this TODO: https://github.com/xiph/rav1e/blob/dc8bb6332f491191f988cf0f46468927c0bb896a/src/encoder.rs#L946

tmatth commented 3 years ago

@xiphmont I know this is going back a ways, but do you recall why these strengths are multiplied by 4? https://github.com/xiph/rav1e/blob/e9be6c95ec6b1b9fced8a5ab514709778c771c43/src/encoder.rs#L698

shssoichiro commented 2 years ago

Some notes based on research from Blue and me in aomenc:

Pick from Q is pretty effective for the most part. I think when implementing the full CDEF search, it would be smart to define the search range based on the Q. For example, if pick from Q would give a strength of 2, the search could test 1, 2, and 3.
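
A small sketch of that search window, assuming strengths are plain integers and the maximum is supplied by the caller. `search_window` is a hypothetical helper, not an existing rav1e or aomenc function.

```rust
/// Candidate strengths within `radius` of the pick-from-Q value,
/// clamped to the range [0, max_strength].
pub fn search_window(from_q: u8, radius: u8, max_strength: u8) -> Vec<u8> {
    let center = from_q.min(max_strength);
    let lo = center.saturating_sub(radius);
    let hi = center.saturating_add(radius).min(max_strength);
    (lo..=hi).collect()
}
```

With `search_window(2, 1, 15)` this yields the strengths 1, 2, 3 from the example above; at the edges the window simply shrinks.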

aomenc instead defines a constant subset of strengths to search depending on speed level. In some cases this is worse than pick from Q because at one level, it only searches strengths 0 and 11, so CDEF is either off or full strength. This leads to some quality inconsistencies at speeds 5 and 6.

The other item of note is that Wiener LR seems to introduce more blur than SGR, so it may make sense to disable Wiener filters at lower Qs.

BlueSwordM commented 2 years ago

Ideally, for CDEF, we should be using the full 0-15 strengths available on all speeds below speed 6, but instead prune the available strengths based on quantizer.

Higher quantizer = less pruning. Lower quantizer = more pruning. Low quantizer = CDEF Pick from Q. Higher speed = pruning happens faster.
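
One way to read those rules, sketched as a search radius around the pick-from-Q strength. The function name `prune_radius`, the divisor 16 and the one-step-per-speed-level rule are all placeholder assumptions, not tuned or proposed values.

```rust
/// Hypothetical pruning rule: how far around the pick-from-Q strength to search.
/// A radius of 0 means "pick from Q only"; 15 covers the full 0-15 range.
pub fn prune_radius(base_q_idx: u8, speed: u8) -> u8 {
    // Higher quantizer index = wider window (less pruning): 0..=15.
    let by_q = base_q_idx / 16;
    // Higher speed = more pruning: each speed level removes one step.
    by_q.saturating_sub(speed)
}
```

At a low quantizer or a high speed the radius collapses to 0, which matches the "Low quantizer = CDEF Pick from Q" case.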