Open tmatth opened 5 years ago
@tterribe suggested, as a preliminary step, simply varying the weights depending on the quantizer. It may be worth dumping whatever parameters libaom uses for different QPs.
From running this script on subset1
tmatth@hydra ~ $ cat hack_metrics.sh
#!/bin/sh
set -e
set -u
SEQ=${1:-/mnt/raid/Videos/subset1-y4m/Air_Force_Academy_Chapel,_Colorado_Springs,_CO_04090u_original.y4m}
BASENAME_SEQ=$(basename "${SEQ}")
AOMDIR=aom-master/aom_build
OUTDIR=~/out
mkdir -p "${OUTDIR}"
cd "${AOMDIR}"
# Encode the clip once per quantizer level.
for x in 20 32 43 55 63; do
  echo "$x"
  OUTPUT="${BASENAME_SEQ}.$x.ivf"
  ./aomenc "${SEQ}" --ivf --tile-rows=2 --tile-columns=2 --passes=1 --quiet --rt --cpu-used=8 --end-usage=q --cq-level="$x" -o "${OUTDIR}/${OUTPUT}"
done
I see that aomenc uses these CDEF parameters per QP:
QP=20 strengths[0]:0 uv_strengths[0]:0 q:22 pri_damping:3 sec_damping:3
QP=32 strengths[0]:0 uv_strengths[0]:4 q:44 pri_damping:3 sec_damping:3
QP=43 strengths[0]:4 uv_strengths[0]:4 q:95 pri_damping:4 sec_damping:4
QP=55 strengths[0]:9 uv_strengths[0]:8 q:235 pri_damping:5 sec_damping:5
QP=63 strengths[0]:22 uv_strengths[0]:13 q:465 pri_damping:5 sec_damping:5
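To read the dump above, it can help to split each combined strength into a primary and a secondary part. This is a minimal sketch that assumes the dumped values use the AV1 bitstream packing of 4 * primary + secondary (with a secondary code of 3 meaning a strength of 4); unpack_strength and OBSERVED_LUMA are hypothetical names for illustration, not libaom or rav1e identifiers.

```rust
/// Split a combined CDEF strength into (primary, secondary).
/// Assumption: the value is packed as `4 * primary + secondary_code`,
/// following the AV1 bitstream layout (4-bit primary, 2-bit secondary).
pub fn unpack_strength(combined: u8) -> (u8, u8) {
    let primary = combined / 4;
    let secondary_code = combined % 4;
    // In AV1 a secondary code of 3 denotes a secondary strength of 4.
    let secondary = if secondary_code == 3 { 4 } else { secondary_code };
    (primary, secondary)
}

/// Luma strengths[0] copied from the aomenc dump above, keyed by --cq-level.
pub const OBSERVED_LUMA: [(u8, u8); 5] =
    [(20, 0), (32, 0), (43, 4), (55, 9), (63, 22)];
```

Under that assumption, the QP=63 luma value of 22 would correspond to primary strength 5 and secondary strength 2.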
@tterribe suggests making the CDEF parameters depend on the quantizer in terms of log_target_q (instead of ac_q(fi.base_q_idx, 0, bd) as i32), e.g.:
let quantizer = bexp64(log_target_q + scale);
rate.rs: let quantizer_u = bexp64(log_target_q + offset_u + scale);
rate.rs: let quantizer_v = bexp64(log_target_q + offset_v + scale);
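One way to make the strength a function of the log-domain quantizer is to interpolate between the values aomenc produced above. The following is only a sketch: it uses a floating-point log2 as a stand-in for rav1e's fixed-point log_target_q (it does not call bexp64), the knots are the q and strengths[0] columns from the dump, and luma_strength_from_log2_q is a hypothetical name. Interpolating the combined strength value directly is a simplification, not what libaom does.

```rust
/// (q, luma strengths[0]) pairs taken from the aomenc dump in this thread.
const OBSERVED: [(f64, f64); 5] =
    [(22.0, 0.0), (44.0, 0.0), (95.0, 4.0), (235.0, 9.0), (465.0, 22.0)];

/// Piecewise-linear interpolation of the luma strength in the log2(q) domain.
/// Inputs outside the observed range are clamped to the end values.
pub fn luma_strength_from_log2_q(log2_q: f64) -> u8 {
    let first = OBSERVED[0];
    let last = OBSERVED[OBSERVED.len() - 1];
    if log2_q <= first.0.log2() {
        return first.1 as u8;
    }
    if log2_q >= last.0.log2() {
        return last.1 as u8;
    }
    for pair in OBSERVED.windows(2) {
        let (x0, y0) = (pair[0].0.log2(), pair[0].1);
        let (x1, y1) = (pair[1].0.log2(), pair[1].1);
        if log2_q <= x1 {
            // Fraction of the way from the lower knot to the upper knot.
            let t = (log2_q - x0) / (x1 - x0);
            return (y0 + t * (y1 - y0)).round() as u8;
        }
    }
    last.1 as u8
}
```

A fixed-point version would replace the f64 arithmetic with the Q57 values already used in rate.rs; that conversion is left out here.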
@tterribe suggests that for inter frames, we may want to search (as master is doing) but between 2 choices: strength dependent on qp vs. disabling CDEF entirely.
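The two-way choice for inter frames could look roughly like the sketch below. The cost closure is a hypothetical caller-supplied estimate (for example a distortion or RD cost, lower is better), and pick_inter_strength is an illustrative name rather than an existing rav1e function.

```rust
/// Choose between disabling CDEF (strength 0) and the strength derived from QP.
/// `cost` is a hypothetical estimate supplied by the caller; lower is better.
pub fn pick_inter_strength<F: Fn(u8) -> f64>(strength_from_qp: u8, cost: F) -> u8 {
    let off_cost = cost(0);
    let on_cost = cost(strength_from_qp);
    // Ties go to disabling CDEF (an arbitrary choice in this sketch).
    if on_cost < off_cost {
        strength_from_qp
    } else {
        0
    }
}
```

This keeps a search in place for inter frames while evaluating only two candidates per frame.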
I also want to compare only forcing strength based on QP for keyframes (and leaving inter frames with the existing search) to see if inter frames are where objective fast is regressing.
CDEF strength from QP for intra frames only gives a -0.39% improvement across metrics, but no real encoder speed improvement on objective-fast-1: https://beta.arewecompressedyet.com/?job=master-d3992e510b9c4e67ad99f8ceaa59943dc34534f7&job=pick-cdef-from-q-intra-only%402019-06-27T03%3A09%3A08.928Z
Compared to always selecting CDEF strength from QP (for inter and intra): https://beta.arewecompressedyet.com/?job=pick-cdef-from-q-intra-only%402019-06-27T03%3A09%3A08.928Z&job=pick-cdef-from-q-always%402019-06-27T03%3A09%3A40.343Z Here the speed savings are significant at low QP.
So commit 5625ee37c0d95f1887c20ff2e492e89653a6072d is pretty restrictive in terms of CDEF search (effectively disabling it). I think the next step would be to put the CDEF-from-QP mode behind a speed setting for low QP.
Basically this is aimed at implementing this TODO: https://github.com/xiph/rav1e/blob/dc8bb6332f491191f988cf0f46468927c0bb896a/src/encoder.rs#L946
@xiphmont I know this is going back a ways, but do you recall why these strengths are multiplied by 4? https://github.com/xiph/rav1e/blob/e9be6c95ec6b1b9fced8a5ab514709778c771c43/src/encoder.rs#L698
Some notes based on research from Blue and me in aomenc:
Pick from Q is pretty effective for the most part. I think when implementing the full CDEF search, it would be smart to define the search range based on Q. For example, if pick from Q would give a strength of 2, the search could test 1, 2, and 3.
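A minimal sketch of such a Q-centered search range follows. It assumes strengths are indexed 0-15; search_window and MAX_STRENGTH are hypothetical names, not taken from aomenc or rav1e.

```rust
/// Largest strength index, assuming the 0-15 range.
pub const MAX_STRENGTH: u8 = 15;

/// Strengths to test around the pick-from-Q value, clamped to the valid range.
pub fn search_window(picked: u8, radius: u8) -> Vec<u8> {
    let picked = picked.min(MAX_STRENGTH);
    let lo = picked.saturating_sub(radius);
    let hi = picked.saturating_add(radius).min(MAX_STRENGTH);
    (lo..=hi).collect()
}
```

With a pick-from-Q strength of 2 and a radius of 1, this yields the candidates 1, 2 and 3; at the ends of the range the window simply shrinks.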
aomenc instead defines a constant subset of strengths to search depending on speed level. In some cases this is worse than pick from Q because at one level, it only searches strengths 0 and 11, so CDEF is either off or full strength. This leads to some quality inconsistencies at speeds 5 and 6.
The other item of note is that Wiener LR seems to introduce more blur than SGR, so it may make sense to disable Wiener filters at lower Qs.
Ideally, for CDEF, we should make the full 0-15 strength range available on all speeds below speed 6, and instead prune the available strengths based on the quantizer.
Higher quantizer = less pruning. Lower quantizer = more pruning. Low quantizer = CDEF pick from Q. Higher speed = pruning happens faster.
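The pruning rule above could be expressed as a search radius, i.e. the number of strengths tested on each side of the pick-from-Q value. The scaling constants below (dividing the quantizer index by 16, subtracting two steps per speed level) are purely hypothetical placeholders that would need tuning, and search_radius is an illustrative name.

```rust
/// Number of strengths to test on each side of the pick-from-Q value.
/// A radius of 0 means pure pick from Q (no search).
pub fn search_radius(base_q_idx: u8, speed: u8) -> u8 {
    // At speed 6 and above, fall back to pick from Q only.
    if speed >= 6 {
        return 0;
    }
    // Hypothetical scaling: the radius grows from 0 to 15 with the quantizer index.
    let by_quantizer = base_q_idx / 16;
    // Hypothetical speed penalty: each speed level removes two steps.
    by_quantizer.saturating_sub(2 * speed)
}
```

At a low quantizer the radius collapses to 0, which matches "low quantizer = pick from Q"; at the highest quantizer and speed 0 it covers the whole 0-15 range.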
Paraphrasing @tterribe:
For a given frame, we can have up to 8 sets of CDEF parameters for superblocks to choose from (note: it is better to have fewer than 8 for lower bitrates, to reduce the cost of coding them per SB). Currently these are hard coded.
We could do a feedforward approach, where for the last frame of a given type, you greedily search for better CDEF parameters, then select those for the next frame of the same type (inter/intra, pyramid level, etc.?).
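The feedforward idea might be sketched as a small cache keyed by frame class. Everything here is hypothetical scaffolding (FrameClass, CdefFeedForward, and the use of plain u8 strengths as parameter sets); the actual greedy search that produces the recorded parameters is not shown.

```rust
use std::collections::HashMap;

/// Hypothetical frame classification used as the cache key.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum FrameClass {
    Intra,
    Inter { pyramid_level: u8 },
}

/// Remembers the CDEF parameter sets found for the last frame of each class.
#[derive(Default)]
pub struct CdefFeedForward {
    best: HashMap<FrameClass, Vec<u8>>,
}

impl CdefFeedForward {
    /// Parameters to use for the next frame of this class; falls back to
    /// `default` until a frame of that class has been searched.
    pub fn params_for(&self, class: FrameClass, default: &[u8]) -> Vec<u8> {
        self.best
            .get(&class)
            .cloned()
            .unwrap_or_else(|| default.to_vec())
    }

    /// Store the result of the search run on the frame just encoded.
    pub fn record(&mut self, class: FrameClass, searched: Vec<u8>) {
        self.best.insert(class, searched);
    }
}
```

The search for frame N can then run after frame N has been encoded, since its result is only needed for the next frame of the same class.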
We could also look at stats and try and find decent values to hardcode for lower bitrates.
Once we can buffer a frame in rav1e, that would allow us to do something closer to what libaom does, where we make multiple passes over one frame and greedily search for better sets, i.e. swapping in parameters that perform better.
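A greedy swap search of that kind could be sketched as follows. This is not libaom's implementation: greedy_refine is a hypothetical name, parameter sets are reduced to single u8 strengths, and cost stands in for whatever frame-level cost the encoder would measure after re-running CDEF on the buffered frame (it is assumed to be deterministic).

```rust
/// Maximum number of CDEF parameter sets per frame.
pub const MAX_CDEF_SETS: usize = 8;

/// Repeatedly try swapping each slot for each candidate, keeping a swap only
/// when it strictly lowers the cost, until a full pass makes no improvement.
pub fn greedy_refine<F: Fn(&[u8]) -> f64>(
    initial: &[u8],
    candidates: &[u8],
    cost: F,
) -> Vec<u8> {
    let mut sets: Vec<u8> = initial.iter().copied().take(MAX_CDEF_SETS).collect();
    let mut best = cost(&sets);
    let mut improved = true;
    while improved {
        improved = false;
        for slot in 0..sets.len() {
            for &cand in candidates {
                let previous = sets[slot];
                if cand == previous {
                    continue;
                }
                sets[slot] = cand;
                let trial = cost(&sets);
                if trial < best {
                    // Keep the swap and continue from the improved state.
                    best = trial;
                    improved = true;
                } else {
                    // Undo the swap.
                    sets[slot] = previous;
                }
            }
        }
    }
    sets
}
```

Each call to cost would correspond to one extra pass over the buffered frame, so the number of candidates and slots directly bounds the added encode time.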