blocks placed at incorrect position or time

Describe the bug

When I encode https://www.breunig.xyz/share/2024-04-02/raw9.yuv (150 MB), rav1e produces distorted blocks at around the 12-second mark (screenshot: mpv-shot0001). When watching the encoded video, it seems as if the video jumps backwards/forwards by a couple of frames, creating a jarring motion. Put differently, this doesn't appear to be just a low-bitrate "needs more jpeg" effect. I have uploaded a broken example encode here: https://www.breunig.xyz/share/2024-04-02/70.ivf

From my testing, it appears that rav1e runs into rate control issues at the affected frames. Changing parameters slightly makes the "jarring motion bug" go away, but the "extremely low bitrate" issue stays. For example, if I set --skip 200 the motion looks fine, but it retains the short section of more-than-expected jpeg artifacts: https://www.breunig.xyz/share/2024-04-02/72.ivf

FWIW, this is not the only video I have observed the issue with, but I haven't done the work to trim down the reproducer there. However, the general video content is the same: coming from a dark area that is relatively static (i.e. the elevator moving up) to a bright area (i.e. outside) with lots of motion ("zoom-in" as I go forward + left/right pans due to making turns).

I can reproduce the issue on rav1e 0.6.6 (the version shipped on Ubuntu), 0.7.1 shipped in Debian testing and the most recent master commit as of writing.
This happens at different resolutions, too, i.e. I can generally reproduce this using 1920x1080 or 1280x720.

I haven't tested all combinations, but see below for the ones I have with a reasonably short input video.

To Reproduce

testing with Debian shipped rav1e:
      rav1e 0.7.1 (UNKNOWN) (release)
      unknown rustc version unknown target
      Compiled CPU Features: fxsr,sse,sse2
      Runtime Assembly Support: Enabled
      Runtime Assembly Level: AVX2
      Threading: Enabled
      Unstable Features: Disabled
      Compiler Flags: -Cdebuginfo=2--cap-lintswarn-Clinker=x86_64-linux-gnu-gcc-Clink-arg=-Wl,-z,relro--remap-path-prefix/build/reproducible-path/rust-rav1e-0.7.1=/usr/share/cargo/registry/rav1e-0.7.1--remap-path-prefix/build/reproducible-path/rust-rav1e-0.7.1/debian/cargo_registry=/usr/share/cargo/registry

reproducible? (yes = very obvious, kinda = less so, no = looks normal apart from rate control issue)
↓     command used
      ↓
YES   rav1e raw9.yuv --speed  9 --bitrate 1000 -o 46.ivf

YES   rav1e raw9.yuv --speed 10 --bitrate 500  -o 48.ivf
YES   rav1e raw9.yuv --speed  9 --bitrate 500  -o 47.ivf
YES   rav1e raw9.yuv --speed  8 --bitrate 500  -o 49.ivf
YES   rav1e raw9.yuv --speed  7 --bitrate 500  -o 50.ivf
YES   rav1e raw9.yuv --speed  6 --bitrate 500  -o 51.ivf
YES   rav1e raw9.yuv --speed  5 --bitrate 500  -o 52.ivf
YES   rav1e raw9.yuv --speed  4 --bitrate 500  -o 53.ivf
YES   rav1e raw9.yuv --speed  3 --bitrate 500  -o 54.ivf
YES   rav1e raw9.yuv --speed  2 --bitrate 500  -o 55.ivf
YES   rav1e raw9.yuv --speed  1 --bitrate 500  -o 56.ivf

YES   RAV1E_CPU_TARGET=rust   rav1e raw9.yuv --speed 10 --bitrate 500  -o 57.ivf
YES   RAV1E_CPU_TARGET=sse2   rav1e raw9.yuv --speed 10 --bitrate 500  -o 58.ivf
YES   RAV1E_CPU_TARGET=ssse3  rav1e raw9.yuv --speed 10 --bitrate 500  -o 59.ivf
YES   RAV1E_CPU_TARGET=SSE4_1 rav1e raw9.yuv --speed 10 --bitrate 500  -o 60.ivf
YES   RAV1E_CPU_TARGET=AVX2   rav1e raw9.yuv --speed 10 --bitrate 500  -o 61.ivf

YES   rav1e raw9.yuv --speed 10 --bitrate 500  -o 64.ivf --skip 50
KINDA rav1e raw9.yuv --speed 10 --bitrate 500  -o 62.ivf --skip 100
NO    rav1e raw9.yuv --speed 10 --bitrate 500  -o 63.ivf --skip 200

NO    rav1e raw9.yuv --speed 10 --bitrate 250  -o 65.ivf --skip 200
NO    rav1e raw9.yuv --speed 10 --bitrate 250  -o 66.ivf --skip 150

NO    rav1e raw9.yuv --speed 10 --bitrate 100  -o 67.ivf --skip 150
YES   rav1e raw9.yuv --speed 10 --bitrate 100  -o 68.ivf --skip 100

testing with rav1e compiled from source using
  RUSTFLAGS="-C target-cpu=native" cargo build --release
at commit f3dd0499f2e6c246d531ab1aab46c1e0bb30b4da:
  rav1e 0.7.0 (p20240319) (release)
  rustc 1.70.0 x86_64-unknown-linux-gnu
  Compiled CPU Features: adx,aes,avx,avx2,bmi1,bmi2,cmpxchg16b,f16c,fma,fxsr,lzcnt,movbe,pclmulqdq,popcnt,rdrand,rdseed,sse,sse2,sse3,sse4.1,sse4.2,ssse3,xsave,xsavec,xsaveopt,xsaves
  Runtime Assembly Support: Enabled
  Runtime Assembly Level: AVX2
  Threading: Enabled
  Unstable Features: Disabled
  Compiler Flags: -C target-cpu=native

YES   ./target/release/rav1e raw9.yuv --speed 10 --bitrate 500  -o 69.ivf
YES   ./target/release/rav1e raw9.yuv --speed 10 --bitrate 500  -o 70.ivf --skip 50
KINDA ./target/release/rav1e raw9.yuv --speed 10 --bitrate 500  -o 71.ivf --skip 100
NO    ./target/release/rav1e raw9.yuv --speed 10 --bitrate 500  -o 72.ivf --skip 200

Expected behavior

not have "jarring motion bug" (important)
maybe not have "extremely low bitrate section" (in my "production" setup I actually don't notice the "low bitrate" effect, but I am also not using 500 or 1000 kbps there. So it seems like a related issue or maybe a trigger, but I believe it's distinct from the jarring motion bug)

Required Information

For the custom built version:

$ cargo --version
cargo 1.69.1
$ rustc --version
rustc 1.70.0
$ nasm --version # if on x86_64
NASM version 2.16.01

Operating system (for the detailed tests shown above):

$ uname -a
Linux 6.6.15-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.15-2 (2024-02-04) x86_64 GNU/Linux

From IRC:

# is the issue present with higher bitrate?
NO    rav1e raw9.yuv --speed 10 --bitrate 5000  -o 73.ivf --skip 50

# is the issue present when using multi-pass?
NO    rav1e raw9.yuv --speed 10 --bitrate 500  -o 74_first.ivf --skip 50 --first-pass 74.stats
NO    rav1e raw9.yuv --speed 10 --bitrate 500  -o 74_second.ivf --skip 50 --second-pass 74.stats

NO    rav1e raw9.yuv --speed 10 --bitrate 500  -o 75_first.ivf --first-pass 75.stats

I have also failed to find a commit when this bug was introduced. Either it's been there "mostly from the beginning", or it's in some dependency that I fail to recompile/pull the old version for. I gave up trying to make versions older than 8b19a94ee835571e559869d4f63e7e723adce0a2 build, but as far as I can tell the bug is already present there. Note that I had to

cargo update -p regex@1.10.4 --precise 1.3.9
cargo update -p backtrace@0.3.71 --precise 0.3.49

at some point to keep the build working in a debian:stable docker container due to the Cargo.lock not being present for some period, I think. So low confidence that I didn't make a bisecting mistake.

Can you reproduce the bug with the quantizer mode instead of bitrate?

rav1e 0.7.1 from Debian

X = chosen quantizer level

rav1e raw9.yuv --speed 10 --bitrate 500 --skip 50 --quantizer X -o qX.ivf
↓             rav1e raw9.yuv --speed 10 --skip 50 --quantizer X -o pX.ivf
              ↓

NO q1         NO p1
NO q11        NO p11
NO q21        NO p21
NO q31        NO p31
NO q41        NO p41
NO q51        NO p51
NO q61        NO p61
NO q71        NO p71
NO q81        NO p81 
NO q91        NO p91 
NO q101       NO p101
NO q111       NO p111
NO q121       NO p121
NO q131       NO p131
NO q141       NO p141
NO q151       NO p151
NO q161       NO p161
NO q171       NO p171
NO q181       NO p181
NO q191       NO p191
NO q201       NO p201
NO q211       NO p211
NO q221       NO p221
NO q231       NO p231
NO q241       NO p241
NO q251       NO p251
NO q252       NO p252
NO q253       NO p253
NO q254       NO p254

YES q255      NO p255

issue 1: close key frame insertion and reservoir-frame-delay

Playing around a bit, I suspect this is a rate control issue that occurs when rav1e inserts a key frame early (before the max keyframe interval) and reservoir-frame-delay <= keyint. Since by default keyint=240 and reservoir-frame-delay=min(240, 1.5*keyint)=240, the latter is true unless rav1e's defaults are changed.

Let's look at some graphs. Here's with rav1e's default settings (i.e. -i 12 -I 240 --reservoir-frame-delay 240):

These metrics have been grabbed from https://github.com/xiph/rav1e/blob/c7c72b5530e391211c5d5f32b16394d1c7dc00cc/src/rate.rs#L890-L897 and https://github.com/xiph/rav1e/blob/c7c72b5530e391211c5d5f32b16394d1c7dc00cc/src/rate.rs#L1227-L1228 . The x-axis counts each call of the code path in question, so it roughly matches video progress. The y-axis is bits for the top two graphs and temporal units (TUs) for the bottom one.

We can see that rate_total ("rate_total is the total bits available over the next reservoir_tus TUs") becomes negative at some point. This matches the period of distorted video. We can also see that shortly before the real_bits (just bits in rav1e's code) are much larger than the estimate – this is where the "calm period" in my video ends, and the "action" starts. The reservoir empties accordingly. However, as far as I can tell that is not the (only?/main?) reason for the rate_total to become negative, since fiddling with the rate_bias (not graphed) calculation keeps the reservoir full enough, but the rate_total will still be negative.

The interesting bit seems to be the reservoir_tus which changes abruptly, which makes the rate_total negative. Basically, as far as rate control is concerned there are no more bits to spend, and none are coming until the next key frame. So it starves the last frames before the next key frame.

The dip in reservoir_tus is triggered by the insertion of a new key frame before the max key frame interval has elapsed. In my case that's some 30 frames or so ahead. But that doesn't matter much, since guess_frame_subtypes will only consider up to the last keyframe within reservoir-frame-delay. Since a new key frame was just inserted 30 frames away, the (n+1)th key frame will be at 30+240, which is outside the reservoir-frame-delay range. It therefore returns only the TUs it sees until the newly inserted key frame, so ~30 give or take, which is very little and causes a negative rate_total.
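
To make the window arithmetic concrete, here is a toy model of the behaviour described above. It is not rav1e's guess_frame_subtypes: the function name `reservoir_window` is made up, and it assumes that key frames after the next known one follow at exactly max_keyint spacing.

```rust
/// Toy model (hypothetical, not rav1e code): how many frames the rate
/// control "sees" when it only looks up to the last key frame that falls
/// within `reservoir_frame_delay`.
///
/// `next_kf_in` is the distance to the next known key frame; later key
/// frames are assumed to follow every `max_keyint` frames.
pub fn reservoir_window(
  reservoir_frame_delay: u32,
  next_kf_in: u32,
  max_keyint: u32,
) -> u32 {
  assert!(max_keyint > 0, "this model needs a finite key frame interval");
  if next_kf_in > reservoir_frame_delay {
    // No key frame inside the window: the whole delay is visible.
    return reservoir_frame_delay;
  }
  // Walk forward from the next key frame in steps of max_keyint and keep
  // the last one that is still inside the window.
  let mut last_kf = next_kf_in;
  while last_kf + max_keyint <= reservoir_frame_delay {
    last_kf += max_keyint;
  }
  last_kf
}
```

Under these assumptions, `reservoir_window(240, 30, 240)` gives 30, because the key frame at 30+240=270 lies outside the window, while `reservoir_window(360, 30, 240)` gives 270.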

Increasing reservoir-frame-delay to 360 (i.e. 1.5x keyint) avoids the negative rate_total: The result is still "low quality", but the horrible distortion is gone.

Trying out different variants where the "reservoir isn't startled by a surprise key frame" yields similar results: low quality, but not completely distorted. E.g.

-i 240 -I 240 --reservoir-frame-delay 240:

-i 360 -I 360 --reservoir-frame-delay 360:

Looking at the comment https://github.com/xiph/rav1e/blob/master/src/rate.rs#L589-L596 specifically: "but long enough to allow looking into the next GOP (avoiding the case where the last frames before an I-frame get starved)". It seems that with keyint = reservoir-frame-delay = 240 this is exactly what happens – it only checks the current GOP, which is coincidentally very short.

So we need reservoir-frame-delay > max_keyint by some suitable margin, but not too large to avoid making rate control too slow to react. Maybe reservoir-frame-delay = min(max_keyint * 1.5, max_keyint + min_keyint * 4)? For the default values of min_keyint=12 max_keyint=240 that'd be reservoir-frame-delay=288, i.e. +48 frames.
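
As a sanity check of the arithmetic, the suggested default can be written as a small function. This is only a sketch of the formula from this comment; the function name is made up and nothing like it is known to exist in rav1e.

```rust
/// Hypothetical helper: the reservoir-frame-delay suggested above,
/// min(max_keyint * 1.5, max_keyint + min_keyint * 4).
pub fn proposed_reservoir_frame_delay(min_keyint: u64, max_keyint: u64) -> u64 {
  // Integer form of max_keyint * 1.5.
  let one_and_a_half = max_keyint * 3 / 2;
  // max_keyint plus a margin of four minimum key frame intervals.
  let with_margin = max_keyint + min_keyint * 4;
  one_and_a_half.min(with_margin)
}
```

For min_keyint=12 and max_keyint=240 the two candidates are 360 and 288, so the function returns 288.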

issue 2: all bits spent early

Adjusting reservoir-frame-delay is not a magic fix, unfortunately. For example, with keyint=120 reservoir-frame-delay=180 I still run into distortions, i.e. when rav1e's log_hard_limit condition triggers. Essentially the "surprise key frame insertion" has been moved back, so the rate control has more room to adapt to the change. But it's still possible for the RC to spend most/all of its bits and be unable to react to the KF insertion. Sudden complexity changes likely make this worse.

For single pass, the reservoir_tus that affects rate_total exhibits a saw tooth pattern, i.e. spend bits early on the key frame, and conserve them towards the "last KF in reservoir". The rate then spikes when a new KF is observed in the reservoir. The simple idea here would be to be more conservative if the next (guaranteed) KF is far away, to allow more room to react to "surprising" KF insertions.
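
The saw tooth can be reproduced with a toy model, assuming key frames land exactly every max_keyint frames and the reservoir only extends to the last key frame inside reservoir-frame-delay. Both the function name and the model are illustrative assumptions, not rav1e's code.

```rust
/// Toy model (hypothetical): visible reservoir length at frame `frame`,
/// with key frames at every multiple of `max_keyint`.
pub fn sawtooth_window(
  frame: u32,
  reservoir_frame_delay: u32,
  max_keyint: u32,
) -> u32 {
  assert!(max_keyint > 0, "this model needs a finite key frame interval");
  // Distance from `frame` to the next key frame strictly ahead of it.
  let next_kf_in = max_keyint - frame % max_keyint;
  if next_kf_in > reservoir_frame_delay {
    // No key frame inside the window: the whole delay is visible.
    return reservoir_frame_delay;
  }
  // Number of further key frames that still fit inside the window.
  let extra = (reservoir_frame_delay - next_kf_in) / max_keyint;
  next_kf_in + extra * max_keyint
}
```

For reservoir-frame-delay=180 and max_keyint=120 this gives 120 at frame 0, falls to 61 at frame 59, and jumps to 180 at frame 60, when the key frame at frame 240 enters the window.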

The general idea is along the lines of reservoir_tus.powf(0.9), but this can be expressed linearly instead by looking at the reservoir_frames guess. Both variants need to account for the influence that reservoir-frame-delay (defines "max") and max_keyint (defines "range" of saw tooth) have on reservoir_tus. The code for this is surprisingly verbose, but I have been unable to find a more concise variant. Something like this:

// magic value! higher = conserve more bits early on
let penalize_early_strength = 0.25;
let max = self.reservoir_frame_delay as f32;
let min = if ctx.config.max_key_frame_interval == 0 {
  0.0
} else {
  max - ctx.config.max_key_frame_interval as f32
};
// 0.0 = next KF is far; 1.0 = next KF imminent
let next_keyframe_ratio = (reservoir_frames as f32 - min) / (max - min);
let conservative_rtu = reservoir_tus as f32
  * (1.0 - penalize_early_strength * next_keyframe_ratio);
// and then use conservative_rtu instead of reservoir_tus for rate_total

this graph shows the difference of old vs proposed approach with reservoir_frame_delay=180 max_keyint=120:
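
To try the scaling outside the encoder, the snippet above can be rewritten as a free function. This is a sketch under assumptions: in the proposal the inputs come from `self` and `ctx.config`, whereas here they are plain parameters, the function name is made up, and the clamp on the ratio is an addition that is not part of the original snippet. It also assumes reservoir_frame_delay > 0.

```rust
/// Standalone version of the proposed scaling (hypothetical free function).
pub fn conservative_rtu(
  reservoir_tus: f32,
  reservoir_frames: f32,
  reservoir_frame_delay: f32,
  max_key_frame_interval: f32,
  penalize_early_strength: f32,
) -> f32 {
  let max = reservoir_frame_delay;
  let min = if max_key_frame_interval == 0.0 {
    0.0
  } else {
    max - max_key_frame_interval
  };
  // Position of reservoir_frames within the saw tooth range [min, max].
  // The clamp (an addition) keeps out-of-range guesses from pushing the
  // scale factor outside [1 - strength, 1].
  let ratio = ((reservoir_frames - min) / (max - min)).clamp(0.0, 1.0);
  reservoir_tus * (1.0 - penalize_early_strength * ratio)
}
```

With reservoir_frame_delay=180, max_keyint=120 and a strength of 0.25, a reservoir of 180 TUs is scaled down to 135, one of 120 TUs to 105, and one of 60 TUs is left unchanged.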

tuning

Obviously these changes affect both quality and size. I tested them with some of my videos, and that's where the magic values come from. But my corpus isn't very diverse, I have checked only a few conditions/scenarios, and I wouldn't rate my eyeballs as a "trustworthy and objective measure".

Additionally, I'm definitely lacking understanding of the surrounding code and video encoding in general, so it's hard for me to come up with a sensible test strategy. Put differently, if my changes are good or not, I have no clue. Point out flaws in my line of reasoning, please.

Should I polish these proposed changes into PRs?

Recap

reservoir-frame-delay should be suitably larger than max_keyint, to give room to adapt to key frame insertions at intervals smaller than max_keyint.
flatten the reservoir_tus saw tooth pattern for single pass mode to give room in case of complexity spikes.
please advise on the next steps

Increasing the reservoir-frame-delay to 360 also fixes https://github.com/xiph/rav1e/issues/2857 . It also works for keyint=120 rfd=180. It doesn't need any other of the proposed changes to be fine. Presumably these two issues are related.
There's also cap_underflow that's off by default: https://github.com/xiph/rav1e/blob/master/src/rate.rs#L1222-L1226 Enabling it also helps somewhat, since reservoir_fullness doesn't become negative anymore, and thus more quickly makes at least some bits available. It slightly increases the file size, though, but it's hard to tell from my reproducer because it's so short.

Your analysis seems correct for all I could see, but I'd defer to @barrbrain for informed opinions :)
