Open breunigs opened 7 months ago
From IRC:
# is the issue present with higher bitrate?
NO rav1e raw9.yuv --speed 10 --bitrate 5000 -o 73.ivf --skip 50
# is the issue present when using multi-pass?
NO rav1e raw9.yuv --speed 10 --bitrate 500 -o 74_first.ivf --skip 50 --first-pass 74.stats
NO rav1e raw9.yuv --speed 10 --bitrate 500 -o 74_second.ivf --skip 50 --second-pass 74.stats
NO rav1e raw9.yuv --speed 10 --bitrate 500 -o 75_first.ivf --first-pass 75.stats
I have also failed to find the commit that introduced this bug. Either it has been there "mostly from the beginning", or it's in some dependency that I fail to recompile/pull the old version for. I gave up trying to make versions older than 8b19a94ee835571e559869d4f63e7e723adce0a2 build, but as far as I can tell the bug is already present there. Note that I had to run
cargo update -p regex@1.10.4 --precise 1.3.9
cargo update -p backtrace@0.3.71 --precise 0.3.49
at some point to keep the build working in a debian:stable docker container, I think because Cargo.lock was not present for some period. So I have low confidence that I didn't make a bisecting mistake.
Can you reproduce the bug with the quantizer mode instead of bitrate?
rav1e 0.7.1 from Debian
X = chosen quantizer level
qX: rav1e raw9.yuv --speed 10 --bitrate 500 --skip 50 --quantizer X -o qX.ivf
pX: rav1e raw9.yuv --speed 10 --skip 50 --quantizer X -o pX.ivf
Results (left column: qX, right column: pX):
NO q1 NO p1
NO q11 NO p11
NO q21 NO p21
NO q31 NO p31
NO q41 NO p41
NO q51 NO p51
NO q61 NO p61
NO q71 NO p71
NO q81 NO p81
NO q91 NO p91
NO q101 NO p101
NO q111 NO p111
NO q121 NO p121
NO q131 NO p131
NO q141 NO p141
NO q151 NO p151
NO q161 NO p161
NO q171 NO p171
NO q181 NO p181
NO q191 NO p191
NO q201 NO p201
NO q211 NO p211
NO q221 NO p221
NO q231 NO p231
NO q241 NO p241
NO q251 NO p251
NO q252 NO p252
NO q253 NO p253
NO q254 NO p254
YES q255 NO p255
Playing around a bit, I suspect this is a rate control issue that occurs when rav1e inserts a key frame early (before the max keyframe interval) and reservoir-frame-delay <= keyint. Since by default keyint=240 and reservoir-frame-delay=min(240, 1.5*keyint)=240, the latter is true unless rav1e's defaults are changed.
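The default described above can be written down as a small check. This is a minimal sketch that assumes the formula quoted in this comment; the function names are hypothetical and not part of rav1e's API.

```rust
/// Default reservoir frame delay as described above: min(240, 1.5 * keyint).
/// Hypothetical helper for illustration only.
fn default_reservoir_frame_delay(max_keyint: u64) -> u64 {
    (max_keyint * 3 / 2).min(240)
}

/// True when the reservoir window is strictly longer than the maximum GOP,
/// i.e. when the suspected condition (delay <= keyint) does NOT hold.
fn window_exceeds_max_gop(reservoir_frame_delay: u64, max_keyint: u64) -> bool {
    reservoir_frame_delay > max_keyint
}

fn main() {
    for max_keyint in [60u64, 120, 240, 360] {
        let delay = default_reservoir_frame_delay(max_keyint);
        println!(
            "keyint={} delay={} window exceeds max GOP: {}",
            max_keyint,
            delay,
            window_exceeds_max_gop(delay, max_keyint)
        );
    }
}
```

With the default keyint of 240 the delay is capped at 240, so the window does not reach past the maximum GOP length.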
Let's look at some graphs. Here's with rav1e's default settings (i.e. -i 12 -I 240 --reservoir-frame-delay 240):
These metrics have been grabbed from https://github.com/xiph/rav1e/blob/c7c72b5530e391211c5d5f32b16394d1c7dc00cc/src/rate.rs#L890-L897 and https://github.com/xiph/rav1e/blob/c7c72b5530e391211c5d5f32b16394d1c7dc00cc/src/rate.rs#L1227-L1228 . Each point on the x-axis is one call of the code path in question, so it roughly matches video progress. The y-axis is in bits for the top two graphs and in temporal units (TUs) for the bottom one.
We can see that rate_total ("rate_total is the total bits available over the next reservoir_tus TUs") becomes negative at some point. This matches the period of distorted video. We can also see that shortly before that, real_bits (just bits in rav1e's code) is much larger than the estimate. This is where the "calm period" in my video ends and the "action" starts. The reservoir empties accordingly. However, as far as I can tell that is not the (only?/main?) reason for rate_total becoming negative: fiddling with the rate_bias (not graphed) calculation keeps the reservoir full enough, but rate_total will still be negative.
The interesting bit seems to be reservoir_tus, which changes abruptly and makes rate_total negative. Basically, as far as rate control is concerned there are no more bits to spend, and none are coming until the next key frame. So it starves the last frames before the next key frame.
The dip in reservoir_tus is triggered by the insertion of a new key frame earlier than the max key frame interval. In my case that's some 30 frames or so ahead. But the exact distance doesn't matter much, since guess_frame_subtypes will only consider frames up to the last keyframe within reservoir-frame-delay. Since a new key frame was just inserted 30 frames away, the (n+1)th key frame will be at 30+240, which is outside the reservoir-frame-delay range. It therefore returns only the TUs it sees until the newly inserted key frame, so ~30 give or take, which is very little and causes a negative rate_total.
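The window shrinkage described above can be illustrated with a toy model. This is a sketch under stated assumptions, not rav1e's actual code: `modeled_reservoir_frames` is a hypothetical stand-in for what guess_frame_subtypes is described as doing, `toy_rate_total` assumes the budget is the reservoir surplus plus a per-TU allowance, and the bit counts are made-up illustrative numbers, not measurements.

```rust
/// Simplified model (assumption, not rav1e's `guess_frame_subtypes`):
/// the look-ahead window ends at the last key frame expected within
/// `reservoir_frame_delay` frames of the current position.
fn modeled_reservoir_frames(next_kf: u64, max_keyint: u64, reservoir_frame_delay: u64) -> u64 {
    if next_kf > reservoir_frame_delay {
        // No key frame inside the window: the whole window is used.
        return reservoir_frame_delay;
    }
    if max_keyint == 0 {
        // No guaranteed follow-up key frame to extend the window to.
        return next_kf;
    }
    let mut last_kf = next_kf;
    // Step to later guaranteed key frames while they still fit in the window.
    while last_kf + max_keyint <= reservoir_frame_delay {
        last_kf += max_keyint;
    }
    last_kf
}

/// Toy budget (assumption): reservoir surplus (fullness minus target, negative
/// when drained) plus the bits that arrive over the window.
fn toy_rate_total(reservoir_surplus: i64, reservoir_tus: u64, bits_per_tu: i64) -> i64 {
    reservoir_surplus + (reservoir_tus as i64) * bits_per_tu
}

fn main() {
    let surplus = -1_000_000; // illustrative: reservoir drained by the "action" scene
    let bits_per_tu = 20_000; // illustrative per-TU allowance
    for delay in [240u64, 360] {
        let frames = modeled_reservoir_frames(30, 240, delay);
        let total = toy_rate_total(surplus, frames, bits_per_tu);
        println!("delay={} window={} rate_total={}", delay, frames, total);
    }
}
```

In this model a key frame 30 frames ahead with keyint 240 leaves a 30-frame window at delay 240, but a 270-frame window at delay 360, which is the difference between a negative and a positive toy budget.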
Increasing reservoir-frame-delay to 360 (i.e. 1.5x keyint) avoids the negative rate_total:
The result is still "low quality", but the horrible distortion is gone.
Trying out different variants where the "reservoir isn't startled by a surprise key frame" yields similar results: low quality, but not completely distorted. E.g.
-i 240 -I 240 --reservoir-frame-delay 240:
-i 360 -I 360 --reservoir-frame-delay 360:
Looking at the comment https://github.com/xiph/rav1e/blob/master/src/rate.rs#L589-L596 specifically: "but long enough to allow looking into the next GOP (avoiding the case where the last frames before an I-frame get starved)". It seems that with keyint = reservoir-frame-delay = 240 this is exactly what happens: it only checks the current GOP, which is coincidentally very short.
So we need reservoir-frame-delay > max_keyint by some suitable margin, but not so large that rate control becomes too slow to react. Maybe reservoir-frame-delay = min(max_keyint * 1.5, max_keyint + min_keyint * 4)? For the default values of min_keyint=12 max_keyint=240 that'd be reservoir-frame-delay=288, i.e. +48 frames.
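The suggested formula is easy to check numerically. A minimal sketch, assuming integer frame counts; `proposed_reservoir_frame_delay` is a hypothetical name for the proposal above, not an existing rav1e function.

```rust
/// Proposed default from the text above:
/// min(max_keyint * 1.5, max_keyint + min_keyint * 4).
/// Hypothetical helper; not part of rav1e.
fn proposed_reservoir_frame_delay(min_keyint: u64, max_keyint: u64) -> u64 {
    (max_keyint * 3 / 2).min(max_keyint + min_keyint * 4)
}

fn main() {
    for (min_keyint, max_keyint) in [(12u64, 240u64), (12, 120), (60, 120)] {
        let delay = proposed_reservoir_frame_delay(min_keyint, max_keyint);
        println!(
            "min_keyint={} max_keyint={} delay={} margin={}",
            min_keyint,
            max_keyint,
            delay,
            delay - max_keyint
        );
    }
}
```

For any max_keyint >= 2 and min_keyint >= 1 both terms are larger than max_keyint, so the window always extends past the maximum GOP length.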
Adjusting reservoir-frame-delay is not a magic fix, unfortunately. For example, with keyint=120 reservoir-frame-delay=180 I still run into distortions, i.e. when rav1e's log_hard_limit condition triggers. Essentially the "surprise key frame insertion" has been moved back, so the rate control has more room to adapt to the change. But it's still possible for the RC to spend most/all of its bits and be unable to react to the KF insertion. Sudden complexity changes likely make this worse.
For single pass, the reservoir_tus that affects rate_total exhibits a saw tooth pattern, i.e. spend bits early on the key frame, and conserve them towards the "last KF in reservoir". The rate then spikes when a new KF is observed in the reservoir. The simple idea here would be to be more conservative if the next (guaranteed) KF is far away, to allow more room to react to "surprising" KF insertions.
The general idea is along the lines of reservoir_tus.powf(0.9), but this can be expressed linearly instead by looking at the reservoir_frames guess. Both variants need to account for the influence that reservoir-frame-delay (defines "max") and max_keyint (defines "range" of saw tooth) have on reservoir_tus. The code for this is surprisingly verbose, but I have been unable to find a more concise variant. Something like this:
// magic value! higher = conserve more bits early on
let penalize_early_strength = 0.25;
let max = self.reservoir_frame_delay as f32;
let min = if ctx.config.max_key_frame_interval == 0 {
  0.0
} else {
  max - ctx.config.max_key_frame_interval as f32
};
// 0.0 = next KF is far; 1.0 = next KF imminent
let next_keyframe_ratio = (reservoir_frames as f32 - min) / (max - min);
let conservative_rtu = reservoir_tus as f32
  * (1.0 - penalize_early_strength * next_keyframe_ratio);
// and then use conservative_rtu instead of reservoir_tus for rate_total
This graph shows the difference of old vs proposed approach with reservoir_frame_delay=180 max_keyint=120:
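The same comparison can be tabulated without the encoder. This is a standalone sketch of the scaling above, with the encoder state passed in as plain parameters; it assumes reservoir_tus equals reservoir_frames for simplicity, which need not hold in rav1e.

```rust
/// Standalone version of the proposed scaling; parameter names mirror the
/// snippet above, but this function itself is hypothetical.
fn conservative_rtu(
    reservoir_tus: f32,
    reservoir_frames: f32,
    reservoir_frame_delay: f32,
    max_key_frame_interval: f32,
    penalize_early_strength: f32,
) -> f32 {
    let max = reservoir_frame_delay;
    let min = if max_key_frame_interval == 0.0 {
        0.0
    } else {
        max - max_key_frame_interval
    };
    // Position within the saw tooth: 0.0 at its bottom, 1.0 at its top.
    let ratio = (reservoir_frames - min) / (max - min);
    reservoir_tus * (1.0 - penalize_early_strength * ratio)
}

fn main() {
    // reservoir_frame_delay=180, max_keyint=120: the saw tooth spans 60..=180.
    for frames in [60.0f32, 90.0, 120.0, 150.0, 180.0] {
        // Simplifying assumption: reservoir_tus == reservoir_frames.
        let scaled = conservative_rtu(frames, frames, 180.0, 120.0, 0.25);
        println!("old={} proposed={}", frames, scaled);
    }
}
```

With strength 0.25 the budget is unchanged at the bottom of the saw tooth (60 frames) and scaled by 0.75 at its top (180 frames), which flattens the spike when a new KF enters the window.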
Obviously these changes affect both quality and size. I tested them with some of my videos, and that's where the magic values come from. But my corpus isn't very diverse, I have checked only a few conditions/scenarios, and I wouldn't rate my eyeballs as a "trustworthy and objective measure".
Additionally, I'm definitely lacking understanding of the surrounding code and video encoding in general, so it's hard for me to come up with a sensible test strategy. Put differently, if my changes are good or not, I have no clue. Point out flaws in my line of reasoning, please.
Should I polish these proposed changes into PRs?
reservoir-frame-delay should be suitably larger than max_keyint, to give room to adapt to keyframe insertions shorter than max_keyint.
Adjust the reservoir_tus saw tooth pattern for single pass mode to give room in case of complexity spikes.
There is also cap_underflow, which is off by default:
https://github.com/xiph/rav1e/blob/master/src/rate.rs#L1222-L1226
Enabling it also helps somewhat, since reservoir_fullness doesn't become negative anymore, and thus more quickly makes at least some bits available. It slightly increases the file size, though, but it's hard to tell from my reproducer because it's so short.
Your analysis seems correct for all I could see, but I'd defer to @barrbrain for informed opinions :)
Describe the bug
When I am encoding https://www.breunig.xyz/share/2024-04-02/raw9.yuv (150 MB), at around the 12 second mark rav1e produces distorted blocks like so:
When watching the encoded video, it seems as if the video jumps backwards/forwards by a couple of frames, creating a jarring motion. Put differently, this doesn't appear to be just a low bitrate "needs more jpeg" effect. I have uploaded a broken example encode here: https://www.breunig.xyz/share/2024-04-02/70.ivf
From my testing, it appears that rav1e runs into rate control issues at the affected frames. Changing parameters slightly makes the "jarring motion bug" go away, but the "extremely low bitrate" issue stays. For example, if I set --skip 200 the motion looks fine, but it retains the short section of more-than-expected jpeg artifacts: https://www.breunig.xyz/share/2024-04-02/72.ivf
FWIW, this is not the only video I have observed the issue with, but I haven't done the work to trim down the reproducer there. However, the general video content is the same: coming from a dark area that is relatively static (i.e. the elevator moving up) to a bright area (i.e. outside) with lots of motion ("zoom-in" as I go forward + left/right pans due to making turns).
I haven't tested all combinations, but see below for the ones I have with a reasonably short input video.
To Reproduce
Expected behavior
Required Information
For the custom built version:
Operating system (for the detailed tests shown above):