Open shssoichiro opened 3 years ago
Refs #845
IIRC, the biggest reason that rav1e is slower than aomenc is that aomenc does a massive amount of search space pruning at the higher speed presets, particularly when it comes to motion estimation.
You can see it in low-motion clips vs high-motion clips: aomenc and rav1e at speed 6 are similar on high-motion clips in terms of speed and visual quality, but once a low-motion scene comes in, aomenc speeds up a lot more than rav1e does. rav1e has no search pruning of any kind.
That pruning also applies to block size selection and transform size partitioning, especially with rectangular partitions: at speed 6, aomenc restricts partition selection to 8x8-32x32 transforms.
Another factor is that rav1e's scene detection and frame type selection are done entirely during the encoding process. aomenc does this as well, but not as heavily, since it can rely on its default 1st pass to do a lot of the heavy lifting. That's why using the no-scene-detection flag with a very fast external scene-detection program, or with master-of-zen's work (which should be merged IMO), nicely speeds up the encoder.
Also, aomenc disables all loop restoration above cpu-used 5. That alone gives it a massive speed boost, on the order of 50-70% in average encoding framerate, at a cost to metrics.
Finally, default aomenc parameters tend to favor artifact prevention over raw detail and psycho-visual optimizations, which means most metrics usually prefer aomenc's output over rav1e's.
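To make the motion-estimation pruning idea concrete, here is a minimal sketch of one common form of it: test the zero motion vector first and skip the rest of the search when its SAD is already below a threshold. This is not aomenc's or rav1e's actual code; `block_sad`, `pruned_motion_search`, `SearchResult`, and the `early_exit_sad` parameter are hypothetical names chosen for illustration, and the fallback is a plain exhaustive search.

```rust
/// SAD between the `bsize`x`bsize` block at (bx, by) in `cur` and the block
/// displaced by (mvx, mvy) in `reference`. Returns None if the displaced
/// block falls outside the frame.
fn block_sad(
    cur: &[u8], reference: &[u8], width: usize, height: usize,
    bx: usize, by: usize, bsize: usize, mvx: isize, mvy: isize,
) -> Option<u32> {
    let rx = bx as isize + mvx;
    let ry = by as isize + mvy;
    if rx < 0 || ry < 0 || rx as usize + bsize > width || ry as usize + bsize > height {
        return None;
    }
    let (rx, ry) = (rx as usize, ry as usize);
    let mut sum = 0u32;
    for y in 0..bsize {
        for x in 0..bsize {
            let a = cur[(by + y) * width + bx + x] as i32;
            let b = reference[(ry + y) * width + rx + x] as i32;
            sum += (a - b).unsigned_abs();
        }
    }
    Some(sum)
}

/// Result of a search: best vector, its SAD, and how many candidates were evaluated.
struct SearchResult {
    mv: (isize, isize),
    sad: u32,
    candidates: usize,
}

/// Hypothetical pruned search: if the zero vector is already "good enough"
/// (SAD <= early_exit_sad), return immediately; otherwise fall back to an
/// exhaustive search over [-range, range] in both directions.
fn pruned_motion_search(
    cur: &[u8], reference: &[u8], width: usize, height: usize,
    bx: usize, by: usize, bsize: usize, range: isize, early_exit_sad: u32,
) -> SearchResult {
    let zero = block_sad(cur, reference, width, height, bx, by, bsize, 0, 0)
        .expect("the current block must lie inside the frame");
    let mut best = SearchResult { mv: (0, 0), sad: zero, candidates: 1 };
    if zero <= early_exit_sad {
        return best; // static content: the whole search is pruned
    }
    for mvy in -range..=range {
        for mvx in -range..=range {
            if (mvx, mvy) == (0, 0) {
                continue; // already evaluated
            }
            if let Some(s) = block_sad(cur, reference, width, height, bx, by, bsize, mvx, mvy) {
                best.candidates += 1;
                if s < best.sad {
                    best.sad = s;
                    best.mv = (mvx, mvy);
                }
            }
        }
    }
    best
}
```

On a static block this evaluates a single candidate instead of the whole window, which matches the low-motion speedup described above. A real encoder would layer this on top of a smarter search pattern rather than an exhaustive one.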
All in all, some suggestions:
That is all from me, for now.
I think we'll need some sort of solution to solve the vastly different luma/chroma balance if we want to benchmark ourselves against libaom. I'd rather not just tune the balance to win the benchmark, but rather change our benchmark for this particular case, e.g. run on grayscale, or have a special tune option specifically choosing quantizers similar to libaom.
From what I can see in the case of dark720, the source is very noisy and aomenc smooths out the noise more than rav1e does, resulting in a significantly smaller file. So in this case, rav1e is producing a file that is closer to the original, but at a much larger file size, which is not good for BD-rate. Not immediately sure what the solution for that case is.
In my tests partition_range had a huge impact on speed, so I think fast heuristics for block split will be very helpful.
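As one example of what a fast block-split heuristic could look like, the sketch below skips the split search for blocks whose pixel variance is low, on the assumption that flat blocks rarely benefit from smaller partitions. Everything here is hypothetical: `block_variance`, `should_try_split`, and the `var_threshold` parameter are illustrative names, not rav1e or aomenc APIs, and the threshold would need tuning against real content.

```rust
/// Population variance of the `bsize`x`bsize` luma block at (bx, by).
fn block_variance(plane: &[u8], width: usize, bx: usize, by: usize, bsize: usize) -> f64 {
    let n = (bsize * bsize) as f64;
    let mut sum = 0.0f64;
    let mut sum_sq = 0.0f64;
    for y in 0..bsize {
        for x in 0..bsize {
            let v = plane[(by + y) * width + bx + x] as f64;
            sum += v;
            sum_sq += v * v;
        }
    }
    let mean = sum / n;
    sum_sq / n - mean * mean
}

/// Hypothetical heuristic: only consider splitting a block when it is larger
/// than the minimum size and its variance exceeds `var_threshold`.
fn should_try_split(
    plane: &[u8], width: usize, bx: usize, by: usize,
    bsize: usize, min_bsize: usize, var_threshold: f64,
) -> bool {
    bsize > min_bsize && block_variance(plane, width, bx, by, bsize) > var_threshold
}
```

A check like this costs one pass over the block, which is far cheaper than running RDO on every candidate partition, at the risk of missing splits that would have paid off.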
I suggest being careful with luma/chroma balance, because visual metrics usually handle color badly. I'm not entirely sure, but I think libvmaf ignores color entirely for SSIM. The SSIM algorithm has a luminance component, so it would be absurd if applied to Cb/Cr channels.
If you're going to change color balance, verify with butteraugli at very high bitrates. My DSSIM should be OK too, especially at lower bitrates (it does SSIM without the luminance component when comparing color).
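For readers unfamiliar with the decomposition being referred to, here is a minimal sketch of the standard SSIM terms over a single window: the luminance term compares means, and the combined contrast-structure term compares variances and covariance. This is the textbook formulation, not the actual code of libvmaf or DSSIM; `SsimTerms` and `ssim_terms` are hypothetical names, and the constants use the usual K1 = 0.01 and K2 = 0.03.

```rust
/// The two factors of single-window SSIM; their product is the SSIM value.
struct SsimTerms {
    luminance: f64,
    contrast_structure: f64,
}

/// Compute the SSIM terms for two equally sized windows of samples.
/// `dynamic_range` is the peak sample value (255 for 8-bit content).
fn ssim_terms(x: &[f64], y: &[f64], dynamic_range: f64) -> SsimTerms {
    assert_eq!(x.len(), y.len());
    assert!(!x.is_empty());
    let n = x.len() as f64;
    let c1 = (0.01 * dynamic_range).powi(2);
    let c2 = (0.03 * dynamic_range).powi(2);

    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    let mut var_x = 0.0;
    let mut var_y = 0.0;
    let mut cov = 0.0;
    for (a, b) in x.iter().zip(y.iter()) {
        var_x += (a - mean_x).powi(2);
        var_y += (b - mean_y).powi(2);
        cov += (a - mean_x) * (b - mean_y);
    }
    var_x /= n;
    var_y /= n;
    cov /= n;

    SsimTerms {
        // Sensitive to a shift in the mean level of the window.
        luminance: (2.0 * mean_x * mean_y + c1) / (mean_x * mean_x + mean_y * mean_y + c1),
        // Unaffected by a constant offset; tracks texture and edges.
        contrast_structure: (2.0 * cov + c2) / (var_x + var_y + c2),
    }
}
```

A constant offset between the two windows lowers only the luminance term, which is why dropping that term changes how a metric treats the Cb/Cr planes.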
@BlueSwordM took the words right out of my mouth
There's also another very important factor to take into account when comparing aomenc and SVT-AV1 against rav1e: unless I missed something while parsing the code, rav1e never voluntarily denoises the input.
aomenc and SVT-AV1 use temporal denoising on the input over some types of frames, with aomenc giving specific control over it with arnr-strength=X, with a range of 0-6, with 5 being the default.
I've yet to do an AWCY run detailing what happens when you disable ARNR denoising entirely, but from my subjective and anecdotal tests, it can have a large impact on quality, speed and metrics, especially in some hard content like video games.
Just a tip.
So basically, one of the 1st steps we should take to improve quality is to implement the full set of CDEF search strengths.
The current method, which picks the CDEF strength from the current quantizer (so CDEF Pick from Q), is good for higher fidelity encoding, but certainly not optimal for keeping clean edges at lower bitrates.
However, the full set of CDEF search strengths is a bit problematic for fidelity, as it can result in slight blurring in high frequency AC blocks (hair, skin, grass, noise, etc).
Therefore, my idea would be to separate the CDEF tuning into 2 categories:
Furthermore, since CDEF can actually hurt fidelity when a lot of noise is present, a simple noise estimation algorithm could be used to disable CDEF filtering once the noise level reaches a threshold (also based on the quantizer somewhat).
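A rough sketch of that noise gate, under stated assumptions: the estimator is Immerkaer's fast noise-variance method (a 3x3 Laplacian-difference kernel averaged over the plane), swapped in here as one possible "simple noise estimation algorithm". The decision rule in `cdef_enabled`, including its constants and the way the threshold rises with the quantizer index, is purely hypothetical and not taken from rav1e or aomenc.

```rust
/// Estimate the noise standard deviation of an 8-bit plane with Immerkaer's
/// method: convolve with [[1,-2,1],[-2,4,-2],[1,-2,1]] and average |response|.
fn estimate_noise_sigma(plane: &[u8], width: usize, height: usize) -> f64 {
    if width < 3 || height < 3 {
        return 0.0;
    }
    const K: [[i32; 3]; 3] = [[1, -2, 1], [-2, 4, -2], [1, -2, 1]];
    let mut acc: u64 = 0;
    for y in 1..height - 1 {
        for x in 1..width - 1 {
            let mut response = 0i32;
            for ky in 0..3 {
                for kx in 0..3 {
                    let px = plane[(y + ky - 1) * width + (x + kx - 1)] as i32;
                    response += K[ky][kx] * px;
                }
            }
            acc += response.unsigned_abs() as u64;
        }
    }
    let n = ((width - 2) * (height - 2)) as f64;
    (std::f64::consts::PI / 2.0).sqrt() * acc as f64 / (6.0 * n)
}

/// Hypothetical gate: keep CDEF on unless the estimated noise exceeds a
/// threshold. The threshold grows with the quantizer index (0-255) on the
/// assumption that at coarse quantizers most noise is already gone and
/// CDEF's ringing removal matters more. The constants are placeholders.
fn cdef_enabled(sigma: f64, qidx: u8) -> bool {
    let threshold = 2.0 + 6.0 * (qidx as f64 / 255.0);
    sigma < threshold
}
```

The kernel cancels constant and linearly varying regions, so smooth gradients read as noise-free, while strong texture can still inflate the estimate. A production version would likely mask out edges before averaging.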
A couple of items that came up today:
I went through the task list and found several items tagged compression performance that seem to reference tools that aren't implemented yet. This might account for some of the delta, too. It might be valuable to triage these based on their potential.
@shssoichiro Do you think it would be a good idea to pin this issue? It seems pretty important, IMO.
Good idea considering this is a meta issue gathering basically "the most important" features we need to add. tbh I didn't even know that pinning issues was a thing in Github, unless it's something they added recently.
Maybe? I've seen it on some other repos, but I can't really say when they started popping up.
Thanks for pinning and replying!
Sorry for unpinning 😅 I accidentally clicked the button, I pinned it back
I wanted to create a meta issue to track features or changes we can implement to reach quality parity with aomenc. Right now, our speed 6 still trails aomenc's cpu-used 6 by about 30% BD-rate, while also being slower (assuming one tile and no parallel encoding) (AWCY).
We come closer if we up rav1e to s0 (AWCY), where some clips even win over aomenc, but at the cost of rav1e being 2700% slower.
There's also the notable outlier of dark720, which is 200% worse MSSSIM BD Rate even at speed 0.
Here are the ideas so far:
[ ] #845
[ ] Implement search pruning
[ ] Implement wiener filtering
[ ] Quantization matrices (#2973)
[ ] Delta-Q?
[x] #2710
[x] #1308
[ ] Alt-ref frame denoising?
[ ] #1734
[ ] #1729
[ ] #1722
[ ] #1726
[ ] #1731
[ ] #1730
[ ] #1732
TODO: Try to fill out this list with more suggestions. I'm not extremely knowledgeable about the aomenc code base, and it's massive, so I welcome discussion from people who may know it better.