openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

Use VP9 in place of VP8 for WebM videos #79

Closed kelson42 closed 4 months ago

kelson42 commented 3 years ago

It offers better compression.

rgaudin commented 3 years ago

What are the drawbacks? Can you please share documents/comparison on this? I'd be happy if we could increase quality with the same target bitrate.

kelson42 commented 3 years ago

I'm not aware of any big drawback (maybe missing native support in older browsers/libraries). See https://bloggeek.me/vp8-vs-vp9-quality-or-bitrate/

rgaudin commented 3 years ago

https://caniuse.com/?search=vp9 is not clear about VP9 support in browsers. That is something we should definitely know beforehand.

If you thought VP8 is a resource hog, then expect VP9 to be a lot more voracious with its CPU requirements

Seems to refer to encoding though. Rest of drawbacks mentioned are probably obsolete.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

danielzgtg commented 1 year ago

VP9 could still be sent to browsers without VP9 support. We would just need a polyfill. However, people using those browsers are unlikely to have the CPU speed to decode VP9 smoothly, and it's even slower in a polyfill. Zoom uses a WASM polyfill for video and/or audio. H264 isn't good for open source because it's patented.

It's 2023. I used to transcode my videos to VP9, but I transcode them to AV1 now. The compression quality compared to H264 is insane, as AV1 is 10x smaller with my settings even at 1080p. The videos I upload to GitHub these days are all AV1 in mp4 container. Again, the CPU speed of your target audience is in question. My laptop without AV1 hardware decode can decode the 8-bit AV1 natively in the browser smoothly, but not 10-bit. My current desktop has AV1 hardware decode, but my previous desktop can't keep up with its slow CPU.

rgaudin commented 1 year ago

@danielzgtg that's very interesting.

We already use ogv.js and encode in VP9. From what I've read following your comment, AV1 hardware support is still fairly recent and would probably be difficult on most of our users' hardware (although we have no numbers on our users' environments).

The AV1 bitrate quality is WOW and I'd love for us to switch. If you get a chance, I'd love an AV1 equivalent FFMPEG params list of our WebMLow preset so we could run some actual tests.

https://github.com/openzim/python-scraperlib/blob/6f93bccd2b941e76d9606972bb1d5a487ca97831/src/zimscraperlib/video/presets.py#L30-L55

danielzgtg commented 1 year ago

I reconstructed your ffmpeg command to be:

ffmpeg -hide_banner -i in.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v libvpx -quality best -b:v 300k -maxrate 300k -minrate 300k -qmin 30 -qmax 42 -r 24 -g 240 -codec:a libvorbis -ar 44100 -b:a 48k out.webm

That gives:

frame=18296 fps=104 q=33.0 Lsize= 57615kB time=00:12:42.29 bitrate= 619.2kbits/s dup=0 drop=4549 speed=4.31x video:29417kB audio:27908kB subtitle:0kB other streams:0kB global headers:3kB muxing overhead: 0.505121%

I'm testing on a video mixing live action and animated with lots of special effects.

hardware support

"-vf": "scale='480:trunc(ow/a/2)*2'", # frame size

480p? Should be able to decode that on CPU, especially at low bitrates. Some browsers don't even use hardware decoding for low-resolution videos.

Also thank you for that trunc part. I had to do the same thing for my scripts, and not knowing that command I complicated things with a Python wrapper.
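
As a quick illustration of what that scale expression computes, the sketch below mirrors trunc(ow/a/2)*2 in Python with integer arithmetic. It assumes square pixels and that a is the input width divided by the input height; the function name is made up for this example.

```python
def scaled_height(src_width: int, src_height: int, target_width: int = 480) -> int:
    """Height produced by scale='480:trunc(ow/a/2)*2', assuming a = iw/ih."""
    # ow / a equals target_width * src_height / src_width.
    # Halve it, truncate, then double so the result is always even.
    half_height = (target_width * src_height) // (2 * src_width)
    return half_height * 2


if __name__ == "__main__":
    for width, height in [(1920, 1080), (854, 480), (640, 480)]:
        print(f"{width}x{height} -> 480x{scaled_height(width, height)}")
```

The even height matters because encoders working in 4:2:0 chroma subsampling generally reject odd frame dimensions.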

"-codec:a": "libvorbis", # audio codec

Have you tried upgrading to libopus? The resulting audio sounds better. Safari support for both Opus and Vorbis has the same warnings on caniuse.com.

"-maxrate": "300k", # max video bitrate

This doesn't look right. The webm command gave me 619.2kbits/s, way over 300k. I doubt that's all because of audio. Wow, copying using -c:v copy and -an gives 317.5kbits/s. The minrate cap also seems to be wasting space. That, together with qmin and qmax, seems to be imposing contradictory or unsatisfiable constraints.
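
The overall figure can be cross-checked from the ffmpeg stats line quoted earlier. This is a rough sketch that assumes ffmpeg's "kB" means 1024 bytes; the helper name is hypothetical.

```python
def average_bitrate_kbps(size_kib: float, duration: str) -> float:
    """Average bitrate in kbit/s from a size in KiB and an HH:MM:SS.ss duration."""
    hours, minutes, seconds = duration.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return size_kib * 1024 * 8 / total_seconds / 1000


if __name__ == "__main__":
    duration = "00:12:42.29"
    # Sizes copied from the libvpx run: Lsize, video and audio.
    for label, size_kib in [("total", 57615), ("video", 29417), ("audio", 27908)]:
        print(f"{label}: {average_bitrate_kbps(size_kib, duration):.1f} kbit/s")
```

Splitting the total per stream this way shows how much of the bitrate each track accounts for, without re-running the encode.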

I'd love an AV1 equivalent FFMPEG params list

I matched the quality and speed to:

ffmpeg -hide_banner -i in.mp4 -vf "scale='480:trunc(ow/a/2)*2',format=yuv420p10le" -c:v libsvtav1 -preset 7 -crf 35 -r 24 -g 240 -c:a libopus -b:a 48k out.mp4

It resulted in:

frame=18296 fps=123 q=35.0 Lsize= 29291kB time=00:12:42.29 bitrate= 314.8kbits/s dup=0 drop=4549 speed=5.11x video:24698kB audio:4211kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.324344%

Of that, 265.8kbits/s was video.

A lot of the artifacts I saw in the webm are gone. This alone shouldn't be used to measure AV1 because AV1 encoders like to smooth things out. The colors are sharper, but that might just be from the 10-bit. The definitive evidence for AV1 is that I can see faces clearly when I couldn't see some before.

| Flag | Effect |
| --- | --- |
| scale= | AV1 is better than other codecs at compressing high resolutions. This reduces file size and users' screen sizes should be considered if it's desired to increase this. |
| format=yuv420p10le | Improves compression and color quality. Remove to speed up encode and decode on some computers at the risk of introducing artifacts and increasing file size. |
| -preset 6 | Matched to webm encoding speed. It's still faster and higher quality. Without it, I got 30.5 MiB instead of 28.6 MiB but speed=24.5x instead of speed=5.11x. I use -preset 4 which is the limit for multithreading, but that is very slow. |
| -crf 35 | Sets compression quality. libsvtav1 hasn't implemented bitrate properly yet, so I had to match the crf. 35 is actually the default setting. I use 60 for PowerPoints, and you could try 40 to see if that's acceptable. With 40, it still looked better than webm and I got 237.5kbits/s and 21.6 MiB. |
| -r 24 | Reduces framerate to movies' 24FPS. Assuming your input is 24+ FPS, else add a check to reduce this further. |
| -g 240 | Keyframe every 10s. libsvtav1 doesn't have scene change detection. This is specified to have the keyframe interval consistent for now. Keyframes affect video seek times. |
| -b:a 48k | A common value for audio quality, and I can't hear a difference. With the upgrade to Opus, try lowering this. |
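
The "keyframe every 10s" figure follows directly from -g 240 combined with -r 24. A tiny sketch of that arithmetic, with a made-up helper name:

```python
def keyframe_interval_seconds(gop_size: int, frame_rate: float) -> float:
    """Longest gap between keyframes for a given -g (frames) and -r (frames/s)."""
    return gop_size / frame_rate


if __name__ == "__main__":
    print(keyframe_interval_seconds(240, 24))  # 10.0
```

If the frame rate were lowered further, -g would need to shrink proportionally to keep the same seek granularity.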

Encode times may be a concern without a fast CPU with 16+ cores. Removing the -preset 6 makes the encode both faster and still look better than webm. I think that decode performance is tied to bitrate, so improving compression may make it easier to decode. The encoded webm, with vaapi disabled so an AVX2 CPU only, was at 5.6%. The 21.6MiB AV1 was at 6.5% CPU, and the 28.6 MiB one was at 7.2% CPU. So we achieved the goal of improving compression. If users' computers heat up more, we can tell them that it's because we improved the video quality.

kevinmcmurtrie commented 1 year ago

The commands should only set narrow min/max limits on the quantizer quality or bitrate, but not both. I'm seeing high failure rates in transcoding and I bet it's because it can't always match two constraints at once.
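
A hedged sketch of how such a double constraint could be spotted in a preset before it reaches ffmpeg. The dictionary mirrors the flag/value style quoted earlier in the thread; the function name and the exact rule (both bounds present on both sides) are assumptions made for illustration, not scraperlib behavior.

```python
def has_conflicting_rate_control(options: dict) -> bool:
    """True when a preset bounds both the bitrate and the quantizer."""
    bounds_bitrate = "-minrate" in options and "-maxrate" in options
    bounds_quantizer = "-qmin" in options and "-qmax" in options
    return bounds_bitrate and bounds_quantizer


if __name__ == "__main__":
    # Values taken from the reconstructed libvpx command above.
    webm_low = {
        "-b:v": "300k",
        "-maxrate": "300k",
        "-minrate": "300k",
        "-qmin": "30",
        "-qmax": "42",
    }
    print(has_conflicting_rate_control(webm_low))
```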

rgaudin commented 1 year ago

Have you tried upgrading to libopus? The resulting audio sounds better. Safari support for both Opus and Vorbis has the same warnings on caniuse.com.

Didn't realize ogv.js supported Opus. We shall switch indeed.

Thanks for all the details; it's a great time and quality gain for me. I will run a couple of tests and try this on an actual recipe, and if it turns out fine, we'll make it the default.

danielzgtg commented 1 year ago

min/max limits on the quantizer quality or bitrate, but not both. I'm seeing high failure rates in transcoding

I think the encoder will just ignore contradictory options if they are supplied. At least, this is what some encoders do. If the transcoding jobs fail, it must be because of disk/network failures or running out of RAM. You could also try to remux with '-c:v copy -c:a copy' before transcoding as that fixes some files for me. Further remux options are regenerating the timestamps, or extracting into separate files then recombining the tracks.

ogv.js supported Opus

Didn't realize ogv.js supported AV1 either. With the previous 480p 8-bit encoded video, it was 35% CPU on an AVX512 CPU and 60% CPU on AVX2-only CPU.

10-bit is not supported (https://github.com/brion/ogv.js/issues/626). mp4 container is not supported, only webm (https://github.com/brion/ogv.js/issues/443).

rgaudin commented 1 year ago

Indeed ffmpeg doesn't fail. I've manually tested many of the failing ffmpeg commands and they all succeeded, so the failures are probably resource-related.

That said, it's important to fix because as you indicated options are ignored and we are thus creating larger files than we expect to.

I subscribed to those two tickets. 10-bit support seems like the most important for us. https://github.com/brion/ogv.js/commit/eef47bf6aff9de9f456e83287e9ddac74cb91beb shows there is no support and the compilation support is not the root cause.

kevinmcmurtrie commented 1 year ago

Followup on my previous comment - I made some snapshots of the videos directory and re-checked against errors. The errors are a scraper bug that requests transcoding of files that never existed.

The conflicting quality bounds and bitrate bounds tuning does greatly degrade ffmpeg. For the quality and bitrate, only set the target quality and the maximum bitrate. It encodes much faster and slashes the file size for those whiteboard videos. (Any settings to reduce key frames or use a variable frame rate may help too.)

Also, ffmpeg is multi-threaded. Running one ffmpeg process per core results in roughly cores^2 active threads. All that context switching is very inefficient.
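
One way to avoid that oversubscription is to cap the number of concurrent jobs so that jobs times threads per job stays within the core count. The sketch below is only an illustration; the helper name is hypothetical and the thread count is assumed to be passed to each ffmpeg process through its -threads option.

```python
import os


def concurrent_jobs(cpu_count: int, threads_per_job: int) -> int:
    """Number of parallel ffmpeg processes that keeps jobs * threads <= cores."""
    return max(1, cpu_count // max(1, threads_per_job))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    threads = 2  # each process would be started with "-threads 2"
    print(f"{cpus} cores -> {concurrent_jobs(cpus, threads)} jobs of {threads} threads")
```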

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 1 year ago

Looks like vp9 is now natively supported on iOS/macOS https://www.theoplayer.com/blog/vp9-support-now-possible-on-apple-devices-and-all-major-platforms

rgaudin commented 1 year ago

It's still HLS-only. We'd have to support HLS on our readers to use it

kelson42 commented 5 months ago

@rgaudin tested a VP9 encoded ZIM file in Kiwix on macOS 14 and it works.

Encoding to VP9 takes significantly longer than VP8 in his experiment, but the result (at the same quality) is a size reduction of around 20%.

IMHO there is nothing stopping us from using VP9 in place of VP8 from now on.

@rgaudin @kevinmcmurtrie @benoit74 Any remark?

rgaudin commented 5 months ago

Works as well on Kiwix iOS (iOS 17.4.1)

benoit74 commented 5 months ago

I think we've still not answered the major concerns around this move:

And in both cases, are the corresponding requirements acceptable, especially given our commitments to support some of our core clients having to support tablets for xx years?

Where is the test ZIM so that I can at least try on our beloved tablets?

And finally, what about HLS? Are we sure we do not want to wait for HLS implementation in readers to make the move?

rgaudin commented 5 months ago

https://tmp.kiwix.org/ci/tests_fr_reg-videos_2024-05.zim

Note that this VP9 video has been encoded with -c:v libvpx-vp9 instead of -c:v libvpx without any other change. The time it took to encode this 14-minute video (90 minutes!!) seems to indicate we need better ffmpeg args for this. We just wanted to know if the WKWebView in the Kiwix Apple reader supported VP9.

kelson42 commented 5 months ago

which browser versions started to support VP9 natively?

https://en.wikipedia.org/wiki/VP9#Browser_support

kelson42 commented 5 months ago

do we have the proper polyfill working correctly when VP9 is not natively supported? is VP9 acceptable in terms of decoding "power" needed (natively / with polyfill), especially on low-end devices like phones and tablets?

VP9 has been supported for a very long time, but in the worst case ogv.js will handle the decoding.

kevinmcmurtrie commented 5 months ago

@rgaudin tested a VP9 encoded ZIM file in Kiwix on macOS 14 and it works.

Encoding to VP9 takes significantly longer than VP8 in his experiment, but the result (at the same quality) is a size reduction of around 20%.

IMHO there is nothing stopping us from using VP9 in place of VP8 from now on.

@rgaudin @kevinmcmurtrie @benoit74 Any remark?

https://github.com/openzim/zimfarm/issues/754 would be helpful so that GPU devices can be exposed to docker containers. Even an integrated GPU's driver might boost FFmpeg a bit.

benoit74 commented 5 months ago

https://www.webmproject.org/about/faq/ is an important source of information

Especially the entry below, which mostly answers my concerns regarding VP9 support:

[image: WebM FAQ entry]

I don't know how well this expectation holds, but it means that at least theoretically a transition from VP8 to VP9 should be transparent for end-users.

benoit74 commented 5 months ago

Videos of the test ZIM are working well (playing, seeking, fullscreen) on the "recent" tablet from Orange: Chrome 87, Android 11

All except the VP9 one are also working well on the "old" tablet from Orange: Chrome 43, Android 5.1.1. On this tablet, the same VP9 video does however play in MX Player (which was already installed on my tablet; not sure if it is installed by default or was added by a previous user) with the HW decoder; seeking seems rather erratic, but still works after waiting a bit, so I'm quite sure it is more a problem in how MX Player handles this for web flows.

For me this proves that VP9 support is sufficient to make the transition.

Has someone already understood (at least a bit) what the proper ffmpeg settings for VP9 encoding would be? I intend to have a look into it otherwise.

rgaudin commented 5 months ago

Has someone already understood (at least a bit) what the proper ffmpeg settings for VP9 encoding would be? I intend to have a look into it otherwise.

Might be a good opportunity to look at hardware acceleration. CPU sure is simple and generic but it's also crazy slow compared to HW-accelerated encoding (CPU or GPU)

benoit74 commented 5 months ago

I'm not sure hardware acceleration for VP9 encoding is worth looking into.

Wikipedia mentions it has never taken off, and other sources in Google Search and ChatGPT seem to corroborate this.

Encoders are usually tied to specific hardware (NVidia, ...). We have no idea which kind of graphics chipset would be available on our workers.

And Zimfarm is not yet capable of passing the proper options for GPU access from inside the Docker container.

Given all this, is it really relevant to spend time on hardware encoding for VP9?

rgaudin commented 5 months ago

Most recent CPUs include acceleration for it from what I read

benoit74 commented 5 months ago

I found a few interesting online sources about setting up the VP9 encoder correctly (this is not an exhaustive review of existing sources):

I ran a few tests based on these VP9 recommendations (mostly centered on Google's recommendation for VOD, i.e. with 2 passes) and unfortunately they were not conclusive: the final video was bigger than with VP8 (using the v2 encoder presets from scraperlib) and took longer to encode on my Mac M1 Pro with the https://tmp.kiwix.org/ci/test-videos/ted-fast-movements/ted-the-trick-to-regaining-your-childlike-wonder-zack-king-raw.mp4 video.

VP8 args:

-codec:v libvpx -quality best -b:v 128k -qmin 18 -qmax 40 -vf scale='480:trunc(ow/a/2)*2' -an

VP9 args:

pass1: 
-codec:v libvpx-vp9 -b:v 150k -minrate 75k -maxrate 218k -tile-columns 0 -g 240 -threads 2 -quality good -crf 37 -vf scale='480:trunc(ow/a/2)*2' -an -pass 1 -speed 4

pass2:
-codec:v libvpx-vp9 -b:v 150k -minrate 75k -maxrate 218k -tile-columns 0 -g 240 -threads 2 -quality good -crf 37 -vf scale='480:trunc(ow/a/2)*2' -an -pass 2 -speed 1

My feeling now is that migrating to VP9 will require significant effort, both to find proper settings and to adapt scrapers/presets to 2-pass encoding (which seems to be strongly recommended).

For anyone willing to help, videos to use for tests are located at https://tmp.kiwix.org/ci/test-videos/ (we have 3 videos, use the raw.mp4 on all 3 folders).

Target size is -vf scale='480:trunc(ow/a/2)*2' (unless this proves to be significantly problematic for VP9)
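
For anyone scripting the 2-pass test, here is a minimal sketch of how the two invocations could be assembled from the VP9 args listed above. The function name, the pass log name and the file names are placeholders chosen for this example; the first pass discards its output via ffmpeg's null muxer and only writes the stats file that the second pass reads.

```python
import os

# Arguments shared by both passes, copied from the VP9 args above.
VP9_COMMON = [
    "-codec:v", "libvpx-vp9", "-b:v", "150k", "-minrate", "75k",
    "-maxrate", "218k", "-tile-columns", "0", "-g", "240", "-threads", "2",
    "-quality", "good", "-crf", "37",
    "-vf", "scale='480:trunc(ow/a/2)*2'", "-an",
]


def two_pass_commands(src: str, dst: str, passlog: str = "ffmpeg2pass"):
    """Return the (first pass, second pass) ffmpeg argument lists."""
    base = ["ffmpeg", "-y", "-i", src, *VP9_COMMON, "-passlogfile", passlog]
    first = [*base, "-pass", "1", "-speed", "4", "-f", "null", os.devnull]
    second = [*base, "-pass", "2", "-speed", "1", dst]
    return first, second


if __name__ == "__main__":
    for command in two_pass_commands("raw.mp4", "out.webm"):
        print(" ".join(command))
```

Each list can be handed to subprocess.run as-is; both passes must use the same pass log name and run one after the other.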

kevinmcmurtrie commented 5 months ago

Constrained Quality mode seems to not work. It follows the requested average bitrate so tightly that it's essentially CBR. Options crf, minrate, and maxrate do nothing except at extreme nonsense values. The q thrashes beyond sane values while encoding.

I'm not sure if it's broken or if there are other considerations not documented.

kevinmcmurtrie commented 5 months ago

The average bitrate config in VP9 seems to have an extremely tiny window for measurement. It can fit an I-frame to reduce flicker, but high motion frames can't borrow from low motion frames. 2-pass doesn't impress me.

A workaround I found is to pretend like the average bitrate doesn't exist; you can only set the limit using the b:v option.

ffmpeg -i infile.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 600k -minrate 0 -maxrate 1200k -crf 30 -g 240 -quality good -speed 0 -auto-alt-ref 1 -lag-in-frames 25 -undershoot-pct 100 -overshoot-pct 100 -codec:a libvorbis -b:a 48k outfile-b.webm

This produces a 12.6 MB file from the TED sample in 1 pass that's pretty clean. I'm not even sure if the maxrate and overshoot-pct here do much. They seem to reduce some flicker, but I'm not sure.

benoit74 commented 5 months ago

Thank you! Unfortunately, 12.6 MB is absolutely equivalent to the VP8 size, so what would be the advantage of migrating to VP9? Do we have a better perceived quality?

kevinmcmurtrie commented 5 months ago

Good news! I tried the VP8 params with VP9 and they work. Omitting -crf results in -qmin and -qmax working again for proper constrained quality encoding. I tuned the min/max quality and target bitrate to closely match the VP8 visual quality.

Existing VP8 params:
ffmpeg -i infile.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v libvpx -b:v 128k -qmin 18 -qmax 40 -codec:a libvorbis -b:a 48k outfile-vp8.webm
TED: 15.2 MB, Khan whiteboard: 1.5 MB.

VP9 quality match:
ffmpeg -i infile.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 140k -qmin 30 -qmax 40 -quality good -speed 0 -auto-alt-ref 1 -lag-in-frames 25 -minrate 0 -maxrate 1200k -codec:a libvorbis -b:a 48k outfile-vp9.webm
TED: 8.5 MB, Khan whiteboard: 1.3 MB.

The bad news is that VP9 is really slow. Bumping the speed to 1 makes it 50% faster at the cost of TED being 8.6 MB.
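
To put the sizes reported in this comment into relative terms, a short sketch (the helper name is hypothetical; the figures are the VP8 and VP9 sizes quoted above):

```python
def size_reduction_percent(before_mb: float, after_mb: float) -> float:
    """Relative size saving, in percent, when going from before_mb to after_mb."""
    return (before_mb - after_mb) / before_mb * 100


if __name__ == "__main__":
    samples = {"TED": (15.2, 8.5), "Khan whiteboard": (1.5, 1.3)}
    for name, (vp8_mb, vp9_mb) in samples.items():
        print(f"{name}: {size_reduction_percent(vp8_mb, vp9_mb):.0f}% smaller")
```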

benoit74 commented 5 months ago

Thank you @kevinmcmurtrie

I tested your last settings, and they seem fairly promising even if encoding time is very significantly longer than with VP8. I tested various speeds and it looks to me like -speed 4 could even be acceptable. Did you notice a significant drop in quality when using such a "high speed"?

Settings tested:

| Setting name | Command line |
| --- | --- |
| VP8 settings 2 | ffmpeg -i input.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v libvpx -b:v 128k -qmin 18 -qmax 40 -codec:a libvorbis -b:a 48k -ar 44100 output.webm |
| VP9 settings 1 | ffmpeg -i input.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 140k -qmin 30 -qmax 40 -quality good -speed 0 -auto-alt-ref 1 -lag-in-frames 25 -minrate 0 -maxrate 1200k -codec:a libvorbis -b:a 48k -ar 44100 output.webm |
| VP9 settings 2 | ffmpeg -i input.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 140k -qmin 30 -qmax 40 -quality good -speed 1 -auto-alt-ref 1 -lag-in-frames 25 -minrate 0 -maxrate 1200k -codec:a libvorbis -b:a 48k -ar 44100 output.webm |
| VP9 settings 3 | ffmpeg -i input.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 140k -qmin 30 -qmax 40 -quality good -speed 2 -auto-alt-ref 1 -lag-in-frames 25 -minrate 0 -maxrate 1200k -codec:a libvorbis -b:a 48k -ar 44100 output.webm |
| VP9 settings 4 | ffmpeg -i input.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 140k -qmin 30 -qmax 40 -quality good -speed 4 -auto-alt-ref 1 -lag-in-frames 25 -minrate 0 -maxrate 1200k -codec:a libvorbis -b:a 48k -ar 44100 output.webm |

Encoding time on a "random" Linux server (2 * AMD Ryzen 5 3600 6-Core Processor):

| Video | Duration | VP8 settings 2 | VP9 settings 1 | VP9 settings 2 | VP9 settings 3 | VP9 settings 4 |
| --- | --- | --- | --- | --- | --- | --- |
| khan-board-drawing | 3:04 | 0:08 | 0:50 | 0:34 | 0:32 | 0:26 |
| mit-open-learning | 9:24 | 1:18 | 9:13 | 4:30 | 3:51 | 3:02 |
| ted-the-trick-to... | 7:59 | 1:15 | 8:56 | 4:21 | 3:33 | 3:11 |

Resulting video sizes:

| Video | Raw | VP8 settings 2 | VP9 settings 1 | VP9 settings 2 | VP9 settings 3 | VP9 settings 4 |
| --- | --- | --- | --- | --- | --- | --- |
| khan-board-drawing | 1.1M | 1.4M | 1.3M | 1.3M | 1.3M | 1.4M |
| mit-open-learning | 72M | 10M | 6.5M | 6.5M | 6.5M | 6.7M |
| ted-the-trick-to... | 69M | 12M | 8.1M | 8.2M | 8.3M | 8.4M |

A ZIM is available for testing at https://tmp.kiwix.org/ci/test-videos/tests_en_bbe-vp9-tests_2024-06.zim (note that there is no video.js player on an HTML page, just the raw video behind a link). Tested OK on the latest Kiwix-apple TestFlight build 3.4.0 (162) on macOS, besides some issues around fullscreen, but I believe this is expected since the TestFlight build is a bit old.

benoit74 commented 5 months ago

I also looked into hardware encoding. Some people seem to report up to 10x faster encoding with VP9. The most popular option seems to rely on Intel CPUs, which embed a VP9 encoder since the Ice Lake architecture (one just needs to check that Quick Sync Video features are available). Unfortunately my bare metal Intel CPUs are too old ... and it looks like it doesn't work well under virtualization. Not sure if we should invest a bit in digging into this or not.
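
A first check that could be scripted on a worker is whether the local ffmpeg build lists a hardware VP9 encoder at all. The sketch below parses the output of ffmpeg -encoders and looks for vp9_qsv (Quick Sync) and vp9_vaapi; the function name is made up, and a listed encoder only means the build includes it, not that the hardware or driver is actually usable.

```python
import shutil
import subprocess

HW_VP9_ENCODERS = ("vp9_qsv", "vp9_vaapi")


def find_encoders(encoders_listing: str, wanted=HW_VP9_ENCODERS) -> list:
    """Return the wanted encoder names present in `ffmpeg -encoders` output."""
    found = []
    for line in encoders_listing.splitlines():
        fields = line.split()
        # Encoder rows look like: " V..... vp9_vaapi  VP9 (VAAPI) (codec vp9)"
        if len(fields) >= 2 and fields[1] in wanted:
            found.append(fields[1])
    return found


if __name__ == "__main__":
    if shutil.which("ffmpeg"):
        listing = subprocess.run(
            ["ffmpeg", "-hide_banner", "-encoders"],
            capture_output=True, text=True, check=False,
        ).stdout
        print(find_encoders(listing) or "no hardware VP9 encoder listed")
    else:
        print("ffmpeg not found on PATH")
```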

benoit74 commented 5 months ago

Discussed today with @kelson42, we consider the price to pay for VP9 settings 4 is acceptable: significant reduction of video size at "only" 3x longer encoding.
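
For reference, the slowdown can be recomputed from the encoding times reported earlier for "VP8 settings 2" and "VP9 settings 4". A small sketch with hypothetical helper names:

```python
def to_seconds(clock: str) -> int:
    """Convert an M:SS string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(reference: str, candidate: str) -> float:
    """How many times longer the candidate encode took than the reference."""
    return to_seconds(candidate) / to_seconds(reference)


# (VP8 settings 2, VP9 settings 4) encoding times from the table above.
TIMINGS = {
    "khan-board-drawing": ("0:08", "0:26"),
    "mit-open-learning": ("1:18", "3:02"),
    "ted-the-trick-to...": ("1:15", "3:11"),
}

if __name__ == "__main__":
    for video, (vp8_time, vp9_time) in TIMINGS.items():
        print(f"{video}: {slowdown(vp8_time, vp9_time):.2f}x")
```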

@kevinmcmurtrie thank you very much !

I just realized that we also need to tune the "High" presets, which are currently ffmpeg -i input.mp4 -c:v libvpx -b:v 0 -crf 25 -codec:a libvorbis output.webm. Reusing them as-is for VP9 just doesn't work: the final file is pretty big (at least on the mit-open-learning video). @kevinmcmurtrie, would you mind helping us tune this as well for VP9?

kevinmcmurtrie commented 4 months ago

I just realized that we also need to tune the "High" presets, which are currently ffmpeg -i input.mp4 -c:v libvpx -b:v 0 -crf 25 -codec:a libvorbis output.webm. Reusing them as-is for VP9 just doesn't work: the final file is pretty big (at least on the mit-open-learning video). @kevinmcmurtrie, would you mind helping us tune this as well for VP9?

I'm not sure what the target quality is because those parameters look awful for TED. The chroma is glitching like a distress call in a sci-fi movie, yet it's still 31.7 MB.

If you want something a bit smaller that looks pretty good: ffmpeg -i input.mp4 -c:v vp9 -b:v 340k -qmin 26 -qmax 54 -g 240 -quality good -speed 0 -minrate 0 -maxrate 1800k -codec:a libvorbis -b:a 48k output.webm

23.1 MB for TED, 1.4 MB for Khan whiteboard.

On an AMD 7950X, TED encoding speed was about 50% realtime with -speed 0 and 90% at -speed 1. That's not bad for one video but it's painful for encoding a whole archive. An option might be to declare these scrapers to be "9" CPU units and encode 5 videos in parallel. (I'd prefer that scrapers finish in 14 days. Pixelmemory is a slightly slower AMD 7900X3D.)

Side note: I messed up A/B testing earlier. -auto-alt-ref 1 -lag-in-frames 25 can be dropped for all of these 1-pass encodings. The open source project documentation says they only work in 2-pass mode. New tests showed they did nothing in 1-pass mode. The "High" profile compression is kind of slow so my A/B testing isn't exhaustive yet.

benoit74 commented 4 months ago

Thank you ! I like the fact that we have similar settings between low and high quality presets. I agree about the fact that even 90% is painfully slow. Luckily this high quality preset is rarely used (we usually favor smaller ZIMs), and as you mentioned we should probably change our habits in terms of parallel encoding videos since VP9 is significantly different than VP8.

I'd prefer that scrapers finish in 14 days

This is our target as well: less than 14 days whenever possible, never more than 30 days. Note that once the huge re-encoding pass is done, all re-encoded videos will be cached on S3 and we will re-encode only videos that are new since the last run, so scrapers should be "quite fast" (all the rest of the logic still needs to be executed).

I messed up A/B testing earlier

No worries, well noted.

The "High" profile compression is kind of slow so my A/B testing isn't exhaustive yet.

Let's wait for completion. We want to move this forward, but it is better to take a few more days to be certain about the proper settings rather than having to come back, change the settings in a few weeks/months and re-encode all videos again.

kevinmcmurtrie commented 4 months ago

ffmpeg -i infile.mp4 -c:v vp9 -b:v 340k -qmin 26 -qmax 54 -g 240 -quality good -speed 0 -codec:a libvorbis -b:a 48k outfile.webm seems to be all that's needed for HQ. Lots of other options aren't hooked up or have negligible changes.

That would make the LQ version ffmpeg -i infile.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 140k -qmin 30 -qmax 40 -g 240 -quality good -speed 0 -codec:a libvorbis -b:a 48k outfile.webm
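
Expressed as flag/value dictionaries, in the style of the preset lines quoted earlier in this thread, the two commands above could look like the sketch below. The constant names and the build_command helper are hypothetical and only meant to make the settings easy to diff; they are not the scraperlib API.

```python
import shlex

# Flags copied from the HQ and LQ commands above.
VP9_HIGH = {
    "-c:v": "vp9", "-b:v": "340k", "-qmin": "26", "-qmax": "54",
    "-g": "240", "-quality": "good", "-speed": "0",
    "-codec:a": "libvorbis", "-b:a": "48k",
}
VP9_LOW = {
    "-vf": "scale='480:trunc(ow/a/2)*2'",
    "-c:v": "vp9", "-b:v": "140k", "-qmin": "30", "-qmax": "40",
    "-g": "240", "-quality": "good", "-speed": "0",
    "-codec:a": "libvorbis", "-b:a": "48k",
}


def build_command(src: str, dst: str, options: dict) -> list:
    """Flatten a flag/value mapping into an ffmpeg argument list."""
    args = ["ffmpeg", "-i", src]
    for flag, value in options.items():
        args += [flag, value]
    args.append(dst)
    return args


if __name__ == "__main__":
    print(shlex.join(build_command("infile.mp4", "outfile.webm", VP9_LOW)))
```

Keeping both presets side by side makes it obvious that they differ only in the scale filter, the target bitrate and the quantizer bounds.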

I played with the audio a bit. A possible option for both is to replace -b:a 48k with -q:a 0 to get a variable bitrate.

Edit: -g 240

benoit74 commented 4 months ago

@kevinmcmurtrie Thank you, I'm currently validating these two settings. Do you consider -speed 0 mandatory? At least for LQ, we are about to decide that -speed 4 makes more sense since file size is not significantly different and encoding time is significantly shorter. But I'm struggling to make sure it has no impact on visual quality; did you notice anything? I suspect we might use the same speed for HQ.

benoit74 commented 4 months ago

A new test ZIM is available at https://tmp.kiwix.org/ci/test-videos/tests_en_bbe-vp9b-tests_2024-06.zim

For me the results of HQ with -speed 4 are not acceptable, quality is significantly degraded compared to -speed 1.

Regarding the audio, I do not hear much difference or see much change in size. I suggest we keep using -codec:a libvorbis -b:a 48k -ar 44100.

I hence suggest we should use:

I will open a PR with that

benoit74 commented 4 months ago

More details in this issue for the "posterity" and for the lazy ones who won't open the ZIM.

Settings tested:

| Setting name | Command line |
| --- | --- |
| Reference VP8 High | ffmpeg -i video-raw.mp4 -c:v libvpx -b:v 0 -crf 25 -codec:a libvorbis video-vp8-high.webm |
| VP9 High settings 1 | ffmpeg -i video-raw.mp4 -c:v libvpx-vp9 -b:v 340k -qmin 26 -qmax 54 -g 240 -quality good -speed 0 -codec:a libvorbis -b:a 48k video-vp9-high-v1.webm |
| VP9 High settings 2 | ffmpeg -i video-raw.mp4 -c:v libvpx-vp9 -b:v 340k -qmin 26 -qmax 54 -g 240 -quality good -speed 1 -codec:a libvorbis -b:a 48k video-vp9-high-v1.webm |
| VP9 High settings 3 | ffmpeg -i video-raw.mp4 -c:v libvpx-vp9 -b:v 340k -qmin 26 -qmax 54 -g 240 -quality good -speed 4 -codec:a libvorbis -b:a 48k video-vp9-high-v1.webm |
| Reference VP8 Low | ffmpeg -i video-raw.mp4 -c:v libvpx-vp9 -b:v 340k -qmin 26 -qmax 54 -g 240 -quality good -speed 1 -codec:a libvorbis -b:a 48k video-vp9-high-v2.webm |
| VP9 Low settings 1 | ffmpeg -i video-raw.mp4 -c:v libvpx-vp9 -vf "scale='480:trunc(ow/a/2)*2'" -b:v 140k -qmin 30 -qmax 40 -g 240 -quality good -speed 0 -codec:a libvorbis -b:a 48k -ar 44100 video-vp9-low-v1.webm |
| VP9 Low settings 4 | ffmpeg -i video-raw.mp4 -c:v libvpx-vp9 -vf "scale='480:trunc(ow/a/2)*2'" -b:v 140k -qmin 30 -qmax 40 -g 240 -quality good -speed 4 -codec:a libvorbis -b:a 48k -ar 44100 video-vp9-low-v4.webm |
| Audio settings 1 | ffmpeg -i video-raw.mp4 -c:v copy -codec:a libvorbis -b:a 48k -ar 44100 video-audio-v1.mp4 |
| Audio settings 2 | ffmpeg -i video-raw.mp4 -c:v copy -codec:a libvorbis -b:a 48k video-audio-v2.mp4 |
| Audio settings 3 | ffmpeg -i video-raw.mp4 -c:v copy -codec:a libvorbis -q:a 0 -ar 44100 video-audio-v3.mp4 |
| Audio settings 4 | ffmpeg -i video-raw.mp4 -c:v copy -codec:a libvorbis -q:a 0 video-audio-v4.mp4 |

Encoding time on a "random" Linux server (2 * AMD Ryzen 5 3600 6-Core Processor):

khan-board-drawing (originally 1.1M, 3 min 4 secs)

| Setting | Resulting file size | Encoding duration |
| --- | --- | --- |
| Reference VP8 High | 2.0M | 0 min 11 secs |
| VP9 High settings 1 | 1.3M | 1 min 22 secs |
| VP9 High settings 2 | 1.3M | 0 min 42 secs |
| VP9 High settings 3 | 1.3M | 0 min 26 secs |
| Reference VP8 Low | 1.4M | 0 min 7 secs |
| VP9 Low settings 1 | 1.2M | 0 min 43 secs |
| VP9 Low settings 4 | 1.2M | 0 min 16 secs |
| Audio settings 1 | 1.4M | 0 min 1 secs |
| Audio settings 2 | 1.4M | 0 min 1 secs |
| Audio settings 3 | 1.4M | 0 min 1 secs |
| Audio settings 4 | 1.4M | 0 min 1 secs |

mit-open-learning (originally 72M, 9 min 24 secs)

| Setting | Resulting file size | Encoding duration |
| --- | --- | --- |
| Reference VP8 High | 21M | 3 min 36 secs |
| VP9 High settings 1 | 13M | 22 min 43 secs |
| VP9 High settings 2 | 13M | 10 min 46 secs |
| VP9 High settings 3 | 13M | 7 min 04 secs |
| Reference VP8 Low | 10M | 1 min 18 secs |
| VP9 Low settings 1 | 5.6M | 8 min 59 secs |
| VP9 Low settings 4 | 5.8M | 2 min 15 secs |
| Audio settings 1 | 66M | 0 min 6 secs |
| Audio settings 2 | 66M | 0 min 6 secs |
| Audio settings 3 | 67M | 0 min 7 secs |
| Audio settings 4 | 67M | 0 min 7 secs |

ted-the-trick-to-regaining-your-childlike-wonder-zack-king (originally 69M, 7 min 59 secs)

| Setting | Resulting file size | Encoding duration |
| --- | --- | --- |
| Reference VP8 High | 30M | 4 min 52 secs |
| VP9 High settings 1 | 22M | 33 min 06 secs |
| VP9 High settings 2 | 23M | 7 min 59 secs |
| VP9 High settings 3 | 23M | 8 min 29 secs |
| Reference VP8 Low | 12M | 1 min 13 secs |
| VP9 Low settings 1 | 7.7M | 8 min 35 secs |
| VP9 Low settings 4 | 7.9M | 2 min 5 secs |
| Audio settings 1 | 64M | 0 min 5 secs |
| Audio settings 2 | 64M | 0 min 5 secs |
| Audio settings 3 | 65M | 0 min 6 secs |
| Audio settings 4 | 65M | 0 min 6 secs |