openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0
18 stars 16 forks source link

Fix VP8 low bitrate #122

Closed kevinmcmurtrie closed 4 months ago

kevinmcmurtrie commented 5 months ago

Edits by @benoit74

Enabler for https://github.com/openzim/kolibri/issues/75 Fix #123

Changes

Original comment

https://github.com/openzim/kolibri/issues/75

The VP8 values appear to have been copied from the x264 options but these codecs do not share the same scale.

Removed contradictory options. Use ordinary average bitrate target with min/max bounds on quality.

benoit74 commented 5 months ago

Thank you for this contribution!

rgaudin commented 5 months ago

Also I find it odd to not only fix the problematic flags but to change the target bitrate without mentioning it 🤓

kevinmcmurtrie commented 5 months ago

Also I find it odd to not only fix the problematic flags but to change the target bitrate without mentioning it 🤓

The original bitrate isn't real because the numbers were impossible. It was actually < 200 kbps padded up to 300kbps. That goes into a compressed ZIM and the padding shrinks. My new value of 200 kbps might actually be higher. https://github.com/openzim/kolibri/issues/75#issuecomment-1841381604

It needs some testing. I meant for this to be a draft PR but clicked it wrong.

kevinmcmurtrie commented 5 months ago

https://github.com/openzim/python-scraperlib/assets/50348022/b2dd151c-c453-4c0f-8a09-00b0a5afd125

Why Letters(new).webm Why Letters(old).webm

kevinmcmurtrie commented 5 months ago

https://pixelmemory.us/~mcmurtri/Earthquakes%20and%20Social%20Media.mp4 https://pixelmemory.us/~mcmurtri/Earthquakes%20and%20Social%20Media(old).webm https://pixelmemory.us/~mcmurtri/Earthquakes%20and%20Social%20Media(new).webm

https://pixelmemory.us/~mcmurtri/Video%2011_%20An%20Introduction.mp4 https://pixelmemory.us/~mcmurtri/Video%2011_%20An%20Introduction(old).webm https://pixelmemory.us/~mcmurtri/Video%2011_%20An%20Introduction(new).webm

rgaudin commented 5 months ago

This looks very good @kevinmcmurtrie ; thank you.

kelson42 commented 5 months ago

I don't have an eye for this kind of things but to me it looks like no loss in quality for a big gain in size. I love it. I can not wait to have our main video based scrapers released with this new configuration. @kevinmcmurtrie Thank you very much.

benoit74 commented 5 months ago

@kevinmcmurtrie let me check this QA issue, never mind

benoit74 commented 5 months ago

@rgaudin any idea why codecov upload is failing in qa job and why 3.6 and 3.7 jobs are still expected while they have been removed from the CI configuration?

benoit74 commented 5 months ago

@kevinmcmurtrie

First, thank you!

Aside the CI issue we will solve, do you need to change more things or should we consider this PR ready?

And I would like to keep a copy of your test videos so that we reuse them next time we need to make a change in presets / encoders. These are precious assets to avoid lengthy discussions.

"Why letters" is a khan academy video, so the license is the Khan one, correct?

Do you know where the two others come from? Do you know what is the license associated with the "Earth quake and social media" video? (the last video has an explicit CC license, but I would prefer to know where it comes from so that we do not discover too late that the video is copyrighted but someone added a CC watermark for fun)

rgaudin commented 5 months ago

@rgaudin any idea why codecov upload is failing in qa job and why 3.6 and 3.7 jobs are still expected while they have been removed from the CI configuration?

3.6 and 3.6 are preventing codecov from running so that explains it. Why do we still have 3.6 and 3.7? I don't know. I've seen that in the past ; Seems like a Github issue ; as you can see the checks time does not mention them… but since the overall status is pending, codecov is not running.

benoit74 commented 5 months ago

Ok for 3.6 and 3.7 About codecov, I was speaking about this error: https://github.com/openzim/python-scraperlib/actions/runs/7781401886/job/21215769840?pr=122#step:7:155

rgaudin commented 5 months ago

Ah… I don't know. Maybe it's because this is an external PR, maybe we're using an old version of codecov…

I thought maybe the CODECOV_TOKEN had changed so I updated the secret and rerun the action but it didn't help.

I see that the CODECOV_TOKEN env looks empty in the command. I believe github displays stars instead for secrets…

Screenshot 2024-02-05 at 09 24 32
kevinmcmurtrie commented 5 months ago

"Why letters" is a khan academy video, from the given example. "Video 11: An introduction" is MIT Open Learning. "Earthquakes and Social Media" is an NBC news broadcast I recorded for Amy (one of the speakers). Certain uses might be "fair use" but don't distribute it.

kevinmcmurtrie commented 5 months ago

I made it a Draft because I feel like I don't have enough time at the moment to be a good contributor. My goal was just to come up with some better codec parameters. You can copy them over to a new PR or use this.

benoit74 commented 5 months ago

I made it a Draft because I feel like I don't have enough time at the moment to be a good contributor. My goal was just to come up with some better codec parameters. You can copy them over to a new PR or use this.

Thank you again for this and the tests you've made on a bunch of videos. It is a very good contribution, even if you consider it is not yet completed, because it was made with quality in mind and because it created a momentum and we feel like we are close to being able to release this.

Since I'm working today on Kolibri to fix other scraper issues you've reported, I will test these new codec parameters. They might not be "optimal" but we have to admit they are already way better than the ones we are using currently, so it makes sense to release them to make a move in the good direction (once I've confirmed they work with more videos).

kevinmcmurtrie commented 5 months ago

Check the compressed size. The old videos had a lot of bitrate padding.

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (a7fe235) 100.00% compared to head (a7307d1) 100.00%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #122 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 32 32 Lines 1337 1345 +8 Branches 227 229 +2 ========================================= + Hits 1337 1345 +8 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

benoit74 commented 4 months ago

I took the liberty to update first comment to reflect the final content of this PR.

I've reencoded and uploaded all test videos to https://tmp.kiwix.org/ci/test-videos/

I've also added a utility to encode video (debug), fixed the ffmpeg threads usage.

I feel like there is mostly no change in visual quality between v1 and v2, but as already said computation takes way less time and resulting file is way smaller (especially for still videos like khan academy ones).

@rgaudin feedback is welcomed on: