Closed kevinmcmurtrie closed 4 months ago
Thank you for this contribution!
Also I find it odd to not only fix the problematic flags but to change the target bitrate without mentioning it 🤓
Also I find it odd to not only fix the problematic flags but to change the target bitrate without mentioning it 🤓
The original bitrate isn't real because the numbers were impossible. It was actually < 200 kbps padded up to 300kbps. That goes into a compressed ZIM and the padding shrinks. My new value of 200 kbps might actually be higher. https://github.com/openzim/kolibri/issues/75#issuecomment-1841381604
It needs some testing. I meant for this to be a draft PR but clicked it wrong.
https://pixelmemory.us/~mcmurtri/Earthquakes%20and%20Social%20Media.mp4 https://pixelmemory.us/~mcmurtri/Earthquakes%20and%20Social%20Media(old).webm https://pixelmemory.us/~mcmurtri/Earthquakes%20and%20Social%20Media(new).webm
https://pixelmemory.us/~mcmurtri/Video%2011_%20An%20Introduction.mp4 https://pixelmemory.us/~mcmurtri/Video%2011_%20An%20Introduction(old).webm https://pixelmemory.us/~mcmurtri/Video%2011_%20An%20Introduction(new).webm
This looks very good @kevinmcmurtrie ; thank you.
I don't have an eye for this kind of things but to me it looks like no loss in quality for a big gain in size. I love it. I can not wait to have our main video based scrapers released with this new configuration. @kevinmcmurtrie Thank you very much.
@kevinmcmurtrie let me check this QA issue, never mind
@rgaudin any idea why codecov upload is failing in qa
job and why 3.6 and 3.7 jobs are still expected while they have been removed from the CI configuration?
@kevinmcmurtrie
First, thank you!
Aside the CI issue we will solve, do you need to change more things or should we consider this PR ready?
And I would like to keep a copy of your test videos so that we reuse them next time we need to make a change in presets / encoders. These are precious assets to avoid lengthy discussions.
"Why letters" is a khan academy video, so the license is the Khan one, correct?
Do you know where the two others come from? Do you know what is the license associated with the "Earth quake and social media" video? (the last video has an explicit CC license, but I would prefer to know where it comes from so that we do not discover too late that the video is copyrighted but someone added a CC watermark for fun)
@rgaudin any idea why codecov upload is failing in
qa
job and why 3.6 and 3.7 jobs are still expected while they have been removed from the CI configuration?
3.6 and 3.6 are preventing codecov from running so that explains it. Why do we still have 3.6 and 3.7? I don't know. I've seen that in the past ; Seems like a Github issue ; as you can see the checks time does not mention them… but since the overall status is pending, codecov is not running.
Ok for 3.6 and 3.7 About codecov, I was speaking about this error: https://github.com/openzim/python-scraperlib/actions/runs/7781401886/job/21215769840?pr=122#step:7:155
Ah… I don't know. Maybe it's because this is an external PR, maybe we're using an old version of codecov…
I thought maybe the CODECOV_TOKEN
had changed so I updated the secret and rerun the action but it didn't help.
I see that the CODECOV_TOKEN
env looks empty in the command. I believe github displays stars instead for secrets…
"Why letters" is a khan academy video, from the given example. "Video 11: An introduction" is MIT Open Learning. "Earthquakes and Social Media" is an NBC news broadcast I recorded for Amy (one of the speakers). Certain uses might be "fair use" but don't distribute it.
I made it a Draft because I feel like I don't have enough time at the moment to be a good contributor. My goal was just to come up with some better codec parameters. You can copy them over to a new PR or use this.
I made it a Draft because I feel like I don't have enough time at the moment to be a good contributor. My goal was just to come up with some better codec parameters. You can copy them over to a new PR or use this.
Thank you again for this and the tests you've made on a bunch of videos. It is a very good contribution, even if you consider it is not yet completed, because it was made with quality in mind and because it created a momentum and we feel like we are close to being able to release this.
Since I'm working today on Kolibri to fix other scraper issues you've reported, I will test these new codec parameters. They might not be "optimal" but we have to admit they are already way better than the ones we are using currently, so it makes sense to release them to make a move in the good direction (once I've confirmed they work with more videos).
Check the compressed size. The old videos had a lot of bitrate padding.
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
a7fe235
) 100.00% compared to head (a7307d1
) 100.00%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I took the liberty to update first comment to reflect the final content of this PR.
I've reencoded and uploaded all test videos to https://tmp.kiwix.org/ci/test-videos/
I've also added a utility to encode video (debug), fixed the ffmpeg threads usage.
I feel like there is mostly no change in visual quality between v1 and v2, but as already said computation takes way less time and resulting file is way smaller (especially for still videos like khan academy ones).
@rgaudin feedback is welcomed on:
Edits by @benoit74
Enabler for https://github.com/openzim/kolibri/issues/75 Fix #123
Changes
-b:v
is reduced to 128k instead of 300k (deemed sufficient)-maxrate
and-minrate
have been removed (otherwise the quantizer has no room for adaptation)-qmin
is reduced to 18 instead of 30 (significantly reduce the bitrate on very still videos)-qmax
is decreased to 40 instead of 42reencode
allows to override this default value)Original comment
https://github.com/openzim/kolibri/issues/75
The VP8 values appear to have been copied from the x264 options but these codecs do not share the same scale.
Removed contradictory options. Use ordinary average bitrate target with min/max bounds on quality.