kevinmcmurtrie closed this issue 7 months ago.
Notes: I can't reproduce this slowness outside of the Docker container.
1) There may be conflicting FFmpeg options causing excessive backtracking and retries: the command below sets both minimum and maximum bitrates and minimum and maximum quantizer values (-qmin/-qmax). It's possible that this combination causes some backtracking.
'ffmpeg', '-y', '-i', 'file:/output/tmpmf3sut4d/afe1380b936363857ce244eb5eda4019.mp4', '-max_muxing_queue_size', '9999', '-codec:v', 'libvpx', '-quality', 'best', '-b:v', '300k', '-maxrate', '300k', '-minrate', '300k', '-qmin', '30', '-qmax', '42', '-vf', "scale='480:trunc(ow/a/2)*2'", '-codec:a', 'libvorbis', '-ar', '44100', '-b:a', '48k', 'file:/tmp/tmph5ib_ryv/video.tmp.webm'
Something simpler may help. This targets an average bitrate of 300 kbit/s within a 512 kbit buffer window and allows a wider quality range (-qmin 20, no -qmax):
'ffmpeg', '-y', '-i', 'file:/output/tmpmf3sut4d/afe1380b936363857ce244eb5eda4019.mp4', '-max_muxing_queue_size', '9999', '-codec:v', 'libvpx', '-quality', 'best', '-b:v', '300k', '-bufsize', '512k', '-qmin', '20', '-vf', "scale='480:trunc(ow/a/2)*2'", '-codec:a', 'libvorbis', '-ar', '44100', '-b:a', '48k', 'file:/tmp/tmph5ib_ryv/video.tmp.webm'
At least for me, the second one encodes faster, looks better, and uses about 1/3 of the bandwidth. There's no minimum bitrate and no maximum q, so it can fly past all of those motionless whiteboard images.
2) The FFmpeg encoder may be old?
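The difference between the two commands in item 1 can be made explicit with a small Python sketch. The argument values are copied from the commands above; the names `build_args`, `option_names` and `buffer_window_ms` are hypothetical, and the millisecond figure assumes that ffmpeg's libvpx wrapper converts `-bufsize` into libvpx's `rc_buf_sz` as bufsize * 1000 / bitrate.

```python
# Compare the current low-quality WebM preset with the simplified proposal.
# Argument values are copied from the two commands above; the helper names
# (build_args, option_names, buffer_window_ms) are hypothetical.

SRC = "file:/output/tmpmf3sut4d/afe1380b936363857ce244eb5eda4019.mp4"
DST = "file:/tmp/tmph5ib_ryv/video.tmp.webm"

# Current preset: 300k pinned by -minrate/-maxrate, quantizer limited to 30..42.
CURRENT_VIDEO_OPTS = ["-b:v", "300k", "-maxrate", "300k", "-minrate", "300k",
                      "-qmin", "30", "-qmax", "42"]
# Simplified preset: 300k target, 512k rate-control buffer, only a lower q bound.
SIMPLIFIED_VIDEO_OPTS = ["-b:v", "300k", "-bufsize", "512k", "-qmin", "20"]


def build_args(video_opts):
    """Wrap the rate-control options in the rest of the argv shared by both commands."""
    return (["ffmpeg", "-y", "-i", SRC, "-max_muxing_queue_size", "9999",
             "-codec:v", "libvpx", "-quality", "best"]
            + video_opts
            + ["-vf", "scale='480:trunc(ow/a/2)*2'",
               "-codec:a", "libvorbis", "-ar", "44100", "-b:a", "48k", DST])


def option_names(args):
    """Return the set of option flags (tokens starting with '-') in an argv list."""
    return {a for a in args if a.startswith("-")}


def buffer_window_ms(bufsize_bits, bitrate_bps):
    """Assumed -bufsize to libvpx rc_buf_sz conversion: bufsize * 1000 / bitrate, in ms."""
    return bufsize_bits * 1000 // bitrate_bps


if __name__ == "__main__":
    current = build_args(CURRENT_VIDEO_OPTS)
    simplified = build_args(SIMPLIFIED_VIDEO_OPTS)
    print("dropped:", sorted(option_names(current) - option_names(simplified)))
    print("added:", sorted(option_names(simplified) - option_names(current)))
    # ffmpeg parses "512k" and "300k" with k = 1000.
    print("buffer window (ms):", buffer_window_ms(512_000, 300_000))
```

Under that assumption, the 512k buffer at 300k gives the encoder roughly 1.7 seconds over which to average the bitrate, whereas the current command pins both -minrate and -maxrate to the target.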
Thank you for reporting this and doing some tests. I will look into it.
Regarding versions, image ghcr.io/openzim/kolibri:1.1.0 is using:
ffmpeg version 5.1.3-1 Copyright (c) 2000-2022 the FFmpeg developers
libvpx encoder, from the libvpx7 apt package version 1.11.0-2ubuntu2.2.
libvpx 1.11.0 was originally released (in binary form) on Oct 7, 2021. As of today, it is the latest version on Ubuntu LTS, so this is probably not a big deal.
Regarding ffmpeg settings, @rgaudin @kelson42 do we have any past issue which discusses why these settings were chosen? I imagine finding one preset to more or less rule them all is not an easy feat.
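Since the slowness was only reproduced inside Docker, a quick way to confirm that the host and the image run the same ffmpeg build is to compare the first line of `ffmpeg -version` on both. A minimal sketch; the helper names are hypothetical, and the docker invocation in the usage note assumes ffmpeg is on the image's PATH.

```python
# Compare the ffmpeg build on the host with the one in the container.
# Helper names are hypothetical.
import re
import subprocess

VERSION_RE = re.compile(r"^ffmpeg version (\S+)")


def parse_ffmpeg_version(banner):
    """Extract the version token from the first line of `ffmpeg -version` output."""
    match = VERSION_RE.match(banner)
    return match.group(1) if match else None


def ffmpeg_version(prefix=("ffmpeg",)):
    """Run `<prefix> -version` and return the parsed version (None if unrecognized)."""
    result = subprocess.run([*prefix, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_ffmpeg_version(result.stdout)
```

For example, `ffmpeg_version()` on the host versus `ffmpeg_version(("docker", "run", "--rm", "--entrypoint", "ffmpeg", "ghcr.io/openzim/kolibri:1.1.0"))` for the image.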
Encoding logic comes from python-scraperlib and its presets (we use the low-quality WebM preset for the Khan Academy recipe).
The WebM low-quality presets come from https://github.com/openzim/python-scraperlib/issues/14 (more precisely https://github.com/openzim/python-scraperlib/commit/78e2bb0e562b58f240436efb7b8700fa15deaa39#diff-2cc68edde814805fe24114313acdde91ae832adef02f7d0576675d74db3f7b58), but I did not find any discussion there. They were probably ported from the ted/youtube scrapers, but I could not find any discussion over there either.
I tested the suggested settings on https://studio.learningequality.org/content/storage/b/7/b71ca7f102ae16e4023c9f49b015d6b7.mp4
I do not see a significant visual difference in the resulting file (though this is obviously subjective).
I confirm that processing is a little faster (from 10 s to 8 s) and the file is more than 3 times smaller (from 2.7 MB to 768 KB, while the original MP4 is 690 KB).
I do not see any difference in processing time between running in Docker and running directly on the host (same machine), so there is probably something unusual in the Docker setup on your machine.
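The timing and size comparison above, as well as the Docker vs host check, can be repeated with a small harness. This is a sketch and not part of python-scraperlib: `time_command` and `size_ratio` are hypothetical helpers, and the argument lists to time would be the two ffmpeg commands quoted at the top of this issue.

```python
# Hypothetical harness (not part of python-scraperlib) for repeating the
# timing and output-size comparison on the host and inside the container.
import subprocess
import time
from pathlib import Path


def time_command(argv):
    """Run argv to completion and return elapsed wall-clock seconds.

    Raises subprocess.CalledProcessError if the command exits non-zero, so a
    failed encode is never reported as a fast one.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def size_ratio(bigger, smaller):
    """Return size(bigger) / size(smaller) for two files, e.g. two encoder outputs."""
    return Path(bigger).stat().st_size / Path(smaller).stat().st_size
```

Running `time_command` with the same argument list on the host and inside the container gives directly comparable wall-clock times. With the sizes reported above, `size_ratio` would come out at about 2.7 / 0.768 ≈ 3.5.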
We chose to go with WebM/VP9.
VP9 or VP8? Looking at the settings, I believe we use VP8.
Sorry, it was a slip: VP8 of course.
Progressing towards a merged PR on this will obviously need significant testing with many kinds of videos, and we (@Kiwix) probably won't have sufficient bandwidth for this in the coming months.
Contributions are of course more than welcome.
Note, however, that this effort might conflict with another initiative we are considering: choosing a different video codec (with JS libraries as a fallback when the reader/browser does not support that codec). The test set (and testing procedure) will nevertheless be very useful and will most probably be reused.
I was going to kill this task on pixelmemory because it has built up over 161 GB of files...but not really. The filesystem compression ratio is over 3:1 so it's only 52GB on disk. That should not be happening for video files.
At this stage it looks like we might move from WebM/VP8 to MP4/H.264. If we go in that direction, we should reassess our ffmpeg command line (in particular for low quality).
I can only agree with the pixelmemory observation. It also matches the 3:1 ratio we both observed when changing the ffmpeg settings. I compressed (with default Zip settings on Mac) the "big" video I previously obtained with the current scraper ffmpeg settings, and I confirm it compresses very well (again a 3:1 ratio, going from 2.7 MB to 868 KB), which shouldn't be possible for a video file.
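The Zip observation can be turned into a quick, scriptable check. A minimal sketch, assuming zlib's DEFLATE (the same algorithm family Zip uses by default) is a good enough proxy: a well-packed video bitstream should compress to roughly its own size (ratio close to 1.0), so a ratio near 3 points at largely redundant bytes in the file. `compression_ratio` and `file_compression_ratio` are hypothetical helpers, and only the first 16 MiB of a file are sampled to keep the check fast on large outputs.

```python
# Rough compressibility check for an encoded video file (hypothetical helpers).
# zlib uses DEFLATE; a well-packed video bitstream should barely shrink.
import zlib


def compression_ratio(data, level=6):
    """Return len(data) / len(zlib.compress(data)); close to 1.0 means incompressible."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


def file_compression_ratio(path, sample_bytes=16 * 1024 * 1024):
    """Compress the first sample_bytes of a file and return the ratio."""
    with open(path, "rb") as fh:
        data = fh.read(sample_bytes)
    return compression_ratio(data)
```

Running `file_compression_ratio` on the outputs of the current and the simplified command should show whether the 3:1 redundancy goes away with the new settings.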
MPEG4 to WebM transcoding is extremely slow. https://farm.openzim.org/pipeline/c06a8148-7d9a-422c-b5b4-abfe93d51168 has been crawling along for two weeks while using 100% of all CPUs.