openzim / ted

Provide the best of TED.com for offline usage!
https://download.kiwix.org/zim/ted/
GNU General Public License v3.0

Handle situation where we fail to download a video #167

Closed benoit74 closed 3 months ago

benoit74 commented 4 months ago

Recipe: https://farm.openzim.org/recipes/ted_topic_animation

Dev preview: https://dev.library.kiwix.org/viewer#ted_mul_animation_2024-02/can-alligators-survive-this-apex-predator

Task log:

[ted2zim::2024-02-28 20:10:05,426] DEBUG:Downloading Can alligators survive this apex predator?
--2024-02-28 20:10:05--  https://py.tedcdn.com/consus/projects/00/66/20/002/products/2023e-coogan-kenny-everglades-002-fallback-f40fc479-e8ed-4e7a-acd8-c22622102492-1200k.mp4
Resolving py.tedcdn.com (py.tedcdn.com)... 151.101.2.133, 151.101.66.133, 151.101.130.133, ...
Connecting to py.tedcdn.com (py.tedcdn.com)|151.101.2.133|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-02-28 20:10:05 ERROR 403: Forbidden.

[ted2zim::2024-02-28 20:10:05,714] ERROR:Could not download /output/tmpcn9f04nf/videos/116454/video.mp4
[ted2zim::2024-02-28 20:10:05,808] DEBUG:Speaker doesn't have an image
[ted2zim::2024-02-28 20:10:06,114] INFO:downloaded /output/tmpcn9f04nf/videos/116454/thumbnail.webp from cache at thumbnail/116454
[ted2zim::2024-02-28 20:10:06,114] DEBUG:Converting video 116454
[ted2zim::2024-02-28 20:10:06,198] ERROR:Failed to post process video 116454
[ted2zim::2024-02-28 20:10:06,198] DEBUG:Command '['ffmpeg', '-y', '-i', 'file:/output/tmpcn9f04nf/videos/116454/video.mp4', '-max_muxing_queue_size', '9999', '-codec:v', 'libvpx', '-quality', 'best', '-b:v', '300k', '-maxrate', '300k', '-minrate', '300k', '-qmin', '30', '-qmax', '42', '-vf', "scale='480:trunc(ow/a/2)*2'", '-codec:a', 'libvorbis', '-ar', '44100', '-b:a', '48k', 'file:/tmp/tmp6wdmu1a4/video.tmp.webm']' returned non-zero exit status 1.

The 403 error is real. We should:

Fixing this issue will solve https://github.com/openzim/zim-requests/issues/874, which needs to be updated once this is done.

benoit74 commented 4 months ago

Note: this needs a bit more investigation. While the 403 error is real, the video appears to play from TED and not YouTube at https://www.ted.com/talks/kenny_coogan_can_alligators_survive_this_apex_predator (we do not get the YouTube player there). Maybe the problem is just that the URL is wrong? But it does not look like there is another one available in the payload... weird...

The impact of this bug is not negligible: in the TED animation topic, 74 videos (out of 970) failed to download. Other topics are impacted as well.

benoit74 commented 3 months ago

This will be solved together with #164. We will first try to download the link (if provided), then try the YouTube ID (if provided). If both fail or neither is provided, we will not include the video in the ZIM (instead of adding a broken video as we do today).
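
For illustration, here is a minimal sketch of that fallback order. It is not the actual ted2zim code: `fetch_video`, `download_link` and `download_youtube` are hypothetical names, and the two downloaders are assumed to be callables that return `True` on success. Real implementations would wrap whatever download tooling the scraper uses.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("fallback-sketch")

# Hypothetical downloader signature: (reference, destination) -> success flag
Downloader = Callable[[str, Path], bool]


def fetch_video(
    video_link: Optional[str],
    youtube_id: Optional[str],
    dest: Path,
    download_link: Downloader,
    download_youtube: Downloader,
) -> Optional[str]:
    """Try the direct link first, then the YouTube ID.

    Returns "link" or "youtube" depending on which source succeeded,
    or None when both failed or neither was provided.
    """
    attempts = [
        ("link", video_link, download_link),
        ("youtube", youtube_id, download_youtube),
    ]
    for source, ref, downloader in attempts:
        if not ref:
            continue  # this source was not provided in the payload
        try:
            if downloader(ref, dest):
                return source
        except Exception as exc:  # e.g. an HTTP 403 surfaced as an exception
            logger.error("Could not download %s from %s: %s", dest, source, exc)
    return None


# Stub downloaders standing in for real ones (assumptions for the demo only)
def always_forbidden(ref: str, dest: Path) -> bool:
    return False  # simulates the 403 seen in the task log


def fake_youtube(ref: str, dest: Path) -> bool:
    return True


source = fetch_video(
    "https://example.invalid/video.mp4",
    "abc123",
    Path("video.mp4"),
    always_forbidden,
    fake_youtube,
)
print(source)  # youtube
```

The caller would skip thumbnail handling, conversion and ZIM entry creation whenever `fetch_video` returns `None`, so ffmpeg is never run on a missing file.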