learningequality / ricecooker

Python library for creating Kolibri channels and uploading to Studio
https://ricecooker.readthedocs.io/
MIT License

Chefs using WebVideoFile re-download files every time chef runs #191

Closed ivanistheone closed 5 years ago

ivanistheone commented 5 years ago

Description

Chefs using WebVideoFile re-download files every time chef runs.

During a chef run we see:

    Downloading chefdata/transcripts/ar/Premiers secours- nourrisson : Réanimation cardio pulmonaire.pdf
    --- Downloaded 3980a3df44d95dcc07bcf8d5d0adf6b4.pdf
    --- Downloaded e4b7a8b440a5bbee4d26ba5072839f64.jpg

We then reach the point where a WebVideoFile is created from YouTube ID S8AVPLg7krg, and the output looks like this:

    [youtube] S8AVPLg7krg: Downloading webpage
    [youtube] S8AVPLg7krg: Downloading video info webpage
    [youtube] S8AVPLg7krg: Extracting video information
    [youtube] S8AVPLg7krg: Downloading MPD manifest
    [youtube] S8AVPLg7krg: Downloading MPD manifest
    [download] Destination: /tmp/c948b043899ffc70bec380dfdac46c9f.f135.mp4
    [download] Destination: /tmp/c948b043899ffc70bec380dfdac46c9f.mp4.f140
    [ffmpeg] Merging formats into "/tmp/c948b043899ffc70bec380dfdac46c9f.mp4"
    Deleting original file /tmp/c948b043899ffc70bec380dfdac46c9f.f135.mp4 (pass -k to keep)
    Deleting original file /tmp/c948b043899ffc70bec380dfdac46c9f.mp4.f140 (pass -k to keep)

So it looks like the video is re-downloaded from YouTube every time the chef runs.

For an example of a chef that creates WebVideoFiles this way, see https://github.com/learningequality/sushi-chef-sikana/blob/master/sushichef.py#L205-L209

Required solution

Use some sort of cache logic for WebVideoFile downloads, keyed on the URL and the download settings. This way chef scripts like Sikana won't re-download the content every time they run.
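
To make the "keyed on URL and download settings" idea concrete, here is a minimal sketch. It is not ricecooker's actual cache implementation: `make_cache_key`, `DownloadCache`, and `download_fn` are hypothetical names used only for illustration, and the cache is held in memory rather than persisted to disk.

```python
import hashlib
import json


def make_cache_key(url, settings=None):
    """Return a stable key for a (url, download settings) pair (hypothetical helper)."""
    # sort_keys=True makes the key independent of the order of the settings dict
    payload = json.dumps({"url": url, "settings": settings or {}}, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


class DownloadCache:
    """Hypothetical in-memory cache mapping cache keys to downloaded filenames."""

    def __init__(self):
        self._entries = {}

    def get_or_download(self, url, settings, download_fn):
        """Call download_fn(url, settings) only if this pair has not been seen before."""
        key = make_cache_key(url, settings)
        if key not in self._entries:
            self._entries[key] = download_fn(url, settings)
        return self._entries[key]
```

With a cache like this, a second request for the same URL with the same settings returns the stored filename without calling `download_fn` again, while changing a setting (a different resolution, say) produces a new key and triggers a fresh download.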

How does the .ricecookerfilecache work?

ivanistheone commented 5 years ago

Never mind, the ricecooker cache logic is working exactly as expected. There were just new videos to download, and on the next run it didn't re-download them.