openzim / openedx

Open edX (to zim) scraper
GNU General Public License v3.0

Refactored file downloading #38

Closed: satyamtg closed this 4 years ago

satyamtg commented 4 years ago

This introduces the following changes -

This fixes #36

Will be opened for review once scraperlib is updated. Depends on https://github.com/openzim/python_scraperlib/pull/28

This now uses HEAD requests.
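For readers unfamiliar with the idea, here is a minimal sketch of reading response headers with a HEAD request, so the content type is known without downloading the body. It uses only the Python standard library and is not the code from this PR or from zimscraperlib; `build_head_request` and `fetch_headers` are hypothetical names chosen for illustration.

```python
import urllib.request
from typing import Dict


def build_head_request(url: str) -> urllib.request.Request:
    """Build a request that asks the server for headers only (no body)."""
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url: str, timeout: float = 10.0) -> Dict[str, str]:
    """Send the HEAD request and return the response headers, lower-cased.

    Assumption: the server answers HEAD requests; some servers do not,
    in which case urlopen raises an HTTPError the caller has to handle.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}
```

A caller would then look at `fetch_headers(url).get("content-type")` before deciding how to store the file.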

rgaudin commented 4 years ago

Quick comment: magic is a slow process. We usually want to use it only when we have no other way to get the file type, or when the chance of erroneous info is high and the consequences are important.

satyamtg commented 4 years ago

> Quick comment: magic is a slow process. We usually want to use it only when we have no other way to get the file type, or when the chance of erroneous info is high and the consequences are important.

Okay. I know magic is slow, but I did that because using save_large_file from zimscraperlib.download means we do not get the headers, and doing another request would mean a slightly longer wait. Maybe we should refactor save_large_file() in zimscraperlib to also return headers if required. I have made this PR to solve it: https://github.com/openzim/python_scraperlib/pull/28. I will revert to the original way of filetype checking for now.
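To make the trade-off discussed above concrete, here is a rough sketch of a "headers first, sniff only as a fallback" check. It is not the scraper's actual code: `pick_mime_type` and `GENERIC_TYPES` are hypothetical names, and the `sniff` argument stands in for whatever slow content-based detection (such as magic) the caller has available.

```python
from typing import Callable, Mapping

# Header values that carry no useful type information (an assumption for this sketch).
GENERIC_TYPES = {"", "application/octet-stream", "binary/octet-stream"}


def pick_mime_type(headers: Mapping[str, str], sniff: Callable[[], str]) -> str:
    """Return the MIME type from Content-Type, calling sniff() only if needed."""
    raw = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "content-type":
            raw = value
            break
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = raw.split(";", 1)[0].strip().lower()
    if mime not in GENERIC_TYPES:
        return mime
    # Slow path: inspect the file content only when the header is unusable.
    return sniff()
```

With this shape, the expensive detection runs only for files whose server sent a missing or generic content type.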

satyamtg commented 4 years ago

So, this uses save_large_file() from zimscraperlib.download. Also, I've made the following changes -