Closed satyamtg closed 4 years ago
Merging #28 into master will not change coverage. The diff coverage is
100.00%
.
@@ Coverage Diff @@
## master #28 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 19 19
Lines 743 749 +6
=========================================
+ Hits 743 749 +6
Impacted Files | Coverage Δ | |
---|---|---|
src/zimscraperlib/download.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 9c2eaab...20c76e3. Read the comment docs.
Hum, I don't like it much.
Content-Type
there.download_file
unconditionally as it has zero cost (data's already in the request object) and I don't like params that changes the return type. Sometimes it's required but should be avoided as much as possible.req
to res
here. You must have noticed we use this req
everywhere in the scrapers. I agree it's not ideal but what's res
? I already see multiple candidates for what this could mean. As this holds a Response
object I propose we use resp
which seems clear to me. What do you think? In that case we shall update the test as well.Btw wget may be removed in the future. It was introduced when moving initial stuff to this lib but may not be the best candidate for this task. We should think about the best way to have a robust, efficient, manageable and lightweight downloader in the scraperlib.
- What's the use case? Most of the time, you'll make a a HEAD request prior to downloading stuff, so you can check whether you need to download the file at all. So you'll get the
Content-Type
there.
I agree. There's not much use cases for this except special ones like openedx (as discussed over slack)
- We could return the headers from
download_file
unconditionally as it has zero cost (data's already in the request object) and I don't like params that changes the return type. Sometimes it's required but should be avoided as much as possible.
Did that
- don't change
req
tores
here. You must have noticed we use thisreq
everywhere in the scrapers. I agree it's not ideal but what'sres
? I already see multiple candidates for what this could mean. As this holds aResponse
object I propose we useresp
which seems clear to me. What do you think? In that case we shall update the test as well.
resp
sounds better
- we definitely don't want to parse wget's output. We should encourage users to do a HEAD before running a large download.
Yup. That may not be required for large files, so removed that.
This fixes #27 by adding
with_headers
option to return the response headers instead ofNone
insave_file()
andsave_large_file()
Contains the following changes -
We return headers by default in save_file onlywith_headers
option to return the response headers instead ofNone
insave_file()
andsave_large_file()
resresp in save_file() as that's essentially a response instead of a requestassert_headers()
)These changes are backwards compatible but add new feature to the current API