mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0
89 stars, 14 forks

Add an option to bypass CDN image compression #163

Closed. yzqzss closed this 1 year ago

yzqzss commented 1 year ago

Add a --bypass-cdn-image-compression option to bypass CDN image compression (Cloudflare Polish, etc.) by appending random query parameters to image URLs.
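As a rough illustration of the idea (not the code from this PR), appending a random query parameter to an image URL could look like the sketch below. The function name `add_random_param` and the parameter name `nocache` are hypothetical. Whether a given CDN actually skips recompression for such URLs depends on its configuration.

```python
import secrets
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_random_param(url: str, param_name: str = "nocache") -> str:
    """Return `url` with one extra random query parameter appended.

    Hypothetical helper: `param_name` is an arbitrary choice for this sketch.
    The existing query string is kept verbatim so the original parameters
    are not re-encoded.
    """
    parts = urlsplit(url)
    extra = urlencode({param_name: secrets.token_hex(8)})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    # The random value differs on every call, so no fixed output is shown.
    print(add_random_param("https://example.org/images/a/ab/Foo.png"))
```

Each call yields a different URL for the same file, so a cache keyed on the full URL would not see it as the same request.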


Fixes #155, reported by @milahu.

yzqzss commented 1 year ago

@milahu try this. :)

randomnetcat commented 1 year ago

It really doesn't seem like a good idea to me to just refuse to download compressed images entirely. If the user doesn't ask for bypassing, it should at most warn.

robkam commented 1 year ago

oops, sorry, in too much of a hurry here.

yzqzss commented 1 year ago

> If the user doesn't ask for bypassing, it should at most warn.

Yep. If dumpgenerator detects the cf-polished HTTP header, it raises an AssertionError telling the user to use --bypass-cdn-image-compression. (https://github.com/mediawiki-client-tools/mediawiki-scraper/pull/163/commits/cc62f57cb660d541440be1ded03e1138582cac57#diff-8cb0e49bf7f47158ff3db3113db064e57d908a7351079af7606e441f85f874c3R67)
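To make the two behaviours under discussion concrete, here is a minimal sketch of a header check that can either raise or only warn. It is not the code from the linked commit: the function names and the `strict` switch are hypothetical, and the headers are assumed to arrive as a plain name-to-value mapping.

```python
import warnings
from typing import Mapping


def is_cf_polished(headers: Mapping[str, str]) -> bool:
    """True if the response carries a cf-polished header (any letter case)."""
    return any(name.lower() == "cf-polished" for name in headers)


def check_image_response(headers: Mapping[str, str], strict: bool = True) -> None:
    """Raise or warn when an image appears to have been recompressed.

    `strict` is a hypothetical switch for this sketch: True raises an
    AssertionError, False only emits a warning.
    """
    if not is_cf_polished(headers):
        return
    message = (
        "cf-polished header found: the image may have been recompressed "
        "by the CDN; consider --bypass-cdn-image-compression"
    )
    if strict:
        raise AssertionError(message)
    warnings.warn(message)


if __name__ == "__main__":
    # Example header values are made up for illustration.
    check_image_response({"Content-Type": "image/png"})  # no header: passes silently
    check_image_response({"CF-Polished": "origSize=1234"}, strict=False)  # warns only
```

With `strict=False` the download would continue after the warning, which is the behaviour randomnetcat asks for above.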

> It really doesn't seem like a good idea to me to just refuse to download compressed images entirely.

So, shall we finish the TODO and add a --nocheck-image-size option?