mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0

Some images get dropped from the dump #170

Closed robkam closed 8 months ago

robkam commented 1 year ago

Describe the Bug

Some images are dropped from the dump with a "size is not match ... skipping" message.

Command for Reproducing the Bug

dumpgenerator --xml --xmlrevisions --images --api https://elec-recyc.fandom.com/api.php

Output

stdout

```bash
(snipped)
Retrieving images...
Creating "./elec_recyc.fandom.com-20231114-wikidump/images" directory
File './elec_recyc.fandom.com-20231114-wikidump/images/Example.jpg' size is not match '5242', skipping
File './elec_recyc.fandom.com-20231114-wikidump/images/Uln2003.jpg' size is not match '26259', skipping
(snipped)
```

errors.log

```bash
2023-11-13 21:34:01: File './elec_recyc.fandom.com-20231113-wikidump/images/Example.jpg' size is not match '5242', skipping
2023-11-13 21:34:05: File './elec_recyc.fandom.com-20231113-wikidump/images/Uln2003.jpg' size is not match '26259', skipping
```
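To see which images were dropped in a larger dump, the log entries above can be collected with a short script. This is a minimal sketch, not part of dumpgenerator: the function names `find_skipped` and `report` are hypothetical, the regular expression is inferred only from the two log lines in this report, and the quoted number is assumed (not confirmed) to be the size the tool compared against.

```python
import re
from pathlib import Path

# Pattern inferred from the log lines in this report; the real format may vary.
SKIP_RE = re.compile(
    r"File '(?P<path>[^']+)' size is not match '(?P<size>\d+)', skipping"
)


def find_skipped(log_text):
    """Return (path, logged_size) pairs for every 'size is not match' entry."""
    return [
        (m.group("path"), int(m.group("size")))
        for m in SKIP_RE.finditer(log_text)
    ]


def report(log_text):
    """Describe each skipped file, adding the on-disk size when the file exists."""
    lines = []
    for path, logged_size in find_skipped(log_text):
        p = Path(path)
        # A skipped file may be absent from disk, so guard the stat call.
        on_disk = p.stat().st_size if p.is_file() else None
        lines.append(f"{path}: logged size {logged_size}, on disk {on_disk}")
    return lines
```

For example, `report(Path("errors.log").read_text())` run from the directory the relative paths in the log refer to would list each skipped image alongside whatever size, if any, is found on disk.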

Platform Details

Desktop

Additional Context