mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0

images are not downloaded #155

Closed milahu closed 1 year ago

milahu commented 1 year ago

dumpgenerator --images is not downloading images; it downloads only the *.desc files

it works with the old version (https://github.com/WikiTeam/wikiteam/pull/331), which downloads both the *.desc files and the image files

yzqzss commented 1 year ago

Please provide the command you ran (with URL) and more info (error log, stdout, etc.).

elsiehupp commented 1 year ago

> Please provide the command you ran (with URL) and more info (error log, stdout, etc.).

I've been meaning to create an Issue template with information that would be helpful for people to provide... (The problem IIRC was I got stuck on creating a shell command that would reliably print the name and version of the shell on any platform.)
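For what it's worth, a rough sketch of such a command for POSIX-style shells might look like the following. The function name `shell_info` is made up for illustration. It only recognizes bash, zsh and ksh through their version variables and otherwise falls back to the process name reported by `ps`, so it does not cover cmd.exe or PowerShell and is not a solution for "any platform".

```shell
# Hypothetical helper (not part of dumpgenerator): print the name and
# version of the running shell where that can be determined.
shell_info() {
  if [ -n "${BASH_VERSION:-}" ]; then
    printf 'bash %s\n' "$BASH_VERSION"
  elif [ -n "${ZSH_VERSION:-}" ]; then
    printf 'zsh %s\n' "$ZSH_VERSION"
  elif [ -n "${KSH_VERSION:-}" ]; then
    printf 'ksh %s\n' "$KSH_VERSION"
  else
    # Fallback: POSIX ps prints the command name of the current shell process.
    printf '%s (version unknown)\n' "$(ps -p $$ -o comm= 2>/dev/null)"
  fi
}

shell_info
```

The version variables are checked first because they are set by the shell itself, while the `ps` fallback only yields a name and depends on `ps` being available.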

robkam commented 1 year ago

Added a Reporting issues section to the README.md #159

milahu commented 1 year ago

version 3: eb1529a4c18ec3d71485aea3351330f6a52cdae7 (mediawiki-scraper.nix)

version 2: https://github.com/WikiTeam/wikiteam/commit/54d9d8051e6159bf6161476c76a9f0665ee7a529 (mediawiki-scraper-2.nix)

same command for both versions:

dumptime=$(date +%F.%H-%M-%S)
path=nixos.wiki.$dumptime
dumpgenerator \
https://nixos.wiki/ \
--api https://nixos.wiki/api.php \
--index https://nixos.wiki/index.php \
--xml \
--images \
--path $path \
2>&1 | tee -a dumpgenerator.$dumptime.log

dumpgenerator.log.v3.fail.gz files.v3.fail.txt.gz

dumpgenerator.log.v2.ok.gz files.v2.ok.txt.gz (the html folder was not created by dumpgenerator)
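To check the symptom on a finished dump without reading the whole file listing, something like the sketch below could be used. It assumes that each description is saved as `<image name>.desc` in the same directory as the image, and that this directory is `$path/images`. Both are assumptions about the dump layout, and `missing_images` is a made-up helper name.

```shell
# Hypothetical helper: print every image name that has a *.desc file
# but no image file next to it.
# Assumes descriptions are stored as "<image name>.desc" beside the image.
missing_images() {
  dir=$1
  for desc in "$dir"/*.desc; do
    [ -e "$desc" ] || continue                 # glob matched nothing
    image=${desc%.desc}                        # strip the .desc suffix
    [ -e "$image" ] || printf '%s\n' "$image"  # report images that are absent
  done
}
```

Example use, with the images directory name being an assumption:

```shell
missing_images "$path/images" | wc -l
```

A count equal to the number of *.desc files would match the v3 behaviour described above, and a count of zero would match the v2 behaviour.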