Panzerfuhrer opened this issue 5 years ago
There was a modification made on another fork that may correct the problem you saw. It involved escaping the URI. I'll try it out and report back.
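For anyone who wants to experiment in the meantime, here is a minimal sketch of the idea in Ruby. To be clear, this is only my own illustration of escaping the problem characters before parsing, not the actual code from that fork:

require 'uri'

# Ruby's default URI parser rejects "[", "]" and "^" in a path, which is
# where the "bad URI(is not URI?)" message comes from.
raw = "https://web.archive.org/web/20061206145254id_/" \
      "http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip"

# Percent-encode only the offending characters; the existing %20 sequences
# are left untouched.
escaped = raw.gsub(/[\[\]^]/) { |c| format("%%%02X", c.ord) }

# URI.parse(raw)    # raises URI::InvalidURIError: bad URI(is not URI?)
URI.parse(escaped)  # parses fine once the brackets are percent-encoded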
Great, thanks. Can you maybe link the fork?
This issue still persists 2½ years later (tested on version 2.3.1). Trying the following simple command still yields the same result:
wayback_machine_downloader "http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip"
Output:
Downloading http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip to websites/ftp.blizzard.com:80/ from Wayback Machine archives.
Getting snapshot pages. found 1 snaphots to consider.
1 files to download:
http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip # bad URI(is not URI?): "https://web.archive.org/web/20061206145254id_/http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip"
websites/ftp.blizzard.com%3a80/pub/war3/maps/spotlight/Pumpkinhunt by Laundry [P].zip was empty and was removed.
http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip -> websites/ftp.blizzard.com%3a80/pub/war3/maps/spotlight/Pumpkinhunt by Laundry [P].zip (1/1)
Download completed in 1.96s, saved in websites/ftp.blizzard.com:80/ (1 files)
The folder, however, remains empty.
The same issue also appears at least with URLs containing "^".
Related to wayback_machine_downloader -v 2.3.1
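For what it's worth, the same error can be reproduced outside the downloader with Ruby's standard URI parser, which appears to be where the message comes from. A quick check (the example.com URLs here are just placeholders, not related to the archive):

require 'uri'

# Unescaped "[", "]" or "^" in a URL makes the stock parser bail out with
# the same message the downloader prints.
["http://example.com/file[P].zip", "http://example.com/file^1.zip"].each do |url|
  begin
    URI.parse(url)
  rescue URI::InvalidURIError => e
    puts e.message  # e.g. "bad URI(is not URI?): http://example.com/file[P].zip"
  end
end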
I think the square brackets [] are somehow treated as invalid by the Wayback Machine Downloader. A quick Google search turned up other cases, including one related to WebDAV, that also showed problems listing files and directories with square brackets in their names.
This should be considered a bug.
As you can see in the output below, the file is reported as "was empty and was removed".
$ wayback_machine_downloader -d zzzz http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip
Downloading http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip to zzzz/ from Wayback Machine archives.
Getting snapshot pages. found 1 snaphots to consider.
1 files to download:
http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip # bad URI(is not URI?): "https://web.archive.org/web/20061206145254id_/http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip"
zzzz/pub/war3/maps/spotlight/Pumpkinhunt by Laundry [P].zip was empty and was removed.
http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip -> zzzz/pub/war3/maps/spotlight/Pumpkinhunt by Laundry [P].zip (1/1)
It should be downloaded because I can download the file manually as evidenced below:
I also tried the following, thinking a regex matching only zip files might help as a workaround, but it did not:
wayback_machine_downloader -l -c 10 --only "/\.(zip)$/i" -d zzzz http://ftp.blizzard.com:80/pub/war3/maps/spotlight/ > log.txt
Attached is the log file: log.txt
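Until the tool handles this itself, one possible manual workaround might be to fetch the snapshot URL directly with the brackets escaped, along these lines (an untested sketch; the snapshot timestamp and URL are taken from the output above):

require 'open-uri'

# Snapshot URL copied from the downloader's output.
snapshot = "https://web.archive.org/web/20061206145254id_/" \
           "http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip"

# Percent-encode the brackets that URI.parse refuses, then fetch the file.
escaped = snapshot.gsub(/[\[\]^]/) { |c| format("%%%02X", c.ord) }

URI.open(escaped) do |remote|
  File.binwrite("Pumpkinhunt by Laundry [P].zip", remote.read)
end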
I have seen other reports of issues where not all files are downloaded. Here is mine.
I was trying to download ftp.blizzard.com manually with wget, folder by folder, until I found this fantastic program. First I ran it with default options, but after comparing one of my manually downloaded directories against the result I noticed that a few files were missing. So I retried with the -s option (download every timestamp), but still no luck.
What I found is that, in the folder I tested, for example, the files Pass_The_Bomb_Version_4[1].2.zip and Pumpkinhunt by Laundry [P].zip are missing. These are the only two files with bracket characters in their filenames.
This is the output in the terminal:
http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip # bad URI(is not URI?): http://web.archive.org/web/20061206145254id_/http://ftp.blizzard.com:80/pub/war3/maps/spotlight/Pumpkinhunt%20by%20Laundry%20[P].zip
websites/ftp.blizzard.com/20061206145254/pub/war3/maps/spotlight/Pumpkinhunt by Laundry [P].zip was empty and was removed.
I think something goes wrong because of the brackets. Maybe the program could be changed so that it does not care which symbols appear in a URL?
Hope this post will help you with improving this fantastic program!