Open jdimpson opened 6 months ago
See ShiftaDeband's fork (which contains the fixes mentioned in his PR) as well as issues #273 and #275.
Sorry to bother, I'm pretty new to this. How can I actually use this fork instead of the master branch?
@Elmagenta: You'll need to have Ruby installed; then you can just download ShiftaDeband's fork as a ZIP file, unzip it, and run wayback_machine_downloader, which you'll find in the bin subdirectory.
@tinyapps I'm also pretty new to this, and I couldn't follow your instructions. I have Ruby installed, and I had also installed the "original" wayback_machine_downloader via the macOS Terminal. Following your instructions, I downloaded the ZIP file and simply tried to run the binary file, but I get an error message:
/Users/flag/Downloads/wayback-machine-downloader-feature-httpGet/bin/wayback_machine_downloader:3:in `require_relative': cannot load such file -- /Users/flag/Downloads/wayback-machine-downloader-feature-httpGet/lib/wayback_machine_downloader (LoadError)
from /Users/flag/Downloads/wayback-machine-downloader-feature-httpGet/bin/wayback_machine_downloader:3:in "
Could you give more details on how to proceed?
@flag-br: Sounds like you might've deleted (or not extracted) the included lib directory or its contents. After unzipping wayback-machine-downloader-feature-httpGet.zip, just cd into the bin subdirectory and run wayback_machine_downloader without deleting any of the other included files or folders. The directory structure should look like this:
.
├── Dockerfile
├── Gemfile
├── MIT-LICENSE.txt
├── README.md
├── Rakefile
├── bin
│ └── wayback_machine_downloader
├── lib
│ ├── wayback_machine_downloader
│ │ ├── archive_api.rb
│ │ ├── tidy_bytes.rb
│ │ └── to_regex.rb
│ └── wayback_machine_downloader.rb
├── test
│ └── test_wayback_machine_downloader.rb
└── wayback_machine_downloader.gemspec
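One way to confirm that an unzipped copy is complete is to check for the files in the listing above. The following is only a minimal sketch and not part of the fork: missing_paths and REQUIRED_PATHS are hypothetical names, and the path list is simply copied from the tree.

```ruby
# Hypothetical helper (not part of the gem or the fork): lists which of the
# files from the tree above are missing under a given root directory.
REQUIRED_PATHS = [
  "bin/wayback_machine_downloader",
  "lib/wayback_machine_downloader.rb",
  "lib/wayback_machine_downloader/archive_api.rb",
  "lib/wayback_machine_downloader/tidy_bytes.rb",
  "lib/wayback_machine_downloader/to_regex.rb"
].freeze

# Returns the relative paths that do not exist under root.
def missing_paths(root)
  REQUIRED_PATHS.reject { |rel| File.exist?(File.join(root, rel)) }
end

if __FILE__ == $PROGRAM_NAME
  # Pass the unzipped folder as the first argument; defaults to the current directory.
  root = ARGV[0] || "."
  missing = missing_paths(root)
  if missing.empty?
    puts "Layout looks complete under #{root}"
  else
    puts "Missing under #{root}:"
    missing.each { |rel| puts "  #{rel}" }
  end
end
```

Run it with the unzipped folder as the argument. If anything under lib is reported missing, the require_relative in the bin script would presumably fail with the LoadError quoted earlier.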
@tinyapps Thank you very much, it worked! It ran normally, but the final product is practically the same as what I was getting before with the master branch version. The folder structure apparently reproduced correctly on my machine, but only 15 htm files were downloaded. To check, I ran wayback_machine_downloader with the --list option, and the answer is that there are 1116 htm files.
The command I'm using is (after cd to bin folder): wayback_machine_downloader https://jazzdiscogcorner.pagesperso-orange.fr/
This site is quite simple, just text and practically no images.
Am I doing something wrong?
@flag-br: Glad to hear it worked out. As for issues with a specific site, I'd recommend checking out the documentation and searching through the open and closed issues before posting a new issue.
I'm being stupid here, but trying to run wayback_machine_downloader (type - file) in the bin directory gave me "not recognized as an internal or external command, operable program or batch file". Fresh Ruby install.
I had to run gem build wayback_machine_downloader.gemspec, then gem install wayback_machine_downloader-2.3.2.gem (the file that was generated), and finally I could run wayback_machine_downloader from cmd. Any advice on what I was doing wrong?
It would be great to have rate limiting added to this software. Without it archive.org is (rightfully) returning "Connection refused" errors.
P.S. It is good that there is a fork with fixes. Just wishing that the main repo of this software had those fixes too.
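For anyone wondering what rate limiting could look like, here is a minimal sketch of a throttle that enforces a minimum gap between requests. It is not code from this gem or the fork. Throttle, min_interval, and the injectable clock and sleeper are hypothetical names chosen for illustration, and a suitable interval for archive.org is an assumption you would have to tune yourself.

```ruby
# Hypothetical throttle (not part of the gem): makes sure at least
# min_interval seconds pass between the starts of consecutive calls.
class Throttle
  def initialize(min_interval, clock: nil, sleeper: nil)
    @min_interval = min_interval
    # Monotonic clock so wall-clock adjustments do not affect the spacing.
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    @sleeper = sleeper || ->(seconds) { sleep(seconds) }
    @last_call = nil
  end

  # Sleeps if the previous call was too recent, then runs the block
  # and returns its value.
  def call
    if @last_call
      wait = @min_interval - (@clock.call - @last_call)
      @sleeper.call(wait) if wait > 0
    end
    @last_call = @clock.call
    yield
  end
end

if __FILE__ == $PROGRAM_NAME
  # Demo with a short interval; a real download loop would wrap each
  # HTTP request in throttle.call and use a much longer gap.
  throttle = Throttle.new(0.1)
  3.times { |i| throttle.call { puts "request #{i}" } }
end
```

The clock and sleeper are passed in only so the spacing logic can be checked without actually waiting.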
This patched version worked beautifully ...
For those on Windows who are unsure how to do it:
gem install wayback_machine_downloader
Then replace the bin and lib folders in C:\Ruby33-x64\lib\ruby\gems\3.3.0\gems\wayback_machine_downloader-2.3.1 with those from the compressed file: https://github.com/ShiftaDeband/wayback-machine-downloader/archive/refs/heads/feature/httpGet.zip
Doesn't seem to work anymore... it gives Net::ReadTimeout with #<TCPSocket:(closed)> (Net::ReadTimeout)
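If the timeouts are transient, retrying with an increasing delay may help. The sketch below is an assumption about how that could be done, not something the gem or the fork is known to do: with_backoff, max_attempts, base_delay, and sleeper are hypothetical names. It is no substitute for rate limiting and will not get around a block on the server side.

```ruby
require "net/http"

# Hypothetical helper (not part of the gem): runs the block and retries it
# when it raises Net::ReadTimeout or Errno::ECONNREFUSED, doubling the
# wait after each failed attempt.
def with_backoff(max_attempts: 4, base_delay: 2.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue Net::ReadTimeout, Errno::ECONNREFUSED
    # Give up and re-raise once the attempt budget is used up.
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

A download loop could wrap each request, for example with_backoff { Net::HTTP.get_response(uri) }, where uri is whatever URL is being fetched.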
The Wayback Machine is (rightfully) blocking bulk downloads that use too much bandwidth or make too many requests per second. As far as I can tell, this product does no rate limiting of its own, at least not by default, judging by the examples in the README. As a result, the Internet Archive will soft ban your IP address if you use this script on a web site of any significant size.
It's irresponsible to leave this repository up without at least a warning in the documentation.