hartator / wayback-machine-downloader

Download an entire website from the Wayback Machine.
5.29k stars, 699 forks

Error when attempting to download anything #190

[Open] Fahrenheit opened this issue 3 years ago

Fahrenheit commented 3 years ago

The script had been working fine for months. Now, as soon as it tries to download any file from a website I need to partially archive, I get the output below. I have no idea how to fix it. I've tried reinstalling macOS, reinstalling Ruby, updating Ruby, and uninstalling and reinstalling wayback-machine-downloader, with no luck, so any help would be extremely appreciated. I need this tool working for a project.

This is the command I used:

`wayback_machine_downloader -sa 'igui.ru' --only "/\.(pxl|deb|ipa|rar|zip|7z|dmg|exe)$/i"`

It runs fine until it starts downloading the file types matched by the regex. The exact same command worked perfectly for weeks on other sites.
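For readers unfamiliar with `--only`: the value is a regex literal written as a string (`/pattern/flags`). The Ruby sketch below illustrates which files that pattern would select, assuming such a string is turned into a `Regexp` and tested against each snapshot URL. The helper `only_filter_to_regexp`, its plain-substring fallback, and the sample URLs are hypothetical; this is not the gem's own code.

```ruby
# Hypothetical helper: turn a "/pattern/flags" string (as passed to --only)
# into a Regexp. Illustration only, not the gem's implementation.
def only_filter_to_regexp(filter)
  match = filter.match(%r{\A/(.*)/([imx]*)\z}m)
  # Assumption: anything not written as /.../ is treated as a literal substring.
  return Regexp.new(Regexp.escape(filter)) unless match

  options = 0
  options |= Regexp::IGNORECASE if match[2].include?('i')
  options |= Regexp::MULTILINE  if match[2].include?('m')
  options |= Regexp::EXTENDED   if match[2].include?('x')
  Regexp.new(match[1], options)
end

only = only_filter_to_regexp('/\.(pxl|deb|ipa|rar|zip|7z|dmg|exe)$/i')

# Made-up snapshot URLs for illustration.
urls = [
  'http://igui.ru/files/app.DMG',
  'http://igui.ru/files/tweak.deb',
  'http://igui.ru/index.html'
]

urls.each do |url|
  puts "#{url} -> #{only.match?(url) ? 'download' : 'skip'}"
end
```

With the `i` flag in place, `app.DMG` is kept along with `tweak.deb`, while `index.html` is skipped.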


```
#<Thread:0x00007fa69c9ea7d8@/Library/Ruby/Gems/2.6.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:209 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
    1: from /Library/Ruby/Gems/2.6.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:212:in `block (2 levels) in download_files'
/Library/Ruby/Gems/2.6.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:251:in `download_file': undefined method `split' for nil:NilClass (NoMethodError)
Traceback (most recent call last):
    1: from /Library/Ruby/Gems/2.6.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:212:in `block (2 levels) in download_files'
/Library/Ruby/Gems/2.6.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:251:in `download_file': undefined method `split' for nil:NilClass (NoMethodError)
```

pabs3 commented 3 years ago

The workaround for this appears to be to change `if file_id.nil?` to `if file_id_and_timestamp.nil?` in the `get_file_list_all_timestamps` function in `lib/wayback_machine_downloader.rb`. I'll submit a pull request for this.
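A minimal Ruby sketch of why the location of the `nil` check matters. It assumes, as the workaround implies, that in all-timestamps mode (the `-s` in `-sa`) the combined timestamp/path string is what later reaches `download_file` as `file_id`, so a `nil` there surfaces as a `NoMethodError` for `split`. The `clean_path` and `curate` methods are hypothetical stand-ins, and `URI.decode_www_form_component` stands in for the gem's `CGI::unescape`.

```ruby
require 'uri'

# Hypothetical sanitizer: returns nil when the decoded path is not valid UTF-8,
# mimicking a cleanup step that can yield nil (not the gem's actual code).
def clean_path(path)
  path.valid_encoding? ? path : nil
end

# Hypothetical curation loop: the nil check is on the value that is stored
# and used later (the workaround), not on the raw file_id.
def curate(snapshots)
  curated = {}
  snapshots.each do |timestamp, url|
    file_id = url.split('/')[3..-1].join('/')
    key = clean_path(URI.decode_www_form_component([timestamp, file_id].join('/')))
    if key.nil? # checking `file_id.nil?` instead would never fire here
      puts "Malformed file url, ignoring: #{url}"
      next
    end
    curated[key] ||= { file_url: url, timestamp: timestamp, file_id: key }
  end
  curated.values
end

snapshots = [
  ['20180101000000', 'http://igui.ru/files/app.dmg'],
  ['20180101000000', 'http://igui.ru/files/%FF%FE.zip'] # decodes to invalid UTF-8
]

curate(snapshots).each do |info|
  # download_file splits file_id on '/'; with the guard it is never nil.
  p info[:file_id].split('/')
end
```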

A more correct solution would be to use bytes instead of UTF-8 for filenames since, at least on Linux, filenames are byte strings, not UTF-8 text.
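A minimal sketch of the bytes-based approach in Ruby, assuming the file name arrives percent-encoded in the snapshot URL. The helper `path_bytes` is hypothetical: it decodes `%XX` escapes into a binary (`ASCII-8BIT`) string, so no UTF-8 validity check can fail. The sample name is the Windows-1251 encoding of a Cyrillic word, chosen only as an example of bytes that are not valid UTF-8.

```ruby
require 'tmpdir'

# Hypothetical helper: decode %XX escapes into raw bytes, keeping the
# result tagged as binary (ASCII-8BIT) instead of UTF-8.
def path_bytes(escaped_path)
  escaped_path.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
end

name = path_bytes('%CF%F0%E8%E2%E5%F2.zip') # Windows-1251 bytes, used as an example

p name.encoding                                             # binary tag
p name.valid_encoding?                                      # any byte sequence is valid binary
p name.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # the same bytes are not valid UTF-8

# On Linux, file names are byte strings, so the name can be used as-is.
if RUBY_PLATFORM.include?('linux')
  Dir.mktmpdir do |dir|
    target = dir.b + '/'.b + name
    File.binwrite(target, 'example')
    p File.exist?(target)
    File.delete(target) # leave the temp dir empty before it is removed
  end
end
```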

On Windows you might need to detect the file name encoding and then convert file names to UTF-16 instead. The CharlockHolmes and rchardet projects can be used to detect the encoding, and calling `.encode("UTF-16", encoding)` can convert from one encoding to another.
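A minimal sketch of the conversion step, assuming detection has already happened. The detected encoding is hard-coded here as `Windows-1251`, a hypothetical result standing in for what CharlockHolmes or rchardet might report; the call being illustrated is core Ruby's `String#encode(dst, src)`, which reads the bytes as `src` regardless of the string's current tag.

```ruby
# Raw bytes of a file name as they might arrive from a snapshot URL.
raw_name = "\xCF\xF0\xE8\xE2\xE5\xF2.zip".b
detected = 'Windows-1251' # hypothetical detector result (assumption)

# Convert from the detected encoding to UTF-16, as suggested above...
utf16 = raw_name.encode('UTF-16', detected)

# ...or to UTF-16LE, the byte order used by Windows wide-character APIs.
utf16le = raw_name.encode('UTF-16LE', detected)

# Converting back to UTF-8 shows the bytes were interpreted correctly.
puts utf16le.encode('UTF-8') # prints Привет.zip
```

If the detector guesses wrong, `encode` will still usually succeed for single-byte encodings but produce the wrong characters, so a low confidence score from the detector is worth treating as a warning.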

pabs3 commented 3 years ago

@Fahrenheit This issue isn't fixed yet, so it shouldn't have been closed.

Fahrenheit commented 3 years ago

Sorry, I assumed you were going to fix it afterwards. My bad!

pabs3 commented 3 years ago

I made a workaround, but that hasn't yet been merged into this repo.

-- bye, pabs

https://bonedaddy.net/pabs3/