hartator / wayback-machine-downloader

Download an entire website from the Wayback Machine.

400 Bad Request? #182

Open guitarbugxD opened 3 years ago

guitarbugxD commented 3 years ago

Hi, I am having a bit of trouble with my restore. I believe I have installed both wayback_machine_downloader and Ruby 2.7.2p137 correctly. The problem is with the last step: I am trying to recover a site that was recently archived, and I am confused about why I get this error. Does anyone have a solution?

Getting snapshot pagesC:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader/archive_api.rb:8: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
Traceback (most recent call last):
        17: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `<main>'
        16: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `load'
        15: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/bin/wayback_machine_downloader:72:in `<top (required)>'
        14: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:192:in `download_files'
        13: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        12: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:168:in `get_file_list_by_timestamp'
        11: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
        10: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
         9: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
         8: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:19:in `open'
         7: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:50:in `open'
         6: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:744:in `open'
         5: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:174:in `open_uri'
         4: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `open_loop'
         3: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `catch'
         2: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:235:in `block in open_loop'
         1: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:764:in `buffer_open'
C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:387:in `open_http': 400 Bad Request (OpenURI::HTTPError)

qq4265461 commented 3 years ago

Me too, on Ubuntu 20.04 with Ruby 2.7. I changed open() to URI.open(), but I still get this error:

    16: from /usr/local/bin/wayback_machine_downloader:23:in `<main>'
    15: from /usr/local/bin/wayback_machine_downloader:23:in `load'
    14: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    13: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    12: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    11: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:168:in `get_file_list_by_timestamp'
    10: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
     9: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
     8: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
     7: from /usr/lib/ruby/2.7.0/open-uri.rb:50:in `open'
     6: from /usr/lib/ruby/2.7.0/open-uri.rb:744:in `open'
     5: from /usr/lib/ruby/2.7.0/open-uri.rb:174:in `open_uri'
     4: from /usr/lib/ruby/2.7.0/open-uri.rb:233:in `open_loop'
     3: from /usr/lib/ruby/2.7.0/open-uri.rb:233:in `catch'
     2: from /usr/lib/ruby/2.7.0/open-uri.rb:235:in `block in open_loop'
     1: from /usr/lib/ruby/2.7.0/open-uri.rb:764:in `buffer_open'

/usr/lib/ruby/2.7.0/open-uri.rb:387:in `open_http': 400 Bad Request (OpenURI::HTTPError)

pabs3 commented 3 years ago

Please retry with the latest version 2.3.0, it might work better.

treimers commented 2 years ago

Same problem on my Mac and my Windows 10 VM:

OS: macOS High Sierra 10.13.6
wayback_machine_downloader: 2.3.0
Ruby: ruby 2.3.7p456 (2018-03-28 revision 63024) [universal.x86_64-darwin17]

Getting snapshot pages/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:359:in `open_http': 400 Bad Request (OpenURI::HTTPError)
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:737:in `buffer_open'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:212:in `block in open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:210:in `catch'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:210:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:151:in `open_uri'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:717:in `open'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:192:in `download_files'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/bin/wayback_machine_downloader:72:in `<top (required)>'
from /usr/local/bin/wayback_machine_downloader:22:in `load'
from /usr/local/bin/wayback_machine_downloader:22:in `<main>'

OS: Windows 10
wayback_machine_downloader: 2.3.0
Ruby: 2.7 x64 (per the C:/Ruby27-x64 paths in the traceback below)

Getting snapshot pagesTraceback (most recent call last):
        15: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `<main>'
        14: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `load'
        13: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/bin/wayback_machine_downloader:72:in `<top (required)>'
        12: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:192:in `download_files'
        11: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        10: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
         9: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
         8: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
         7: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
         6: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:744:in `open'
         5: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:174:in `open_uri'
         4: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `open_loop'
         3: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `catch'
         2: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:235:in `block in open_loop'
         1: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:764:in `buffer_open'
C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:387:in `open_http': 400 Bad Request (OpenURI::HTTPError)

Any idea what might help?

Update:

I have also tried the Docker container and ran into the same error. This might be related to the website I am trying to download: http://pondini.org

pabs3 commented 2 years ago

That site works fine for me on Linux.

I have created a branch that adds some debugging to the API code. It will still crash, but if you could copy the debugging output before the Traceback, then that would help figure out where the issue is and if this is a bug in the wayback-machine-downloader code or in the API itself. Please try it out on your system:

https://github.com/pabs3/wayback-machine-downloader/tree/debug-crashes
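
For readers who cannot run that branch, the general idea of "debugging output before the Traceback" can be sketched as below. This is not the code from the debug-crashes branch: `debug_fetch`, its `opener` and `out` parameters, and the `DEBUG` prefixes are all hypothetical names chosen for illustration. It only assumes that `OpenURI::HTTPError` exposes the response through `io`, which open-uri does.

```ruby
require 'open-uri'

# Hypothetical debugging wrapper; NOT the code from the debug-crashes branch.
# Prints the request URL and any HTTP error details, then re-raises so the
# usual traceback still follows the debug output.
# `opener` is injectable (an assumption for this sketch) so the logic can be
# tried without network access. URI.open is the call the deprecation warning
# in the first report recommends.
def debug_fetch(url, opener: ->(u) { URI.open(u).read }, out: $stderr)
  out.puts "DEBUG requesting: #{url}"
  opener.call(url)
rescue OpenURI::HTTPError => e
  out.puts "DEBUG HTTP error: #{e.message}"
  # e.io holds the server response; its body may say why the request was refused.
  out.puts "DEBUG response body: #{e.io.read.to_s[0, 500]}"
  raise
end
```

Printing the exact URL is the useful part: it shows whether the request was malformed on the client side or rejected by the API for some other reason.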

treimers commented 2 years ago

Hi,

sorry, but the problem is gone now. I tried the debug version and it worked without problems. So I returned to the original version and it worked as well.

As far as I understand, something must have changed on the Wayback Machine side.

pabs3 commented 2 years ago

Thanks for testing, I think you could be right. I think that the wayback-machine-downloader should do better than crashing when this happens though. I'll try to track down some IA folks to ask about it.
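
One way "better than crashing" could look is sketched below. It is a minimal illustration, not a patch for wayback-machine-downloader: `fetch_with_retries` and its `attempts`, `wait`, `opener`, and `log` parameters are hypothetical. Retrying a 400 is itself an assumption, prompted only by the report above that the error later went away on its own; a 400 normally indicates a request the server will keep rejecting.

```ruby
require 'open-uri'

# Hypothetical helper; not part of wayback_machine_downloader.
# Tries the request up to `attempts` times. If every attempt raises
# OpenURI::HTTPError, it logs each failure and returns nil instead of
# letting the exception terminate the program.
def fetch_with_retries(url, attempts: 3, wait: 2,
                       opener: ->(u) { URI.open(u).read }, log: $stderr)
  tries = 0
  begin
    tries += 1
    opener.call(url)
  rescue OpenURI::HTTPError => e
    log.puts "Attempt #{tries}/#{attempts} failed: #{e.message} (#{url})"
    if tries < attempts
      sleep wait
      retry
    end
    nil # all attempts failed; the caller decides how to report this
  end
end
```

A caller would check for `nil` and print a short message with the failing URL, which would also answer the question of which Internet Archive URLs returned the error.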

pabs3 commented 2 years ago

Did anyone record any of the IA URLs that gave a 400 error?