hartator / wayback-machine-downloader

Download an entire website from the Wayback Machine.

`open_http': 403 Forbidden (OpenURI::HTTPError) #101

Open sscvbnm opened 6 years ago

sscvbnm commented 6 years ago

Hi.. I want to download this site: https://web.archive.org/web/20071213053236/http://www.qquran.com:80/qu.php?goto=main

I get this error every time I run the request. I also tried changing the parameters and the way I type the main URL, with the same result:

wayback_machine_downloader http://www.qquran.com:80/qu.php?goto=main -f 20071213053236
Downloading http://www.qquran.com:80/qu.php?goto=main to websites/www.qquran.com:80/ from Wayback Machine archives.
Getting snapshot pagesC:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:363:in `open_http': 403 Forbidden (OpenURI::HTTPError)
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:741:in `buffer_open'
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:212:in `block in open_loop'
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:210:in `catch'
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:210:in `open_loop'
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:151:in `open_uri'
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:721:in `open'
    from C:/Ruby24-x64/lib/ruby/2.4.0/open-uri.rb:35:in `open'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:87:in `get_all_snapshots_to_consider'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:104:in `get_file_list_curated'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_by_timestamp'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:270:in `file_list_by_timestamp'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:154:in `download_files'
    from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/bin/wayback_machine_downloader:68:in `<top (required)>'
    from C:/Ruby24-x64/bin/wayback_machine_downloader:22:in `load'
    from C:/Ruby24-x64/bin/wayback_machine_downloader:22:in `<main>'

I tried downloading two other sites with NO problem.

l1n commented 6 years ago

Experiencing a similar error.

Downloading http://www.darkpersonalities.com to websites/www.darkpersonalities.com/ from Wayback Machine archives.
Getting snapshot pages/usr/lib/ruby/2.4.0/open-uri.rb:363:in `open_http': 403 Forbidden (OpenURI::HTTPError)
        from /usr/lib/ruby/2.4.0/open-uri.rb:741:in `buffer_open'
        from /usr/lib/ruby/2.4.0/open-uri.rb:212:in `block in open_loop'
        from /usr/lib/ruby/2.4.0/open-uri.rb:210:in `catch'
        from /usr/lib/ruby/2.4.0/open-uri.rb:210:in `open_loop'
        from /usr/lib/ruby/2.4.0/open-uri.rb:151:in `open_uri'
        from /usr/lib/ruby/2.4.0/open-uri.rb:721:in `open'
        from /usr/lib/ruby/2.4.0/open-uri.rb:35:in `open'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:87:in `get_all_snapshots_to_consider'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:104:in `get_file_list_curated'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_by_timestamp'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:270:in `file_list_by_timestamp'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:154:in `download_files'
        from /home/ndevereaux/.gem/ruby/2.4.0/gems/wayback_machine_downloader-2.1.1/bin/wayback_machine_downloader:68:in `<top (required)>'
        from /home/ndevereaux/.gem/ruby/2.4.0/bin/wayback_machine_downloader:23:in `load'
        from /home/ndevereaux/.gem/ruby/2.4.0/bin/wayback_machine_downloader:23:in `<main>'
develroo commented 6 years ago

Ditto.

Getting snapshot pages/usr/lib/ruby/2.3.0/open-uri.rb:359:in `open_http': 403 Forbidden (OpenURI::HTTPError)
    from /usr/lib/ruby/2.3.0/open-uri.rb:737:in `buffer_open'
    from /usr/lib/ruby/2.3.0/open-uri.rb:212:in `block in open_loop'
    from /usr/lib/ruby/2.3.0/open-uri.rb:210:in `catch'
    from /usr/lib/ruby/2.3.0/open-uri.rb:210:in `open_loop'
    from /usr/lib/ruby/2.3.0/open-uri.rb:151:in `open_uri'
    from /usr/lib/ruby/2.3.0/open-uri.rb:717:in `open'
    from /usr/lib/ruby/2.3.0/open-uri.rb:35:in `open'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:87:in `get_all_snapshots_to_consider'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:104:in `get_file_list_curated'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_by_timestamp'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:141:in `list_files'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.1.1/bin/wayback_machine_downloader:66:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:22:in `load'
    from /usr/local/bin/wayback_machine_downloader:22:in `<main>'
adibpg commented 6 years ago

I'm experiencing the same issue for one site; I've successfully downloaded others with no issue.

Getting snapshot pagesC:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:358:in `open_http': 403 Forbidden (OpenURI::HTTPError)
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:736:in `buffer_open'
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:211:in `block in open_loop'
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:209:in `catch'
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:209:in `open_loop'
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:150:in `open_uri'
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:716:in `open'
    from C:/Ruby22-x64/lib/ruby/2.2.0/open-uri.rb:34:in `open'
    from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
    from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:87:in `get_all_snapshots_to_consider'
    from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:104:in `get_file_list_curated'
    from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_by_timestamp'
    from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:141:in `list_files'
    from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wayback_machine_downloader-2.1.1/bin/wayback_machine_downloader:66:in `<top (required)>'
    from C:/Ruby22-x64/bin/wayback_machine_downloader:23:in `load'
    from C:/Ruby22-x64/bin/wayback_machine_downloader:23:in `<main>'

armoha commented 6 years ago

I got the same error for particular sites.

Getting snapshot pagesC:/Ruby24/lib/ruby/2.4.0/open-uri.rb:363:in `open_http': 403 Forbidden (OpenURI::HTTPError)
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:741:in `buffer_open'
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:212:in `block in open_loop'
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:210:in `catch'
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:210:in `open_loop'
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:151:in `open_uri'
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:721:in `open'
    from C:/Ruby24/lib/ruby/2.4.0/open-uri.rb:35:in `open'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:87:in `get_all_snapshots_to_consider'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:104:in `get_file_list_curated'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_by_timestamp'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:270:in `file_list_by_timestamp'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/lib/wayback_machine_downloader.rb:154:in `download_files'
    from C:/Ruby24/lib/ruby/gems/2.4.0/gems/wayback_machine_downloader-2.1.1/bin/wayback_machine_downloader:68:in `<top (required)>'
    from C:/Ruby24/bin/wayback_machine_downloader:23:in `load'
    from C:/Ruby24/bin/wayback_machine_downloader:23:in `<main>'
kshahkshah commented 6 years ago

@hartator, is this a Ruby version issue? I tried with 2.3.1 (YARV/MRI) and got the same error. I'll try with an older version of Ruby as well today.

kshahkshah commented 6 years ago

Okay, this has nothing to do with the gem or Ruby versions, and has everything to do with the web archive itself. This thread appears relevant:

https://archive.org/post/406632/why-does-the-wayback-machine-pay-attention-to-robotstxt
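
One way to check that the 403 comes from the archive and not from the gem is to request the snapshot list directly. The sketch below is only an illustration: it assumes the public CDX endpoint `https://web.archive.org/cdx/search/cdx`, which may not be the exact URL the gem requests in `archive_api.rb`, and the helper names `cdx_query_url` and `snapshot_list_status` are made up for this example.

```ruby
require 'open-uri'
require 'uri'

# Public CDX endpoint of the Wayback Machine (assumed here; the gem's own
# request URL in archive_api.rb may differ).
CDX_ENDPOINT = 'https://web.archive.org/cdx/search/cdx'

# Hypothetical helper: build the snapshot-list query for one site.
def cdx_query_url(site_url, limit: 1)
  query = URI.encode_www_form(url: site_url, output: 'json', limit: limit)
  "#{CDX_ENDPOINT}?#{query}"
end

# Hypothetical helper: returns :ok, :forbidden, or :error.
# `fetcher` is injectable so the logic can be exercised without network access.
def snapshot_list_status(site_url, fetcher: ->(u) { URI.parse(u).open.read })
  fetcher.call(cdx_query_url(site_url))
  :ok
rescue OpenURI::HTTPError => e
  # open-uri puts the HTTP status line in the exception message,
  # e.g. "403 Forbidden", as seen in the traces in this thread.
  e.message.start_with?('403') ? :forbidden : :error
end
```

Calling `snapshot_list_status('http://www.darkpersonalities.com')` for a site that fails in the gem and for one that works should show whether the archive itself is refusing the request for that domain.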

sscvbnm commented 6 years ago

OK, so what can we do now? How do we solve this issue?

Cristov9000 commented 6 years ago

I am getting "Getting snapshot pages/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/open-uri.rb:353:in `open_http': 403 Forbidden (OpenURI::HTTPError)" as well for some sites, but not all. Has anyone found a workaround?