hartator / wayback-machine-downloader

Download an entire website from the Wayback Machine.

503 Service Unavailable (OpenURI::HTTPError) #86

Open Ninoninoninonino opened 7 years ago

Ninoninoninonino commented 7 years ago

I'm batch downloading website snapshots from various years. I often get errors like /usr/share/ruby/open-uri.rb:353:in `open_http': 503 Service Unavailable (OpenURI::HTTPError)

I'm not sure whether it's because of the Internet Archive, Wayback-Machine-Downloader or my high-performance cluster. For example:

    /usr/share/ruby/open-uri.rb:353:in `open_http': 503 Service Unavailable (OpenURI::HTTPError)
    from /usr/share/ruby/open-uri.rb:709:in `buffer_open'
    from /usr/share/ruby/open-uri.rb:210:in `block in open_loop'
    from /usr/share/ruby/open-uri.rb:208:in `catch'
    from /usr/share/ruby/open-uri.rb:208:in `open_loop'
    from /usr/share/ruby/open-uri.rb:149:in `open_uri'
    from /usr/share/ruby/open-uri.rb:689:in `open'
    from /usr/share/ruby/open-uri.rb:34:in `open'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/lib/wayback_machine_downloader.rb:84:in `get_all_snapshots_to_consider'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/lib/wayback_machine_downloader.rb:101:in `get_file_list_curated'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/lib/wayback_machine_downloader.rb:128:in `get_file_list_by_timestamp'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/lib/wayback_machine_downloader.rb:264:in `file_list_by_timestamp'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/lib/wayback_machine_downloader.rb:149:in `download_files'
    from /home/ucqbndj/.gem/ruby/gems/wayback_machine_downloader-1.1.4/bin/wayback_machine_downloader:64:in `<top (required)>'
    from /home/ucqbndj/bin/wayback_machine_downloader:23:in `load'
    from /home/ucqbndj/bin/wayback_machine_downloader:23:in `<main>'

Ninoninoninonino commented 7 years ago

This could potentially be remedied by letting users specify a user agent, as the authors of waybackpack have done:

--user-agent USER_AGENT The User-Agent header to send along with your requests to the Wayback Machine. If possible, please include the phrase 'waybackpack' and your email address. That way, if you're battering their servers, they know who to contact. Default: 'waybackpack'.

https://github.com/jsvine/waybackpack
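For illustration, here is a minimal Ruby sketch of what sending a custom User-Agent through open-uri could look like. The helper names `wayback_request_headers` and `fetch_cdx_listing` are hypothetical and not part of wayback_machine_downloader, and whether an identifying User-Agent actually avoids the 503 is an untested assumption. `URI.open` requires Ruby 2.5 or newer.

```ruby
require "open-uri"

# Hypothetical helper (not part of the gem): builds the header hash
# that open-uri sends along with the request.
def wayback_request_headers(contact_email)
  { "User-Agent" => "wayback-machine-downloader (contact: #{contact_email})" }
end

# Hypothetical helper: requests a snapshot listing from the CDX endpoint
# mentioned in this thread, identifying the caller via User-Agent.
def fetch_cdx_listing(site_url, contact_email)
  request_url = URI("https://web.archive.org/cdx/search/cdx")
  request_url.query = URI.encode_www_form(url: site_url)
  # open-uri treats string keys in the options hash as HTTP request headers.
  URI.open(request_url, wayback_request_headers(contact_email)).read
end
```

Calling `fetch_cdx_listing("http://example.com", "you@example.org")` would perform a real network request, so it can still raise `OpenURI::HTTPError` if the server answers with 503.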

s-emanuilov commented 5 years ago

Now I have the same issue: `open_http': 503 Service Unavailable (OpenURI::HTTPError)

Is there a solution for this?

Command that I ran: wayback_machine_downloader http://abv.bg --to 20100916231334


vuquach-au commented 5 years ago

I ran this smoothly for over a year until last week. Now I get the same issue. I have spent hours researching solutions without success. Can anyone help?

Command that I ran: wayback_machine_downloader http://www.eupa.com.au --to 20090731095531


Stefan2142 commented 5 years ago

I'm also getting a 503 error with the command: wayback_machine_downloader 4software-development.com -l

MrBryan commented 4 years ago

Boo. Just fetched this tool and got same error. Anyone find a workaround?

Stefan2142 commented 4 years ago

> Boo. Just fetched this tool and got same error. Anyone find a workaround?

It's probably caused by the specific URL. Change the URL/website and try again. I know I fixed the issue, but I forgot how.

ArtoriusN commented 2 years ago

I have the same problem with every site. Changing the IP address or using a VPN does not help. At the same time, the sites and http://web.archive.org/web/20210202032904/http://www.guimp.com/ open normally in a browser on the server from which I want to download.

    wayback_machine_downloader http://www.guimp.com/
    Downloading http://www.guimp.com/ to websites/www.guimp.com/ from Wayback Machine archives.

    Getting snapshot pages
    /usr/lib/ruby/2.3.0/open-uri.rb:359:in `open_http': 503 Service Temporarily Unavailable (OpenURI::HTTPError)
    from /usr/lib/ruby/2.3.0/open-uri.rb:737:in `buffer_open'
    from /usr/lib/ruby/2.3.0/open-uri.rb:212:in `block in open_loop'
    from /usr/lib/ruby/2.3.0/open-uri.rb:210:in `catch'
    from /usr/lib/ruby/2.3.0/open-uri.rb:210:in `open_loop'
    from /usr/lib/ruby/2.3.0/open-uri.rb:151:in `open_uri'
    from /usr/lib/ruby/2.3.0/open-uri.rb:717:in `open'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    from /var/lib/gems/2.3.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:22:in `load'
    from /usr/local/bin/wayback_machine_downloader:22:in `<main>'

rachidlamari commented 2 years ago

Has anyone found a solution?

sirastynax commented 1 year ago

For those who have stumbled on this issue, it appears that the URL that needs to be queried has changed. To fix the issue, I changed the value in archive_api.rb from request_url = URI("https://web.archive.org/cdx/search/xd") to request_url = URI("https://web.archive.org/cdx/search/cdx")

Edit: It has not changed, but this is an additional URL that can be used. I can get it to work with both URLs by repeating the query requests.
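The "repeated query requests" workaround could be automated roughly as follows. This is only a sketch under assumptions: `fetch_with_retries`, its parameters, and the backoff delays are hypothetical and not part of wayback_machine_downloader. It alternates between the two endpoints mentioned above and waits longer after each 503.

```ruby
require "open-uri"

# The two endpoints mentioned in this thread.
CDX_ENDPOINTS = [
  "https://web.archive.org/cdx/search/xd",
  "https://web.archive.org/cdx/search/cdx"
].freeze

# Hypothetical helper: alternates between the endpoints and retries on 503.
# `fetcher` and `sleeper` are injectable so the logic can run without network.
def fetch_with_retries(params, max_attempts: 5, base_delay: 2,
                       fetcher: ->(uri) { URI.open(uri).read },
                       sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do |attempt|
    uri = URI(CDX_ENDPOINTS[attempt % CDX_ENDPOINTS.size])
    uri.query = URI.encode_www_form(params)
    begin
      return fetcher.call(uri)
    rescue OpenURI::HTTPError => e
      # e.io.status is [code, message], e.g. ["503", "Service Unavailable"].
      # Anything other than 503 is re-raised unchanged.
      raise unless e.io.status.first == "503"
      # Exponential backoff, skipped after the final attempt.
      sleeper.call(base_delay * (2**attempt)) if attempt < max_attempts - 1
    end
  end
  raise "Still receiving 503 after #{max_attempts} attempts"
end
```

A call such as `fetch_with_retries({ url: "http://www.guimp.com/" })` would hit the network. If the 503 persists for hours, as reported further down, a handful of short retries will not be enough.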

S767 commented 1 year ago

This bug is not caused by wayback-machine-downloader. I changed nothing in the code, but after 12 hours or so everything worked fine. So just relax and take a break.