opsdisk / metagoofil

Search Google and download specific file types

urllib2.HTTPError: HTTP Error 503: Service Unavailable #10

Closed: Qayinz closed this issue 5 years ago

Qayinz commented 6 years ago

I can't pinpoint what is causing this. It usually happens after a big search (-l 500, for example), but after that it persists for even the smallest searches (as seen below).

After that it keeps giving that error no matter what I do, and eventually it works for one more run.

# python metagoofil.py -d apple.com -t pdf,doc -l 50 -n 5 -o test -w -f
[*] Downloaded files will be saved here: test
[*] Searching for 50 .pdf files and waiting 30.0 seconds between searches
Traceback (most recent call last):
  File "metagoofil.py", line 226, in <module>
    mg.go()
  File "metagoofil.py", line 124, in go
    for url in googlesearch.search(query, start=0, stop=self.search_max, num=100, pause=self.delay, extra_params={'filter': '0'}, user_agent=self.user_agent):
  File "/usr/local/lib/python2.7/dist-packages/googlesearch/__init__.py", line 359, in search
    html = get_page(url)
  File "/usr/local/lib/python2.7/dist-packages/googlesearch/__init__.py", line 147, in get_page
    response = urlopen(request)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 467, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Unavailable

Any ideas?

opsdisk commented 6 years ago

Hi @Qayinz - it's likely you're making the requests too fast and Google is rightly detecting them as bot traffic. Try increasing the -e switch value (the delay between searches, 30.0 seconds in your run). It will take longer, but you shouldn't get that error.
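
For anyone scripting around this, here is a rough sketch of the same idea in code: wait longer between attempts whenever the server answers 503 (or 429). This is not part of metagoofil or the googlesearch package. `call_with_backoff` and its parameters are hypothetical names made up for illustration, and the sketch assumes Python 3's `urllib.error.HTTPError` rather than the Python 2.7 `urllib2` seen in the traceback.

```python
import time
import urllib.error


def call_with_backoff(fetch, max_attempts=5, base_delay=30.0, factor=2.0,
                      sleep=time.sleep):
    """Call fetch() and retry on HTTP 429/503 with a growing delay.

    fetch is any zero-argument callable that performs the request
    (hypothetical helper, not a metagoofil function). sleep is
    injectable so the waiting can be replaced in tests.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Only rate-limit style responses are retried; anything else,
            # or the final failed attempt, is re-raised to the caller.
            if err.code not in (429, 503) or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= factor  # wait longer before the next attempt
```

A caller would wrap a single request, for example `call_with_backoff(lambda: urllib.request.urlopen(url))` after `import urllib.request`. Backing off does not guarantee the block is lifted; a longer fixed delay via -e is still the simpler fix.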