haaspt / PollTrack

A utility for pulling 2016 General Election data from Huffington Post Pollster
MIT License

HTTPError: HTTP Error 504: Gateway Timeout #9

Open haaspt opened 8 years ago

haaspt commented 8 years ago

The try/except statement in PollIO isn't catching HTTP errors.

```
2016-10-15 09:25:24,080-pollio :: ERROR :: Traceback (most recent call last):
  File "/home/pi/Developer/PollTrack/pollio.py", line 42, in get_latest_poll_data
    df = pd.read_csv(csv_url)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 301, in _read
    compression=kwds.get('compression', None))
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 308, in get_filepath_or_buffer
    req = _urlopen(str(filepath_or_buffer))
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 504: Gateway Timeout
```
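
For what it's worth, the traceback shows pandas going through urllib2 internally, so the exception that needs to be caught is urllib2.HTTPError (and its parent URLError), not a pandas error. A minimal sketch of what the handler around the read_csv call could look like; everything except `get_latest_poll_data` and `csv_url` (which appear in the traceback) is an assumption, not PollTrack's actual code:

```python
# Sketch only: catch the urllib2 errors that pandas raises when
# read_csv is given a URL. HTTPError is checked before URLError
# because it is a subclass of URLError.
import logging
import urllib2

import pandas as pd

logger = logging.getLogger("pollio")

def get_latest_poll_data(csv_url):
    try:
        df = pd.read_csv(csv_url)
    except urllib2.HTTPError as err:
        # Covers HTTP status errors such as 504 Gateway Timeout
        logger.error("HTTP error %s while fetching %s", err.code, csv_url)
        return None
    except urllib2.URLError as err:
        # Covers DNS failures such as "Name or service not known"
        logger.error("URL error while fetching %s: %s", csv_url, err.reason)
        return None
    return df
```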
lzkelley commented 8 years ago

If you set the code off in its own paragraph with each line indented by four spaces, it will format a lot better.

haaspt commented 8 years ago

Fixed the formatting issue; the log needed to be wrapped in a ``` code fence.

haaspt commented 8 years ago

Another error log:

```
2016-10-21 01:14:02,700-pollio :: ERROR :: Traceback (most recent call last):
  File "/home/pi/Developer/PollTrack/pollio.py", line 42, in get_latest_poll_data
    df = pd.read_csv(csv_url)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 301, in _read
    compression=kwds.get('compression', None))
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 308, in get_filepath_or_buffer
    req = _urlopen(str(filepath_or_buffer))
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -2] Name or service not known>
```
lzkelley commented 8 years ago

This looks less like a problem with the code and more like some external or environmental factor, at least assuming the problem is intermittent. There could be a limit on how many times the same address can be queried in a given interval, a particular DNS lookup could fail, or the connection could drop for some network reason. If it's intermittent, then finding an elegant way to catch the error and try again might be the best way to go.

Edit:
There seems to be a recurring Linux issue with this error caused by a DNS cache not being reset, e.g.

> The behaviour you describe is, on Linux, a peculiarity of glibc. It only reads "/etc/resolv.conf" once, when loading. glibc can be forced to re-read "/etc/resolv.conf" via the res_init() function.
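
A minimal sketch of the catch-and-retry idea, assuming the failures really are intermittent; the attempt count and delay are arbitrary values, not anything PollTrack currently uses:

```python
# Sketch of a retry wrapper around the fetch: catch the urllib2
# errors pandas raises, wait, and try again a few times before
# giving up and re-raising the last error.
import time
import urllib2

import pandas as pd

def fetch_with_retries(csv_url, attempts=3, delay_seconds=30):
    for attempt in range(1, attempts + 1):
        try:
            return pd.read_csv(csv_url)
        except (urllib2.HTTPError, urllib2.URLError):
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay_seconds)
```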

haaspt commented 8 years ago

Definitely. The problem is that the error causes the program to crash, despite my attempts to catch the exceptions.

I'm going to try fetching the data directly using urllib or requests, rather than relying on pandas as a wrapper.
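
A rough sketch of what the requests-based approach could look like; the timeout value and function shape are assumptions, but requests.exceptions.RequestException and raise_for_status() are the standard requests API for surfacing network and HTTP errors (Python 2 style, to match the tracebacks above):

```python
# Sketch: fetch the CSV with requests, then parse with pandas,
# so network errors surface as requests exceptions we can catch.
import logging
from StringIO import StringIO

import pandas as pd
import requests

logger = logging.getLogger("pollio")

def get_latest_poll_data(csv_url):
    try:
        response = requests.get(csv_url, timeout=30)
        response.raise_for_status()  # turn 4xx/5xx statuses into exceptions
    except requests.exceptions.RequestException as err:
        # Covers HTTP status errors, timeouts, DNS and connection failures
        logger.error("Failed to fetch %s: %s", csv_url, err)
        return None
    return pd.read_csv(StringIO(response.text))
```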