usdigitalresponse / covid-exit-strategy

Producing data for the Covid Exit Strategy website.
https://www.covidexitstrategy.org/
GNU General Public License v3.0

Fix timeout on final data upload #46

Open lucasmbrown opened 4 years ago

lucasmbrown commented 4 years ago

The final upload times out. We are probably only hitting this now because the payload keeps growing: we have added fields over time, and every run re-uploads the full history, which gains another day of data each day.

We could do any of the following:

  1. Reduce the size of the upload by limiting fields, date ranges, etc.

  2. Fork our gsheets uploader client so that it passes a longer timeout through to its underlying requests calls.

  3. Switch uploader clients (I know @wesmwoo has already been looking at this, since the current client, gspread, uploads all data as text rather than numbers).

Or something else?
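
For option 2, a fork may not be strictly necessary. Here is a minimal sketch, assuming the session object the uploader uses accepts a requests-style `timeout` keyword. `with_default_timeout` is a hypothetical helper, not part of gspread or requests, and the `client.session` attribute in the usage comment is an assumption that would need checking against the installed gspread version.

```python
import functools


def with_default_timeout(request_fn, timeout):
    """Wrap a requests-style callable so it gets a default timeout.

    The wrapper only fills in `timeout` when the caller did not pass one,
    so explicit per-call timeouts still win.
    """

    @functools.wraps(request_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("timeout", timeout)
        return request_fn(*args, **kwargs)

    return wrapper


# Hypothetical usage (assumes the gspread client exposes its session as
# `client.session` and that the session's `request` accepts `timeout`):
#
#     client.session.request = with_default_timeout(
#         client.session.request, timeout=600
#     )
```

This only raises the ceiling. If the payload keeps growing, the upload will eventually hit the longer limit too, so it probably pairs best with one of the other options.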

The full stack trace is:

Beginning to upload data to workbook 1s534JoVjsetLDUxzkww3yQSnRj9H-8QLMKPUrq7RAuc and tab All State Data...
Traceback (most recent call last):
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
socket.timeout: The write operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/Users/lucas/.pyenv/versions/3.6.9/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', timeout('The write operation timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lucas/repos/covid-exit-strategy/main.py", line 220, in <module>
    extract_transform_and_load_covid_data(post_to_google_sheets=True)
  File "/Users/lucas/repos/covid-exit-strategy/main.py", line 214, in extract_transform_and_load_covid_data
    credentials=credentials,
  File "/Users/lucas/repos/covid-exit-strategy/covid/load.py", line 20, in post_dataframe_to_google_sheets
    col_names=True,
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/df2gspread/df2gspread.py", line 147, in upload
    wks.update_cells(cell_list)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/gspread/models.py", line 909, in update_cells
    body={'values': values_rect},
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/gspread/models.py", line 235, in values_update
    r = self.client.request('put', url, params=params, json=body)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/gspread/client.py", line 67, in request
    headers=headers,
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/requests/sessions.py", line 590, in put
    return self.request('PUT', url, data=data, **kwargs)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/google/auth/transport/requests.py", line 450, in request
    **kwargs
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Users/lucas/.virtualenvs/covid1/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out',))
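
The trace shows the whole sheet going out in a single `update_cells` call, which becomes one PUT request. One possible "something else" is to split the rows into batches and send each batch as its own range update. The sketch below only computes the batches and their A1 ranges. `column_letter` and `iter_row_batches` are hypothetical helpers, and the upload call itself is left out because it depends on which client we end up using.

```python
def column_letter(index):
    """Convert a 1-based column index to A1 letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def iter_row_batches(rows, batch_size, first_row=1):
    """Yield (a1_range, batch) pairs covering `rows` in order.

    `rows` is a list of row lists (header row included, if any).
    `first_row` is the 1-based sheet row where the first entry lands.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        top = first_row + start
        bottom = top + len(batch) - 1
        width = max(len(row) for row in batch)
        yield "A{}:{}{}".format(top, column_letter(width), bottom), batch
```

Each `(a1_range, batch)` pair could then be passed to whatever range-update call the chosen client exposes. Smaller requests should be less likely to hit the write timeout, at the cost of more API calls per run.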

lucasmbrown commented 4 years ago

I don't think the website currently uses the All Data sheet for anything, although it has occasionally been a handy reference. So we could also add a fourth option:

  4. Stop uploading the CDC Guidance (All Data) sheet