xmunoz / sodapy

Python client for the Socrata Open Data API
MIT License

Issue with upsert/replace #67

Closed tarmangue closed 4 years ago

tarmangue commented 4 years ago

I keep getting the same error when using upsert or replace:

Traceback (most recent call last):
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\http\client.py", line 1321, in getresponse
    response.begin()
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\http\client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\http\client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 307, in recv_into
    raise timeout('The read operation timed out')
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='mydata.iadb.org', port=443): Read timed out. (read timeout=10)

I have tried both CSV and JSON as the data format; neither works. Any idea what is going on?

xmunoz commented 4 years ago

This error message indicates that it's probably a network issue. Looks like the request times out after 10 seconds. Have you tried to perform this action manually or with curl to see if it works?

xmunoz commented 4 years ago

Also, if you could provide the code that caused this exception that would be helpful for debugging.

tarmangue commented 4 years ago

I have tried with longer timeouts too, same result. Is there a limit to the upload size? Anyway, below is the code:

from sodapy import Socrata

domain = "example.domain.org"
dataset = "abcd-1234"
client = Socrata(domain, "aBcDeF123455969", username="example@email.com", password="password")
# Pass the open file handle for the full dataset straight to replace()
data = open("data.json", encoding='utf-8')
print(client.replace(dataset, data))
client.close()

tarmangue commented 4 years ago

Update: if I create a fictitious row for the dataset that I am trying to update and upsert it like so:

client = Socrata(domain, token, username=user, password=pwd)
data = [{'col1': 'AAA', 'col2': 'BBB'}]
print(client.upsert(dataset, data))
client.close()

I get the expected behaviour, which makes me think the problem might be that I am trying to push a 600k-row dataset?

xmunoz commented 4 years ago

Yes, that is almost certainly the cause. Try splitting the upsert into several smaller, more manageable operations.
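
A minimal sketch of what splitting the upload might look like, assuming the rows are already loaded as a list of dicts (as in the single-row upsert earlier in the thread). The helper names `chunked` and `upsert_in_batches` and the default batch size of 10,000 are illustrative assumptions, not part of sodapy; the only client call used is `client.upsert(dataset, rows)`, which appears above. A good batch size probably depends on row width and network conditions.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(client, dataset, rows, batch_size=10_000):
    """Call client.upsert once per batch and collect each response.

    `client` is assumed to be a sodapy Socrata instance (or anything
    with a compatible upsert(dataset, rows) method). The batch size is
    a guess to tune, not a documented limit.
    """
    results = []
    for batch in chunked(rows, batch_size):
        results.append(client.upsert(dataset, batch))
    return results
```

With the client from the snippets above, this would be called as `upsert_in_batches(client, dataset, rows)`, where `rows` might come from `json.load` on the data file. Note that a series of upserts is not equivalent to `replace`, which swaps out the whole dataset, so rows that should disappear would need to be handled separately.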