knaaptime closed this pull request 5 years ago.
Merging, as tests are passing locally and the failure on Travis is likely due to a pending quilt3 package.
The notebooks are very good! I agree that store_ltdb is a name that makes more sense for fetching the data. Question: is the community dataset explained in notebook 2 already in the desired format you want to pass to a harmonization function?
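For context, a minimal usage sketch of what a store_ltdb call might look like; the keyword names and paths below are assumptions for illustration, not the confirmed geosnap API:

```python
# Hypothetical sketch: store_ltdb is assumed to take paths to the LTDB sample
# and full-count zip archives and cache them in geosnap's local data store.
from geosnap.data import store_ltdb

store_ltdb(
    sample="downloads/LTDB_Std_All_Sample.zip",        # hypothetical local path
    fullcount="downloads/LTDB_Std_All_fullcount.zip",  # hypothetical local path
)
```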
I ran into this error when trying to run 01_getting_started.ipynb
Was it in the last hour or so? I think the Census website is down at the moment. It kicked me off the FTP, and I can't get http://census.gov/ or other .gov sites to load.
Yep, just now. https://www.census.gov/ is working, though.
Question: is the community dataset explained in notebook 2 already in the desired format you want to pass to a harmonization function?
The one from cell [2] is. It's raw census data, so we'll add a wrapper method around tobler's interpolate function that splits the GeoDataFrame by year, so you'd have a target year and source years (see the sketch below).
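A rough sketch of that wrapper idea, not geosnap's actual implementation: split a long-form GeoDataFrame by year, treat one year as the target geometry, and interpolate each source year onto it with tobler. The function and column names are assumptions for illustration.

```python
# Illustrative sketch only: harmonize each source year onto the target year's
# tract geometry using tobler's areal interpolation.
import pandas as pd
from tobler.area_weighted import area_interpolate

def harmonize_to_year(gdf, target_year, extensive_vars, intensive_vars):
    years = sorted(gdf["year"].unique())
    target = gdf[gdf["year"] == target_year]
    out = [target]  # simplified: target rows keep their original columns
    for year in years:
        if year == target_year:
            continue
        source = gdf[gdf["year"] == year]
        interpolated = area_interpolate(
            source_df=source,
            target_df=target,
            extensive_variables=extensive_vars,
            intensive_variables=intensive_vars,
        )
        interpolated["year"] = year
        out.append(interpolated)
    return pd.concat(out, sort=False)
```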
But the notebook is still failing?
Just tried it again. Ran into the same error.
What line in geosnap is failing? Can you paste the full trace?
I would've guessed the inflation adjustment (which should be in a try/except anyway, as sketched below), but that hits bls.gov, not api.census.gov.
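The kind of guard meant above, as a hedged sketch; adjust_inflation and its arguments are hypothetical stand-ins for whatever helper actually calls bls.gov:

```python
# Illustrative sketch only: wrap the (hypothetical) inflation-adjustment call in
# a try/except so a bls.gov outage degrades gracefully instead of failing the
# whole data build.
import warnings

try:
    df = adjust_inflation(df, columns=["median_household_income"], base_year=2015)
except Exception as e:  # e.g. a connection error if bls.gov is unreachable
    warnings.warn(f"Could not reach bls.gov for CPI data; keeping nominal values ({e})")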
Tried again earlier today. from geosnap.data import data_store seemed to be working now. Then I ran data_store.tracts_2000().head() and got the following error (seems to be related to quilt3):
---------------------------------------------------------------------------
timeout Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/urllib3/response.py in _error_catcher(self)
359 try:
--> 360 yield
361
/anaconda3/lib/python3.7/site-packages/urllib3/response.py in read(self, amt, decode_content, cache_content)
437 # cStringIO doesn't like amt=None
--> 438 data = self._fp.read()
439 flush_decoder = True
/anaconda3/lib/python3.7/http/client.py in read(self, amt)
459 try:
--> 460 s = self._safe_read(self.length)
461 except IncompleteRead:
/anaconda3/lib/python3.7/http/client.py in _safe_read(self, amt)
609 while amt > 0:
--> 610 chunk = self.fp.read(min(amt, MAXAMOUNT))
611 if not chunk:
/anaconda3/lib/python3.7/socket.py in readinto(self, b)
588 try:
--> 589 return self._sock.recv_into(b)
590 except timeout:
/anaconda3/lib/python3.7/ssl.py in recv_into(self, buffer, nbytes, flags)
1051 self.__class__)
-> 1052 return self.read(nbytes, buffer)
1053 else:
/anaconda3/lib/python3.7/ssl.py in read(self, len, buffer)
910 if buffer is not None:
--> 911 return self._sslobj.read(len, buffer)
912 else:
timeout: The read operation timed out
During handling of the above exception, another exception occurred:
ReadTimeoutError Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/botocore/response.py in read(self, amt)
77 try:
---> 78 chunk = self._raw_stream.read(amt)
79 except URLLib3ReadTimeoutError as e:
/anaconda3/lib/python3.7/site-packages/urllib3/response.py in read(self, amt, decode_content, cache_content)
458 # Content-Length are caught.
--> 459 raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
460
/anaconda3/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
129 try:
--> 130 self.gen.throw(type, value, traceback)
131 except StopIteration as exc:
/anaconda3/lib/python3.7/site-packages/urllib3/response.py in _error_catcher(self)
364 # there is yet no clean way to get at it from this context.
--> 365 raise ReadTimeoutError(self._pool, None, 'Read timed out.')
366
ReadTimeoutError: AWSHTTPSConnectionPool(host='quilt-cgs.s3.amazonaws.com', port=443): Read timed out.
During handling of the above exception, another exception occurred:
ReadTimeoutError Traceback (most recent call last)
<ipython-input-4-4e627768ddce> in <module>
----> 1 data_store.tracts_2000().head()
~/Google Drive/python_repos/geosnap/geosnap/data/data.py in tracts_2000(self, convert)
194
195 """
--> 196 t = tracts_cartographic["tracts_2000_500k.parquet"]()
197 t["year"] = 2000
198 if convert:
/anaconda3/lib/python3.7/site-packages/quilt3/packages.py in __call__(self, func, **kwargs)
226 Shorthand for self.deserialize()
227 """
--> 228 return self.deserialize(func=func, **kwargs)
229
230
/anaconda3/lib/python3.7/site-packages/quilt3/packages.py in deserialize(self, func, **format_opts)
179 """
180 physical_key = _to_singleton(self.physical_keys)
--> 181 data, _ = get_bytes(physical_key)
182
183 if func is not None:
/anaconda3/lib/python3.7/site-packages/quilt3/data_transfer.py in get_bytes(src)
656 s3_client = create_s3_client()
657 resp = s3_client.get_object(**params)
--> 658 data = resp['Body'].read()
659 meta = _parse_metadata(resp)
660 else:
/anaconda3/lib/python3.7/site-packages/botocore/response.py in read(self, amt)
79 except URLLib3ReadTimeoutError as e:
80 # TODO: the url will be None as urllib3 isn't setting it yet
---> 81 raise ReadTimeoutError(endpoint_url=e.url, error=e)
82 self._amount_read += len(chunk)
83 if amt is None or (not chunk and amt > 0):
ReadTimeoutError: Read timeout on endpoint URL: "None"
Weird. So it's no longer having trouble connecting to the Census (and we still don't know where geosnap was raising that error), and now we have an error trying to connect to our quilt S3 bucket. What happens if you try geosnap.data.store_census()?
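In other words, try the explicit local download instead of streaming; a small usage sketch (whether store_census takes any arguments is an assumption, the bare call is what's suggested here):

```python
# Sketch: explicitly download the cartographic census data to the local cache,
# then re-run the call that was timing out against the quilt S3 bucket.
from geosnap.data import store_census, data_store

store_census()                      # one-time local download
tracts = data_store.tracts_2000()   # should now read from the local cache
print(tracts.head())
```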
Could this have something to do with internet speed? The errors occurred when I ran the notebook at home, where the Wi-Fi is slow. I reran the notebook at the center, and it works fine.
This PR updates the example notebooks to demonstrate the new API. It also handles census data a bit more gracefully (and more pythonically) by streaming from the quilt bucket unless the user explicitly chooses to download it.
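A hedged sketch of the two access modes this describes; apart from tracts_2000 and store_census, which appear in this thread, any details below are assumptions rather than a definitive spec:

```python
from geosnap.data import data_store, store_census

# Default: stream the parquet file from the quilt S3 bucket on demand.
tracts = data_store.tracts_2000()

# Optional: explicitly download the census data once for local/offline use;
# subsequent calls should read the local copy instead of streaming.
store_census()
tracts = data_store.tracts_2000()
```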