oturns / geosnap

The Geospatial Neighborhood Analysis Package
https://oturns.github.io/geosnap-guide
BSD 3-Clause "New" or "Revised" License
247 stars, 32 forks

update notebooks #108

Closed knaaptime closed 5 years ago

knaaptime commented 5 years ago

This PR updates the example notebooks to demonstrate the new API. It also handles census data a bit more gracefully (and more pythonically) by streaming from the quilt bucket unless the user explicitly chooses to download it.

sjsrey commented 5 years ago

Merging as tests are passing locally, and the failure on travis is likely due to a pending quilt3 package.

renanxcortes commented 5 years ago

The notebooks are very good! I think store_ltdb is a name that makes more sense for fetching the data, indeed. Question: is the community dataset explained in notebook 2 already in the format you want to pass to a harmonization function?

weikang9009 commented 5 years ago

I ran into this error when trying to run 01_getting_started.ipynb: [image]

knaaptime commented 5 years ago

was it in the last hour or so? I think the census website is down at the moment. it kicked me off the FTP and I can't get http://census.gov/ or other .gov sites to load

weikang9009 commented 5 years ago

yep. just now. https://www.census.gov/ is working though

knaaptime commented 5 years ago

Question: is the community dataset explained in notebook 2 already in the format you want to pass to a harmonization function?

the one from cell [2] is. it's raw census data, so we'll add a wrapper method around tobler's interpolate function that splits the gdf by year so you'd have a target year and source years
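A rough sketch of the splitting step described here, not the wrapper itself. It uses plain dict records as a stand-in for GeoDataFrame rows so it runs without geopandas; the function name `split_by_year` and the `year_key` parameter are hypothetical, and the call into tobler is left out entirely. With a real GeoDataFrame the same grouping would normally go through `groupby` on the year column.

```python
from collections import defaultdict


def split_by_year(rows, target_year, year_key="year"):
    """Split long-form records into one target year and the remaining source years.

    Hypothetical helper: `rows` is any iterable of dict-like records that
    carry a year field. Returns (target_rows, {source_year: source_rows}).
    """
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_key]].append(row)

    if target_year not in by_year:
        raise KeyError(f"no records found for target year {target_year}")

    # Pull the target year out; everything left is a source year.
    target = by_year.pop(target_year)
    return target, dict(by_year)
```

Each source-year group would then be interpolated onto the target-year geometries, one year at a time.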

knaaptime commented 5 years ago

but the notebook is still failing?

weikang9009 commented 5 years ago

Just tried it again. Ran into the same error.

knaaptime commented 5 years ago

what line in geosnap is failing? can you paste the full trace?

i would've guessed the inflation adjustment (which should be in a try/except anyway) but that hits bls.gov, not api.census.gov
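For illustration, one way the try/except guard could look. This is a sketch under assumptions, not geosnap's code: `adjust_for_inflation` and the `fetch_cpi` callable (standing in for whatever requests the price index from bls.gov) are hypothetical names. The idea is that a network failure downgrades to a warning and returns unadjusted values instead of aborting the data load.

```python
import warnings


def adjust_for_inflation(values, fetch_cpi, base_year, year):
    """Rescale `values` from `year` dollars to `base_year` dollars.

    `fetch_cpi` is a hypothetical callable returning {year: index_value}.
    If the lookup fails, warn and return the values unadjusted.
    """
    try:
        cpi = fetch_cpi()
        factor = cpi[base_year] / cpi[year]
    except (OSError, KeyError) as err:
        # OSError covers the stdlib network failures (timeouts, refused
        # connections, urllib's URLError); KeyError covers a missing year.
        warnings.warn(f"inflation adjustment skipped: {err!r}")
        return list(values)
    return [value * factor for value in values]
```

Catching only the expected exception types keeps genuine bugs visible instead of silently swallowing them.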

weikang9009 commented 5 years ago

Tried again earlier today. from geosnap.data import data_store seemed to be working now.

Now trying to run data_store.tracts_2000().head().

weikang9009 commented 5 years ago

data_store.tracts_2000().head() ran into the following error (seems to be related to quilt3):

---------------------------------------------------------------------------
timeout                                   Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/urllib3/response.py in _error_catcher(self)
    359             try:
--> 360                 yield
    361 

/anaconda3/lib/python3.7/site-packages/urllib3/response.py in read(self, amt, decode_content, cache_content)
    437                 # cStringIO doesn't like amt=None
--> 438                 data = self._fp.read()
    439                 flush_decoder = True

/anaconda3/lib/python3.7/http/client.py in read(self, amt)
    459                 try:
--> 460                     s = self._safe_read(self.length)
    461                 except IncompleteRead:

/anaconda3/lib/python3.7/http/client.py in _safe_read(self, amt)
    609         while amt > 0:
--> 610             chunk = self.fp.read(min(amt, MAXAMOUNT))
    611             if not chunk:

/anaconda3/lib/python3.7/socket.py in readinto(self, b)
    588             try:
--> 589                 return self._sock.recv_into(b)
    590             except timeout:

/anaconda3/lib/python3.7/ssl.py in recv_into(self, buffer, nbytes, flags)
   1051                   self.__class__)
-> 1052             return self.read(nbytes, buffer)
   1053         else:

/anaconda3/lib/python3.7/ssl.py in read(self, len, buffer)
    910             if buffer is not None:
--> 911                 return self._sslobj.read(len, buffer)
    912             else:

timeout: The read operation timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/botocore/response.py in read(self, amt)
     77         try:
---> 78             chunk = self._raw_stream.read(amt)
     79         except URLLib3ReadTimeoutError as e:

/anaconda3/lib/python3.7/site-packages/urllib3/response.py in read(self, amt, decode_content, cache_content)
    458                         # Content-Length are caught.
--> 459                         raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
    460 

/anaconda3/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
    129             try:
--> 130                 self.gen.throw(type, value, traceback)
    131             except StopIteration as exc:

/anaconda3/lib/python3.7/site-packages/urllib3/response.py in _error_catcher(self)
    364                 # there is yet no clean way to get at it from this context.
--> 365                 raise ReadTimeoutError(self._pool, None, 'Read timed out.')
    366 

ReadTimeoutError: AWSHTTPSConnectionPool(host='quilt-cgs.s3.amazonaws.com', port=443): Read timed out.

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
<ipython-input-4-4e627768ddce> in <module>
----> 1 data_store.tracts_2000().head()

~/Google Drive/python_repos/geosnap/geosnap/data/data.py in tracts_2000(self, convert)
    194 
    195         """
--> 196         t = tracts_cartographic["tracts_2000_500k.parquet"]()
    197         t["year"] = 2000
    198         if convert:

/anaconda3/lib/python3.7/site-packages/quilt3/packages.py in __call__(self, func, **kwargs)
    226         Shorthand for self.deserialize()
    227         """
--> 228         return self.deserialize(func=func, **kwargs)
    229 
    230 

/anaconda3/lib/python3.7/site-packages/quilt3/packages.py in deserialize(self, func, **format_opts)
    179         """
    180         physical_key = _to_singleton(self.physical_keys)
--> 181         data, _ = get_bytes(physical_key)
    182 
    183         if func is not None:

/anaconda3/lib/python3.7/site-packages/quilt3/data_transfer.py in get_bytes(src)
    656         s3_client = create_s3_client()
    657         resp = s3_client.get_object(**params)
--> 658         data = resp['Body'].read()
    659         meta = _parse_metadata(resp)
    660     else:

/anaconda3/lib/python3.7/site-packages/botocore/response.py in read(self, amt)
     79         except URLLib3ReadTimeoutError as e:
     80             # TODO: the url will be None as urllib3 isn't setting it yet
---> 81             raise ReadTimeoutError(endpoint_url=e.url, error=e)
     82         self._amount_read += len(chunk)
     83         if amt is None or (not chunk and amt > 0):

ReadTimeoutError: Read timeout on endpoint URL: "None"

knaaptime commented 5 years ago

weird. so it's no longer having trouble connecting to census (and we still don't know where geosnap was raising that error), and now we have an error trying to connect to our quilt s3 bucket.

what happens if you try geosnap.data.store_census()?

weikang9009 commented 5 years ago

Could this have something to do with the internet speed? The errors occurred when I ran the notebook at home, where the Wi-Fi is slow.

I reran the notebook at the center, and it works fine.
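If slow connections are the cause, retrying the read with a growing delay is one possible workaround. The helper below is a generic sketch, not part of geosnap or quilt3: the name `retry` and all of its parameters are hypothetical, and which exception classes to catch is an assumption the caller has to supply (for the trace above that would be botocore's ReadTimeoutError).

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Call `func()` up to `attempts` times, doubling the delay after each failure.

    Hypothetical helper: `exceptions` lists the error types treated as
    transient; anything else propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A notebook cell could then wrap the failing call, for example `retry(lambda: data_store.tracts_2000(), exceptions=(ReadTimeoutError,))`, with ReadTimeoutError imported from botocore.exceptions. Retrying only helps if the timeout is intermittent; downloading the data locally avoids the streaming read altogether.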