winstonyym / urbanity

A network-based python package to understand and model urban complexity
https://urbanity.readthedocs.io/en/latest/
MIT License

Need pyarrow or fastparquet for get_aggregate_stats with multiple subzones input. #10

Closed by Ultios 7 months ago

Ultios commented 8 months ago

OS: MacOS Ventura

In the basic-functionality notebook, the m4 example with multiple Seattle subzones needs pyarrow or fastparquet installed as a dependency before the get_aggregate_stats function can be used.

attr_stats_sea = m4.get_aggregate_stats("Seattle", column='CRA_NAM')

output:

Cell In[19], line 1
----> 1 attr_stats_sea = m4.get_aggregate_stats("Seattle", column='CRA_NAM')

File ~/anaconda3/envs/urbanity/lib/python3.9/site-packages/urbanity/urbanity.py:1222, in Map.get_aggregate_stats(self, location, filepath, column, bandwidth, get_svi, network_type)
   1219         pop_list[i] = pop_list[i].to_crs(local_crs)
   1221 if self.country in tiled_country:    
-> 1222     pop_list, target_cols = get_tiled_population_data(self.country, bounding_poly=self.polygon_bounds)
   1223     for i in range(len(pop_list)):
   1224         pop_list[i] = pop_list[i].to_crs(local_crs)

File ~/anaconda3/envs/urbanity/lib/python3.9/site-packages/urbanity/population.py:90, in get_tiled_population_data(country, bounding_poly)
     88 for tile in target_tiles:
     89     data_link = general_pop_dict[country][f'tile_all_{tile}.parquet']
---> 90     tile_df = pd.read_parquet(data_link)
     91     tile_df = tile_df[(tile_df['latitude'] >= miny) & (tile_df['latitude'] <= maxy) & (tile_df['longitude'] >= minx) & (tile_df['longitude'] <= maxx)]
     92     point_df = pd.concat([point_df, tile_df])

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
Trying to import the above resulted in these errors:
 - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
 - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

I solved this by installing fastparquet, but I think it would be better to declare it as a dependency (e.g. in environment.yml), since this is basic functionality.
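For anyone hitting the same error, here is a minimal sketch of a pre-flight check that can be run before calling get_aggregate_stats. It only tests whether either parquet engine named in the ImportError above is importable in the current environment. The helper names `available_parquet_engines` and `require_parquet_engine` are hypothetical and are not part of urbanity or pandas.

```python
from importlib.util import find_spec

# The two engine packages named in the pandas ImportError above.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def available_parquet_engines(candidates=PARQUET_ENGINES):
    """Return the candidate packages that can be imported in this environment."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in candidates if find_spec(name) is not None]


def require_parquet_engine(candidates=PARQUET_ENGINES):
    """Return the first importable engine name, or raise ImportError with a hint."""
    found = available_parquet_engines(candidates)
    if not found:
        raise ImportError(
            "No parquet engine found; install one of: " + ", ".join(candidates)
        )
    return found[0]


if __name__ == "__main__":
    # Prints the installed engines, e.g. an empty list in a fresh environment.
    print("Installed parquet engines:", available_parquet_engines())
```

This only confirms that a package is present. pandas additionally requires a suitable version of the engine, so a passing check does not guarantee that pd.read_parquet will succeed.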

Ultios commented 7 months ago

I think this is solved by the recent update to environment.yml by @winstonyym.