pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License

Results returning 0s for extensive variable #193

Closed swhisnant5201 closed 7 months ago

swhisnant5201 commented 7 months ago

I'm using the standard areal interpolation method from the overview notebook, but encountering a recurring issue. The code runs without any errors, but when I go to plot the results, it's all the same color on the plot. I looked at the variable I'm interpolating in the results gdf, but it's listed as 0 for every row.

Here is my code, where the source_file is this data and the target_file is this. The extensive variable that I'm looking for is the "PERCENT_AS" column from the source. All CRS were set to 32128.

result = tobler.area_weighted.area_interpolate(source_df=source_file, target_df=target_file, extensive_variables = [variable])


knaaptime commented 7 months ago

Can you share the rest of your code? This is all I need to generate a working example with those data.

EDIT: sorry, I didn't see that you had provided the variable you were looking for. Here's a working example following what you sent. Can you share your code?

import geopandas as gpd
from tobler.area_weighted import area_interpolate

tracts = gpd.read_file(
    "https://opendata.arcgis.com/api/v3/datasets/20332a074f0446b3b3190ba9d68b863e_0/downloads/data?format=shp&spatialRefId=4326"
)

hoods = gpd.read_file(
    "https://github.com/azavea/geo-data/raw/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.zip"
)

result = area_interpolate(
    source_df=tracts.to_crs(hoods.crs),
    target_df=hoods,
    intensive_variables=["PERCENT_AS"],
)
[Screenshot 2023-11-27 at 11 40 55 AM]

knaaptime commented 7 months ago

(Also, I presume you mean to treat PERCENT_AS as an intensive variable? If you want to treat a percentage as extensive for some reason, that code works too.)
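
For readers unsure about the distinction, here is a minimal sketch of the arithmetic behind area-weighted interpolation for one target zone. It is a simplified illustration with made-up numbers, not tobler's actual implementation; the zone names, areas, and values are all hypothetical, and it assumes the target is fully covered by the source zones.

```python
# Hypothetical toy data: two source zones of 10 area units each, and one
# target zone that covers half of zone A and all of zone B.
source_area = {"A": 10.0, "B": 10.0}
overlap_area = {"A": 5.0, "B": 10.0}   # area each source shares with the target
population = {"A": 100.0, "B": 200.0}  # a count, i.e. an extensive variable
percent = {"A": 10.0, "B": 40.0}       # a rate, i.e. an intensive variable

# Extensive: each source hands over the share of its value that matches the
# share of its own area falling inside the target.
extensive = sum(
    population[z] * overlap_area[z] / source_area[z] for z in source_area
)

# Intensive: an average of the source values, weighted by how much of the
# target's area each source contributes (assumes full coverage of the target).
target_area = sum(overlap_area.values())
intensive = sum(
    percent[z] * overlap_area[z] / target_area for z in source_area
)

print(extensive, intensive)
```

Note that in the extensive case, a target with no overlapping source area receives a sum of zero-weighted terms, which is consistent with the all-zero result reported in this issue.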

swhisnant5201 commented 7 months ago

Here's my code

import os
import geopandas as gpd
import matplotlib.pyplot as plt
import tobler
from zipfile import ZipFile

os.chdir("C:\gispy\Tobler")

def unzip_data(zip_file):  # data retrieval
    my_zip = ZipFile(zip_file, "r")
    my_zip.extractall()
    del(my_zip)

unzip_data("Tobler_Data.zip")

cen_blocks = gpd.read_file("Census_Blocks_2010")
nhood = gpd.read_file("Neighborhoods_Philadelphia")
pop_stat = gpd.read_file("Vital_Population_CT")

crs = 32128

cen_blocks.crs = crs
nhood.crs = crs
pop_stat.crs = crs

pop_stat = pop_stat[pop_stat["COUNTALL"] != 0]
pop_stat.geometry = pop_stat.buffer(0)
nhood.geometry = nhood.buffer(0)

def areal_inter(source_file, target_file, variable):
    result = tobler.area_weighted.area_interpolate(source_df=source_file, target_df=target_file, extensive_variables = [variable])
    fig, ax = plt.subplots(1,2, figsize=(14,7))

    result.plot(variable, scheme='quantiles', ax=ax[0])
    pop_stat.plot(variable, scheme='quantiles', ax=ax[1])

    ax[0].set_title('interpolated')
    ax[1].set_title('original')

    for ax in ax:
        ax.axis('off')
    fig.suptitle('Asian Percentage (Extensive)')

areal_inter(pop_stat, nhood, "PERCENT_AS")

knaaptime commented 7 months ago

the issue is here :

crs = 32128

cen_blocks.crs = crs
nhood.crs = crs
pop_stat.crs = crs

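You're defining the CRS as 32128 in all cases, not reprojecting the data into 32128. Assigning the crs attribute only changes the label and leaves the coordinates untouched. For that reason, none of the underlying geometries actually overlap (because their XYs are in wildly different places), and that's why you end up with 0s. To reproject, use to_crs, as in the working example above.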
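
To make the difference concrete, here is a small sketch using a made-up bounding box (the coordinates are illustrative lon/lat values, not taken from the datasets in this issue). It compares overriding the CRS label with `set_crs(..., allow_override=True)` against reprojecting with `to_crs`, and then checks whether each version overlaps a layer that really is in EPSG:32128.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical source layer: one box in lon/lat degrees (EPSG:4326).
source = gpd.GeoDataFrame(
    {"PERCENT_AS": [12.5]},
    geometry=[box(-75.3, 39.9, -75.0, 40.1)],
    crs="EPSG:4326",
)

# Hypothetical target layer whose coordinates genuinely are in EPSG:32128.
target = gpd.GeoDataFrame(
    geometry=[box(-75.2, 39.95, -75.1, 40.05)], crs="EPSG:4326"
).to_crs(epsg=32128)

# Relabeling: the CRS metadata changes, the coordinates do not.
relabeled = source.set_crs(epsg=32128, allow_override=True)

# Reprojecting: the coordinates are transformed into the new CRS.
reprojected = source.to_crs(epsg=32128)

print("original bounds:   ", source.total_bounds)
print("relabeled bounds:  ", relabeled.total_bounds)
print("reprojected bounds:", reprojected.total_bounds)

# Only the reprojected layer lands on top of the target.
mislabeled_overlaps = relabeled.geometry.iloc[0].intersects(target.geometry.iloc[0])
reprojected_overlaps = reprojected.geometry.iloc[0].intersects(target.geometry.iloc[0])
print("relabeled overlaps target:  ", mislabeled_overlaps)
print("reprojected overlaps target:", reprojected_overlaps)
```

A quick check like comparing `total_bounds` of the source and target layers before interpolating is one way to catch this kind of mismatch early.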

swhisnant5201 commented 7 months ago

That makes sense, I wasn't even thinking about that. Thanks!

martinfleis commented 7 months ago

I'll try to make sure this mistake is not possible in future via some changes in geopandas - https://github.com/geopandas/geopandas/issues/3085