pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License
144 stars 30 forks source link

masked_area_interpolate KeyError #195

Closed swhisnant5201 closed 7 months ago

swhisnant5201 commented 7 months ago

Hi All,

I am working on the same code as my last issue, which you were able to help quickly solve. This time, I am trying to use the masked_area_interpolate method with the same data and the raster you provide in the notebook. When only running p = Package.browse("rasters/nlcd", "s3://spatial-ucr") p["nlcd_2011.tif"].fetch() it works without issue, however, when I attempt to call the method with "nlcd_2011.tif" as my raster, I get the following error:

Traceback (most recent call last):

  File C:\miniconda3\envs\tutorials\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\gispy\tobler\tobler.py:79
    dasy_inter(pop_stat, "nlcd_2011.tif" ,nhood,"PERCENT_AS", "Asian Percentage")

  File c:\gispy\tobler\tobler.py:43 in dasy_inter
    results = tobler.dasymetric.masked_area_interpolate(raster=target_rast, source_df=source_file, target_df=target_file, intensive_variables=variable, pixel_values = [21, 22, 23, 24])

  File C:\miniconda3\envs\tutorials\Lib\site-packages\tobler\dasymetric\masked_area_interpolate.py:82 in masked_area_interpolate
    interpolation = area_interpolate(

  File C:\miniconda3\envs\tutorials\Lib\site-packages\tobler\area_weighted\area_interpolate.py:345 in _area_interpolate_binning
    vals = _nan_check(source_df, variable)

  File C:\miniconda3\envs\tutorials\Lib\site-packages\tobler\util\util.py:49 in _nan_check
    values = df[column].values

  File C:\miniconda3\envs\tutorials\Lib\site-packages\geopandas\geodataframe.py:1474 in __getitem__
    result = super().__getitem__(key)

  File C:\miniconda3\envs\tutorials\Lib\site-packages\pandas\core\frame.py:3893 in __getitem__
    indexer = self.columns.get_loc(key)

  File C:\miniconda3\envs\tutorials\Lib\site-packages\pandas\core\indexes\base.py:3797 in get_loc
    raise KeyError(key) from err

KeyError: 'P'

If I replace the raster with "nlcd_2001.tif", I get this instead:

C:\miniconda3\envs\tutorials\Lib\site-packages\pyproj\crs\crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
C:\miniconda3\envs\tutorials\Lib\site-packages\pyproj\crs\crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Traceback (most recent call last):

  File C:\miniconda3\envs\tutorials\Lib\site-packages\pandas\core\indexes\base.py:3790 in get_loc
    return self._engine.get_loc(casted_key)

  File index.pyx:152 in pandas._libs.index.IndexEngine.get_loc

  File index.pyx:181 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:7080 in pandas._libs.hashtable.PyObjectHashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:7088 in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'P'

Here is my complete code

import os
#used to set the working directory
import geopandas as gpd
#used to read the files
import matplotlib.pyplot as plt
#used to plot the data
import tobler
#used for interpolation purposes
from zipfile import ZipFile
#unzips data
from quilt3 import Package

os.chdir("C:\\gispy\\Tobler")

def unzip_data(zip_file): 
    my_zip = ZipFile(zip_file, "r")
    my_zip.extractall()
    del(my_zip)
#reads and extracts files to working directory

def areal_inter(source_file, target_file, variable, title=""):
   result = tobler.area_weighted.area_interpolate(source_df=source_file, target_df=target_file, extensive_variables = [variable]) 
   fig, ax = plt.subplots(1,2, figsize=(14,7))

   result.plot(variable, scheme='quantiles',  ax=ax[0], legend = True, alpha = 0.5, k = 5, edgecolor = 'black' )
   source_file.plot(variable, scheme='quantiles',  ax=ax[1], legend = True, alpha = 0.5, k = 5, edgecolor = "black")

   ax[0].set_title('interpolated')
   ax[1].set_title('original')

   for ax in ax:
       ax.axis('off')
   fig.suptitle(title)
#calls tobler's area weighted areal interpolation method, plots output data

def dasy_inter(source_file, target_rast, target_file, variable, title=""):
   results = tobler.dasymetric.masked_area_interpolate(raster=target_rast, source_df=source_file, target_df=target_file, intensive_variables=variable, pixel_values = [21, 22, 23, 24]) 
   fig, ax = plt.subplots(1,2, figsize=(14,7))

   results.plot(variable, scheme='quantiles',   ax=ax[0])
   source_file.plot(variable, scheme='quantiles',  ax=ax[1])

   ax[0].set_title('interpolated')
   ax[1].set_title('original')
   for ax in ax:
       ax.axis('off')
   fig.suptitle(title) 
#calls tobler's dasymetric areal interpolation, plots output data

nhood = gpd.read_file("Neighborhoods_Philadelphia")
pop_stat = gpd.read_file("Vital_Population_CT")
p = Package.browse("rasters/nlcd", "s3://spatial-ucr")
p["nlcd_2011.tif"].fetch()

#opens and reads the necessary files for the interpolation methods

print(pop_stat.columns)
#checks column names for potential variables

pop_stat = pop_stat[pop_stat["COUNT_ALL_"] != 0]
#remove census tracts where no one lives, since they can't have a percentage

crs = 2272
#set the crs for Pennsylvania South USft

pop_stat = pop_stat.to_crs(crs)
#reproject the necessary data to the correct CRS for Philadelphia

#unzip_data("Tobler_Data.zip")  
areal_inter(pop_stat, nhood, "PERCENT_AS", "Asian Percentage")
areal_inter(pop_stat, nhood, "PERCENT_BL", "African American Percentage")
dasy_inter(pop_stat, "nlcd_2001.tif" ,nhood,"PERCENT_AS", "Asian Percentage")
#Call our earlier defined functions with the necessary parameters

I've tried a number of fixes and other ways to work with the raster, but am at a loss. Any help or guidance that you are able to offer is appreciated!

knaaptime commented 7 months ago

maybe try giving the full path to the raster file. These lines download an nlcd file to your local directory

p = Package.browse("rasters/nlcd", "s3://spatial-ucr") 
p["nlcd_2011.tif"].fetch()

then the dasymetric.masked_area_interpolate function takes the path to a raster and opens it. But making sure you provide the correct path to the function (that actually points to the nlcd_2011.tif file on your computer) is up to you. I can verify that both the quilt download and the function in tobler are working correctly, but unfortunately, that's probably about all the help we can provide here. The team is limited on resources so we can only afford to focus on issues related to the software itself

knaaptime commented 7 months ago

also, until you have a good handle on what you're doing, it's probably best to avoid those wrapper functions because they will just make debugging one step harder :). The plots are pretty straightforward once you have the results in hand