pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License
144 stars 30 forks source link

Problems using pycno (handling of nulls) #176

Closed beccasealy closed 1 year ago

beccasealy commented 1 year ago

I'm currently trying to use tobler for pycnophylactic interpolation, but I'm running into some problems. When I run pycno_interpolate normally, regardless of the cellsize I pick, I get this error from astropy: "WARNING: nan_treatment='interpolate', however, NaN values detected post convolution. A contiguous region of NaN values, larger than the kernel size, are present in the input array. Increase the kernel size to avoid this. [astropy.convolution.convolve]".

I thought there might be some conflict with how nulls are handled and the data I'm using, so I tried to use "handle_null = False" to turn this off, but I'm getting "NameError: name 'smooth2d' is not defined". I checked the source code, and it looks like the smooth2d method is sometimes referred to as smooth2D and sometimes as smooth2d, which seems to be causing issues.

knaaptime commented 1 year ago

Are you able to share any of your data to create a reproducible example?

beccasealy commented 1 year ago

sample_data.zip Yeah, attached are the shapefiles I'm using! It's population data from the 1870 federal census, and I'm starting with Georgia's populations by county (I've been using the column titled AK3002, which is the Black population in Georgia) and trying to get interpolated estimates for the boxes in the 12km square grid. Code: _pycno_ga = tobler.pycno.pycno_interpolate(ga_counties, gagrid, ['AK3002'], 12000, verbose=True) originally, and _pycno_ga = tobler.pycno.pycno_interpolate(ga_counties, ga_grid, ['AK3002'], 12000, handlenull = False, verbose=True) is what throws the NameError.

knaaptime commented 1 year ago

cool, thanks for raising this (and sharing your data).

So in the first example, I think everything is working as intended. There's no actual error, just a warning coming from astropy that there are still some sections of nan values in the intermediate raster produced during the convolution phase. In this case those nan's become zeros (which is appropriate here because the total area of your output kernel is the same as the total input variable). See a discussion of the nan_handle parameter in Dan's notebook here

The interpolation completes just fine (with the correct sum, so your original input volume is preserved)

Screenshot 2023-07-04 at 9 43 54 AM

so I don't think there's a real issue, you should be able to continue using that just fine.

That's definitely a bug-inducing typo when handle_null=False though. As you suggested, we just need call smooth2D instead of smooth2d on line 160 of pycno.py. If i make that change locally, there's no error in the second version either (i.e., when you pass handle_null=False), so that should be a quick fix. Thanks for the find

knaaptime commented 1 year ago

(just as an aside, I'm sure you already know, but by interpolating historical county data down to a regular 12km grid, you're adopting a strong assumption that population is distributed uniformly within a county. If some of the results look a little funny, you might want to consider clipping out pieces of the original counties that you know weren't inhabited in 1870, then conducting the pycno interpolation using the clipped data as the source)

but there are also some odd things about your target grid, where it looks like little pieces of (maybe?) water features have been digitized in or something. Those slivers may cause some oddities as well

Screenshot 2023-07-04 at 2 25 53 PM Screenshot 2023-07-04 at 2 23 46 PM
beccasealy commented 1 year ago

Okay, great! Yeah, we're planning to incorporate land use data in the future so that we don't assume uniformity in population distributions. Thanks for the heads up on the grid, too -- I generated it in QGIS, so I'll have to see if something went wrong there.

knaaptime commented 1 year ago

sweet. hope it comes together nicely.

Not sure if you're wedded to a rectangular grid, but we also have hexagonal gridding here in tobler if you want to explore some other options. Not sure why qgis produces those artifacts

knaaptime commented 1 year ago

those county boundaries may not be planar enforced