pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License

Possibility of using area_interpolate_binning for the raster approach #2

Closed renanxcortes closed 2 years ago

renanxcortes commented 5 years ago

Currently, we have a faster approach to the areal method of interpolation that relies on binning and sparse matrices. However, to use this in area_tables_interpolate, we rely on the area of each binned intersection, taken from intersection.area in

https://github.com/pysal/tobler/blob/c89d1e4a9af1db5916a589c32088c8ada3541499/tobler/area_weighted.py#L105

We don't have this attribute in the raster approach; there, the area "proxy" is a variable called "Populated_Pixels":

https://github.com/pysal/tobler/blob/c89d1e4a9af1db5916a589c32088c8ada3541499/tobler/area_weighted.py#L449

If there were a way to "force" the area of a polygon to a user-specified number in geopandas, this would do the trick. But I doubt this will be implemented, so some alternatives need to be raised.
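To make the idea of swapping the area for a "proxy" concrete, here is a minimal pure-Python sketch. It is not tobler's code: the function name `interpolate_extensive`, the dict-based sparse table, and all the numbers are hypothetical. It only illustrates that an extensive variable can be reallocated with any numeric weight per (source, target) pair, whether that weight is an intersection area or a count of populated pixels.

```python
from collections import defaultdict


def interpolate_extensive(weights, source_values):
    """Reallocate source totals to targets using a sparse weight table.

    weights: dict mapping (source_id, target_id) -> weight. Only pairs that
    actually overlap are stored, which is what makes the table sparse.
    source_values: dict mapping source_id -> total to be redistributed.
    """
    # Sum the weights of each source so every row can be normalised to 1.
    row_totals = defaultdict(float)
    for (src, _tgt), w in weights.items():
        row_totals[src] += w

    # Each target receives the share of the source total given by its weight.
    estimates = defaultdict(float)
    for (src, tgt), w in weights.items():
        if row_totals[src] > 0:
            estimates[tgt] += source_values[src] * w / row_totals[src]
    return dict(estimates)


# Hypothetical inputs: the same overlaps weighted two different ways.
area_weights = {("s1", "t1"): 2.0, ("s1", "t2"): 2.0, ("s2", "t2"): 4.0}
pixel_weights = {("s1", "t1"): 30, ("s1", "t2"): 10, ("s2", "t2"): 25}
population = {"s1": 100.0, "s2": 50.0}

by_area = interpolate_extensive(area_weights, population)
by_pixels = interpolate_extensive(pixel_weights, population)
```

In this toy case the area weights split s1 evenly between t1 and t2, while the pixel weights send three quarters of it to t1. The open question in this issue is how to get the pixel counts into the table cheaply, not how to use them once they are there.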

knaaptime commented 4 years ago

@renanxcortes i think this is resolved, no?

renanxcortes commented 4 years ago

> @renanxcortes i think this is resolved, no?

Hm... I think this is still an open issue. Which PR would have fixed this?

knaaptime commented 4 years ago

ah, you're right. we're still using the slow one when doing raster masking

knaaptime commented 4 years ago

I have an idea how to solve this, so gonna hack on it this week

knaaptime commented 4 years ago

after some more thought, that first attempt isn't going to speed anything up. The speedup we get from area_interpolate_binning really comes from the use of area_tables_binning and its clever partitioning of the geodataframe. AFAICT, there's no way to get that speedup by partitioning the raster.

Instead, I'm thinking that if converting to a vector is relatively performant, we could just

knaaptime commented 4 years ago

also, pinging @darribas in case he has thoughts, since i know he's been in raster mode lately

knaaptime commented 4 years ago

after testing lots of options, I think it would be possible to use area_tables_binning for rasters if we really wanted, but it would actually degrade performance relative to the current implementation.

The area_tables_binning function speeds things up by optimizing intersections between source and target gdfs and ignoring geometries that don't overlap. The slower version, area_tables, calculates the union of source and target up front, then masks out uninhabited raster cells and calculates zonal stats to estimate population per pixel before reaggregating into the target geometries. The issue here is that reading from the raster is expensive. If we don't take the union of source and target up front, we need to read in the raster for each geometry in this intersection so that populated_pixels is available during the binning phase. That's doable, but it takes longer.

I have a version of area_tables_binning that does what I describe above, and it takes about a minute longer than our current implementation.


It's possible those scale differently, though. Depending on the number of polygon intersections, the binning approach might end up better.
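For readers unfamiliar with why binning helps, the general idea of skipping non-overlapping geometries can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions (axis-aligned bounding boxes, a uniform grid, hypothetical names `bin_ids` and `candidate_pairs`). It is not the implementation of area_tables_binning.

```python
from collections import defaultdict
from itertools import product


def bin_ids(bbox, cell):
    """Yield the (column, row) grid cells touched by a bounding box."""
    minx, miny, maxx, maxy = bbox
    cols = range(int(minx // cell), int(maxx // cell) + 1)
    rows = range(int(miny // cell), int(maxy // cell) + 1)
    return product(cols, rows)


def candidate_pairs(source_boxes, target_boxes, cell=1.0):
    """Return (source_id, target_id) pairs that share at least one grid cell.

    Only these pairs need an exact (and expensive) intersection test;
    every other pair is skipped without being examined.
    """
    # Index the targets by the grid cells their bounding boxes touch.
    bins = defaultdict(list)
    for t_id, box in target_boxes.items():
        for b in bin_ids(box, cell):
            bins[b].append(t_id)

    # Look up each source only in the cells it touches.
    pairs = set()
    for s_id, box in source_boxes.items():
        for b in bin_ids(box, cell):
            for t_id in bins.get(b, ()):
                pairs.add((s_id, t_id))
    return pairs


# Hypothetical bounding boxes as (minx, miny, maxx, maxy).
source_boxes = {"s1": (0.1, 0.1, 0.9, 0.9), "s2": (5.1, 5.1, 5.9, 5.9)}
target_boxes = {"t1": (0.5, 0.5, 1.5, 1.5), "t2": (9.1, 9.1, 9.9, 9.9)}

pairs = candidate_pairs(source_boxes, target_boxes)
all_pairs = len(source_boxes) * len(target_boxes)
```

Here only one of the four possible pairs survives the binning step. In the raster case discussed above, each surviving pair would still need its own raster read to count populated pixels, which is where the extra cost comes from.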