Closed renanxcortes closed 2 years ago
@renanxcortes i think this is resolved, no?
Hm.. I think this is still an open issue... Which PR would fix this?
ah, you're right. we're still using the slow one when doing raster masking
I have an idea how to solve this, so gonna hack on it this week
after some more thought, that first attempt isn't going to speed anything up--the speedup we get from area_interpolate_binning really comes from the use of area_tables_binning and its clever partitioning of the geodataframe. AFAICT, there's no way to get that speedup by partitioning the raster.
Instead, i'm thinking that if converting the raster to a vector is relatively performant, we could just use area_interpolate_binning directly, since all the data would then be in vector format.

also, pinging @darribas in case he has thoughts, since i know he's been in raster mode lately
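To make the raster-to-vector idea concrete, here is a minimal sketch, not tobler code. It assumes the raster is a plain list of rows with the top row first, and that `x_origin`, `y_origin`, `cell_size`, and `nodata` (all hypothetical names) describe the grid. In practice a library routine such as `rasterio.features.shapes` would more likely do this step.

```python
def populated_cells_to_boxes(grid, x_origin, y_origin, cell_size, nodata=0):
    """Return one (xmin, ymin, xmax, ymax) box per cell whose value != nodata.

    Hypothetical helper for illustration: row 0 is the top row, and
    (x_origin, y_origin) is the top-left corner of the raster.
    """
    boxes = []
    for r, row in enumerate(grid):
        ymax = y_origin - r * cell_size  # top edge of this row
        for c, value in enumerate(row):
            if value != nodata:
                xmin = x_origin + c * cell_size  # left edge of this column
                boxes.append((xmin, ymax - cell_size, xmin + cell_size, ymax))
    return boxes


# Toy 2x2 raster: only the two non-zero cells become vector boxes.
toy_grid = [[0, 1],
            [1, 0]]
toy_boxes = populated_cells_to_boxes(toy_grid, x_origin=0, y_origin=2, cell_size=1)
```

Whether this is worth doing depends on how many cells survive the mask, since every populated cell becomes its own geometry.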
after testing lots of options, i think it would be possible to use area_tables_binning for rasters if we really wanted, but it would actually degrade performance relative to the current implementation.
The area_tables_binning function speeds things up by optimizing intersections between source and target gdfs and ignoring geometries that don't overlap. The slower version, area_tables, calculates the union of source and target up front, then masks out uninhabited raster cells, and calculates zonal stats to estimate population per pixel before reaggregating into the target geometries. The issue here is that reading from the raster is expensive, and if we don't take the union of source and target up front, then we need to read in the raster for each geometry in this intersection so that populated_pixels is available during the binning phase. That's doable but it takes longer
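For readers unfamiliar with the binning trick, here is a rough sketch of the general idea. It is not tobler's actual implementation. It assumes geometries are simplified to axis-aligned boxes `(xmin, ymin, xmax, ymax)`, and all function names are made up for illustration. Boxes are hashed into grid bins, and intersections are computed only for pairs that share a bin.

```python
from collections import defaultdict
from itertools import product


def box_intersection_area(a, b):
    """Overlap area of two (xmin, ymin, xmax, ymax) boxes, 0.0 if disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return float(w * h) if w > 0 and h > 0 else 0.0


def bins_for(box, cell):
    """Grid bins (col, row) touched by a box, for square bins of size `cell`."""
    x0, y0 = int(box[0] // cell), int(box[1] // cell)
    x1, y1 = int(box[2] // cell), int(box[3] // cell)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))


def area_table_binned(source, target, cell=1.0):
    """Sparse {(i, j): area} table built only from pairs that share a bin."""
    grid = defaultdict(set)
    for j, tbox in enumerate(target):
        for b in bins_for(tbox, cell):
            grid[b].add(j)

    table = {}
    for i, sbox in enumerate(source):
        # Candidate targets are those registered in any bin the source touches.
        candidates = set()
        for b in bins_for(sbox, cell):
            candidates |= grid.get(b, set())
        for j in candidates:
            area = box_intersection_area(sbox, target[j])
            if area > 0:
                table[(i, j)] = area
    return table


source_boxes = [(0, 0, 2, 2), (10, 10, 12, 12)]
target_boxes = [(1, 1, 3, 3), (20, 20, 21, 21)]
sparse_table = area_table_binned(source_boxes, target_boxes)
```

In this toy case only the pair `(0, 0)` is ever tested for intersection. The far-apart boxes never share a bin, which is where the savings come from.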
I have a version of area_tables_binning that does what i describe above and it takes about a minute longer than our current implementation

it's possible those scale differently though; depending on the number of polygon intersections, the binning approach might end up better
Currently, we have a faster approach for areal interpolation that relies on binning and sparse matrices. However, to use it in area_tables_interpolate, we rely on the area computed in the binning step, intersection.area, at
https://github.com/pysal/tobler/blob/c89d1e4a9af1db5916a589c32088c8ada3541499/tobler/area_weighted.py#L105
We don't have this attribute in the raster approach; there, the area "proxy" is a variable called "Populated_Pixels":
https://github.com/pysal/tobler/blob/c89d1e4a9af1db5916a589c32088c8ada3541499/tobler/area_weighted.py#L449
If there were a way to "force" the area of a polygon to a user-specified number in geopandas, this would do the trick. But I doubt this will be implemented, so some alternatives need to be raised.
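One possible alternative, sketched here under assumptions rather than taken from tobler: the interpolation step only needs a weight for each (source, target) pair, so a sparse table could carry the populated-pixel count in place of the intersection area. The function name, the dict-based table, and the toy numbers below are all hypothetical.

```python
def interpolate_with_weights(values, table, n_target):
    """Distribute each source value across targets in proportion to table weights.

    `table` maps (source_idx, target_idx) -> weight, where the weight may be
    an intersection area or a proxy such as a populated-pixel count.
    """
    # Total weight per source, used to normalize each row of the table.
    totals = {}
    for (i, _), w in table.items():
        totals[i] = totals.get(i, 0.0) + w

    estimates = [0.0] * n_target
    for (i, j), w in table.items():
        if totals[i] > 0:
            estimates[j] += values[i] * w / totals[i]
    return estimates


# Toy example: weights are populated-pixel counts per intersection.
source_population = [100.0, 50.0]
pixel_table = {(0, 0): 30, (0, 1): 10, (1, 1): 5}
target_estimates = interpolate_with_weights(source_population, pixel_table, n_target=2)
```

Because each row is normalized by its own total, the source totals are preserved in the targets regardless of whether the weights are areas or pixel counts.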