pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License
144 stars 30 forks

Interpolate to the union of the polygons from two dataframes #192

Open sjsrey opened 7 months ago

sjsrey commented 7 months ago
codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (45b9673) 54.82% compared to head (5f15663) 55.83%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #192      +/-   ##
==========================================
+ Coverage   54.82%   55.83%   +1.01%
==========================================
  Files          15       17       +2
  Lines         839      865      +26
==========================================
+ Hits          460      483      +23
- Misses        379      382       +3
```


jGaboardi commented 7 months ago

I agree with @martinfleis on all points above.

sjsrey commented 7 months ago

I am not convinced by either of these changes (but surely won't block them if you want to go ahead).

area_buffer:

  • why is it called area buffer?
  • what you are trying to do there is a relate pattern. I think that this should live in geopandas and make use of sindex without a predicate for the first sift and then shapely.relate for the second, with a mapping of the resulting DE-9IM intersection matrix to legible names.

This is needed in my use case as I will be interpolating to those three different geometries.
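
As a rough illustration of the last step of the relate pattern described above (mapping a DE-9IM intersection matrix to a legible name), here is a minimal pure-Python sketch. The `PATTERNS` table and the `matches` and `describe` helpers are hypothetical names for this example, not part of tobler, geopandas, or shapely. The patterns are the standard OGC ones for the named predicates, with the overlaps pattern being the one for two areas. In practice the matrix strings would come from shapely's `relate`; here they are hard-coded for pairs of squares.

```python
# Standard OGC DE-9IM patterns for the named predicates.
# Order matters: the first matching name wins, so "equals" is tested
# before "within" and "contains", which it would also satisfy.
PATTERNS = [
    ("equals", ["T*F**FFF*"]),
    ("within", ["T*F**F***"]),
    ("contains", ["T*****FF*"]),
    ("overlaps", ["T*T***T**"]),  # pattern for two areas
    ("touches", ["FT*******", "F**T*****", "F***T****"]),
    ("disjoint", ["FF*FF****"]),
]


def matches(matrix: str, pattern: str) -> bool:
    """Check a 9-character DE-9IM matrix against a pattern.

    '*' accepts anything, 'T' accepts any non-empty intersection
    ('0', '1' or '2'), and any other character must match exactly.
    """
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue
        if want == "T":
            if cell == "F":
                return False
        elif cell != want:
            return False
    return True


def describe(matrix: str) -> str:
    """Return a legible name for a DE-9IM matrix (hypothetical helper)."""
    if len(matrix) != 9:
        raise ValueError("a DE-9IM matrix has exactly 9 characters")
    for name, patterns in PATTERNS:
        if any(matches(matrix, p) for p in patterns):
            return name
    return "other"


# Hard-coded matrices for pairs of squares (illustrative inputs only).
examples = {
    "partially overlapping": "212101212",
    "first strictly inside second": "2FF1FF212",
    "sharing an edge": "FF2F11212",
    "far apart": "FF2FF1212",
    "identical": "2FFF1FFF2",
}
names = {label: describe(matrix) for label, matrix in examples.items()}
```

The spatial index would only narrow down candidate pairs; a mapping like this is what turns the second, exact pass into names a user can filter on.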

area_faces:

  • if the whole code is a call to overlay and another to area_interpolate, is it something that is needed at all? What is the use case for it? Isn't it better to add an example notebook with that use case calling overlay and then area_interpolate?
  • if so, the name is not suggesting what it is doing at all :).

More of a helper function as I end up using this often. It comes up in tract harmonization research. Could be other use cases.

I'm all ears regarding better names.

knaaptime commented 7 months ago

how about area_buffer --> area_relate? I agree a lower level implementation in geopandas like @martinfleis suggests would probably be ideal, but until then it's quite useful (and fitting) for it to be available here.

on faces, there's probably a more descriptive name that's eluding me, but this ends up being a common pattern that we should document more thoroughly. We tend to think of areal interpolation problems as a complete change of support for the same study area. So our examples are tailored to things like moving from administrative zones to regular hexgrids, where the target is roughly exhaustive of the same study area (subject to some boundary issues, which is why we give allocate_total).

But another common case is when you have a second geometry that doesn't exhaust the study area and you want to understand the allocation of some person/thing/resource inside or outside the second geometry. So like, how much of the population with demographic characteristic X lives inside versus outside the target geometry {tax district, political district, proximity to waterfront, distance from superfund site, etc} where you know the target covers only a portion of the source. Depending on the relationship between their geoms, that can be tougher to answer than 'select the centroids inside the buffer'.

If you haven't thought about it carefully, you might just interpolate source-->target and compare source vs target (forgetting about the double counting)--or, if you haven't thought of the overlay trick, it would be tempting to interpolate source-->target geom with allocate_total=False, then clip out the target from the source, and interpolate source-->clipped source (again remembering to set allocate_total=False) and compare target vs clipped source (doing in 3 steps what you could've done in, like, 1)
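
To make the inside-versus-outside arithmetic concrete, here is a dependency-free sketch. It uses axis-aligned rectangles as stand-ins for polygons and assumes each count is spread uniformly over its source area (the usual areal-weighting assumption). The `Rect` class, the `split_inside_outside` helper, and all numbers are made up for illustration; none of this is tobler's API. With real polygons the same partition of the sources would come from a union overlay of the two dataframes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a polygon (hypothetical)."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return max(self.xmax - self.xmin, 0.0) * max(self.ymax - self.ymin, 0.0)

    def intersection_area(self, other: "Rect") -> float:
        width = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        height = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return max(width, 0.0) * max(height, 0.0)


def split_inside_outside(sources, target):
    """Split each source count into the part inside and outside `target`.

    Every source is partitioned into exactly two pieces, so the two
    totals always add up to the source total and nothing is counted twice.
    """
    inside = 0.0
    outside = 0.0
    for rect, count in sources:
        share = rect.intersection_area(target) / rect.area
        inside += count * share
        outside += count * (1.0 - share)
    return inside, outside


# Two made-up tracts with made-up populations, and a district that
# covers only the middle of the study area.
tracts = [(Rect(0, 0, 2, 2), 100.0), (Rect(2, 0, 4, 2), 200.0)]
district = Rect(1, 0, 3, 2)

inside, outside = split_inside_outside(tracts, district)
total = sum(count for _, count in tracts)

# The naive comparison sets the interpolated target against the full
# source total, which still contains the people inside the target.
naive_outside = total
```

Here half of each tract falls inside the district, so comparing the target against the whole source would count those people on both sides, while the partitioned split keeps the two groups disjoint in a single pass.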

serge's notebook covers this, just without quite the detailed context, so maybe we need another applied example, but i think the convenience function might be worth keeping around