mggg / maup

The geospatial toolkit for redistricting data.
https://maup.readthedocs.io/en/latest/
MIT License
65 stars 23 forks source link

Disaggregating population from block groups to blocks changes total population #40

Closed gabeschoenbach closed 3 years ago

gabeschoenbach commented 3 years ago

Using the latest version of maup (version 1.0.1), disaggregating CVAP data from block groups down to blocks gives strange behavior.

import geopandas as gpd
import maup
import math

bgs = gpd.read_file("bgs.shp")
blocks = gpd.read_file("blocks.shp")

Even though these shapefiles came from the Census, there are a couple geometry errors that need to be fixed:

blocks[blocks["geometry"].apply(lambda x: isinstance(x, (Polygon, MultiPolygon)))]
bgs["geometry"] = maup.close_gaps(bgs)

Cropping and dropping empty geometries (per @InnovativeInventor and @amybecker's suggestions) doesn't change anything — in this case there aren't any empty geometries:

blocks["geometry"] = maup.crop_to(blocks, bgs)
bgs["geometry"] = maup.crop_to(bgs, blocks)
blocks = blocks[~blocks["geometry"].is_empty]
bgs = bgs[~bgs["geometry"].is_empty]

After doing this, the area of the symmetric difference between these two regions is 0:

print(blocks.unary_union.symmetric_difference(bgs.unary_union).area)

Disaggregating CVAP from bgs -> blocks:

cvap_cols = ["CVAP", "WCVAP", "BCVAP", "HCVAP", "ASIANCVAP"]
assignment = maup.assign(blocks, bgs)
weights = blocks["VAP"] / assignment.map(bgs["VAP"])
prorated = maup.prorate(assignment, bgs[cvap_cols], weights)
blocks[cvap_cols] = prorated

The issue is that the following should be True, but it isn't:

math.isclose(blocks["CVAP"].sum(), bgs["CVAP"].sum())
InnovativeInventor commented 3 years ago

Fixed by #34