jm2652 / Oakland_Crime

Apache License 2.0

Incorrect race demographic computation after 2013 switch to block groups #1

Closed by jm2652 5 months ago

jm2652 commented 7 months ago

In file Add_Demographics, the computed race demographic values for some neighborhoods and police beats are significantly lower after 2013 than before. In neighborhoods such as Pill Hill, the values for all races drop by a factor of ~3. Others, e.g. Acorn, appear unaffected. The variables Pop, Age, and Income also appear unaffected.

Between 2012 and 2013 the census file switches its zone unit from tracts to block groups, which are smaller. Each tract is composed of block groups whose estimates approximately sum to the tract's estimate. For example:

YEAR  GEOID         ESTIMATE
2012  06001401400   316
2013  060014014001  286
2013  060014014003  97

The problem appears to stem from computing means instead of sums across zones: once a tract is split into block groups, each zone carries a smaller count, so the mean drops even though the total is unchanged. For example, mean(6, 8) = 7 whereas mean(3, 3, 4, 4) = 3.5.
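
The effect can be reproduced with the toy numbers from this comment. This is only an illustrative sketch; the values are not census data, and it does not reflect the actual code in Add_Demographics.

```python
from statistics import mean

# Toy values: two tracts, each later split into two block groups.
tract_counts = [6, 8]
block_group_counts = [3, 3, 4, 4]

# The total number of residents is identical under both zoning schemes...
same_total = sum(tract_counts) == sum(block_group_counts)

# ...but the per-zone mean halves once each tract is split in two.
tract_mean = mean(tract_counts)
block_group_mean = mean(block_group_counts)

print(same_total, tract_mean, block_group_mean)
```

A tract split into three block groups would lower the mean by roughly a factor of 3, which is consistent with the size of the drop reported for Pill Hill, while a tract that maps to a single block group would show no change.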

The most promising solution is probably to compute percentages for each race variable instead of totals. An alternative is to re-aggregate the block groups back into tracts and then take the mean.
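
A minimal sketch of the re-aggregation option, assuming standard Census GEOIDs in which a 12-digit block group identifier is its parent tract's 11-digit identifier plus one block group digit. The function names `tract_geoid` and `aggregate_to_tracts` are hypothetical and do not come from Add_Demographics. Only the two block groups quoted in this issue are included, so the resulting total is not expected to match the 2012 tract figure.

```python
from collections import defaultdict

# 2013 rows quoted in this issue: (block group GEOID, estimate).
block_group_rows = [
    ("060014014001", 286),
    ("060014014003", 97),
]


def tract_geoid(block_group_geoid: str) -> str:
    """Return the 11-digit tract GEOID containing a 12-digit block group GEOID."""
    return block_group_geoid[:11]


def aggregate_to_tracts(rows):
    """Sum block group estimates into their parent tracts (hypothetical helper)."""
    totals = defaultdict(int)
    for geoid, estimate in rows:
        totals[tract_geoid(geoid)] += estimate
    return dict(totals)


tract_totals = aggregate_to_tracts(block_group_rows)
print(tract_totals)
```

Summing first restores tract-sized counts, so a mean taken over tracts after 2013 is on the same scale as the mean taken before 2013.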

jm2652 commented 5 months ago

Addressed by computing percentages instead of the total number of residents by race, which solves the unit-size problem without sacrificing function.
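
The idea behind the fix can be sketched as follows. This is not the code in Add_Demographics: the function name `race_shares`, the group labels, and the counts are all hypothetical, chosen only to show that a percentage does not depend on the size of the zone.

```python
def race_shares(counts_by_race):
    """Convert per-race resident counts for one zone into percentages of the zone total."""
    total = sum(counts_by_race.values())
    if total == 0:
        # An empty zone has no meaningful shares; report zeros rather than divide by zero.
        return {race: 0.0 for race in counts_by_race}
    return {race: 100.0 * n / total for race, n in counts_by_race.items()}


# Hypothetical counts: one tract, and a block group holding half of each group's residents.
tract = {"group_a": 600, "group_b": 300, "group_c": 100}
block_group = {race: n // 2 for race, n in tract.items()}

tract_pct = race_shares(tract)
block_group_pct = race_shares(block_group)
print(tract_pct)
print(block_group_pct)
```

Both zones yield the same shares even though the block group has half the residents, so averaging percentages across zones is not distorted by the switch from tracts to block groups.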