Closed gregmacfarlane closed 4 years ago
I have Connor working on a block group-to-park boundary network distance calculator.
The calculator is complete as of 2be41f858dda142a5aa091a47a7a098fb131a829.
First, we find the node in the OpenStreetMap network nearest to the population-weighted tract centroid. Then, we calculate the Euclidean distance between that centroid and each point on a park's boundary. We then find the network node nearest to the closest of those boundary points, and calculate the shortest network distance between the two nodes.
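The steps above can be sketched roughly as follows. This is a minimal illustration, not Connor's calculator: the toy node coordinates, the adjacency-dict graph, and the names `nearest_node`, `shortest_path_length`, and `centroid_to_park` are all hypothetical, and it assumes the park point of interest is the one closest to the centroid by Euclidean distance.

```python
import heapq
import math


def nearest_node(point, nodes):
    """Return the id of the network node closest (Euclidean) to `point`."""
    return min(nodes, key=lambda n: math.dist(point, nodes[n]))


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on an adjacency dict {node: {neighbor: length}}."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph[node].items():
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf  # target not reachable from source


def centroid_to_park(centroid, park_points, nodes, graph):
    """Return (Euclidean, network) distance from a centroid to a park."""
    # Assumption: use the park point nearest the centroid in a straight line.
    park_point = min(park_points, key=lambda p: math.dist(centroid, p))
    euclidean = math.dist(centroid, park_point)
    # Snap both ends to the network, then route between the snapped nodes.
    origin = nearest_node(centroid, nodes)
    destination = nearest_node(park_point, nodes)
    network = shortest_path_length(graph, origin, destination)
    return euclidean, network


# Hypothetical toy network: an L-shaped street from a to c via b.
nodes = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (3.0, 4.0)}
graph = {"a": {"b": 3.0}, "b": {"a": 3.0, "c": 4.0}, "c": {"b": 4.0}}
centroid = (0.0, 0.0)
park_points = [(3.0, 4.0), (10.0, 10.0)]

euclidean, network = centroid_to_park(centroid, park_points, nodes, graph)
print(euclidean, network)
```

In this toy case the straight-line distance is the hypotenuse and the network distance follows the two legs, so the network distance is the longer of the two, as it should be.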
There appears to have been an issue in calculating the shortest paths to Staten Island. The plot below shows the distance between tract centroids and parks calculated by Euclidean and network distance, faceted by park borough ("B"rooklyn, "M"anhattan, "Q"ueens, "R" for Staten Island, and Bron"X") across the top. Cemeteries get their own category, "C". The tract borough is listed by FIPS code going down; Staten Island is 085.
It is clear that something is wrong with the OpenStreetMap network in Staten Island. Almost every exchange between park and tract appears to have a shorter network distance than a Euclidean distance, which violates several laws of geometry.
So, in cases where the network distance was shorter than the Euclidean distance, we used the Euclidean distance instead. This only appears to affect tracts in Staten Island. The resulting accessibility map for network distances is below:
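As a small sketch of that fallback rule: taking the larger of the two values is the same as substituting the Euclidean distance whenever the network distance comes out shorter. The function name and the sample distances below are made up for illustration.

```python
def corrected_distance(euclidean, network):
    """Use the Euclidean distance whenever the network distance is shorter."""
    return max(euclidean, network)


# Hypothetical (euclidean, network) pairs; the second one is "impossible".
pairs = [(5.0, 7.0), (12.0, 9.5), (3.0, 3.0)]
corrected = [corrected_distance(e, n) for e, n in pairs]
print(corrected)
```

Only the second pair changes, because it is the only one where the network distance falls below the straight-line distance.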
And the map for Euclidean distances for comparison:
Even here, you can see that the most noticeable difference is for tracts in Staten Island.
Does this make a difference? The model results suggest that the network-based calculation results in a somewhat more conservative estimate of the effect of park accessibility on physical activity, but the two estimates are extremely similar and the models as a whole are effectively identical. The network distance models have a marginally higher log likelihood (by less than 0.1), but the network distance is kind of a Frankenstein metric because of what we saw earlier. So let's stick with Euclidean distance.
=======================================================================================
                    Access Euclidean  Access Network    Multi Euclidean   Multi Network
---------------------------------------------------------------------------------------
(Intercept)           -24.8266 **       -24.6491 **       -24.8920 **       -24.7534 **
                      (8.6317)          (8.6228)          (8.6457)          (8.6366)
log(density)            0.1766 *          0.1716 *          0.1757 *          0.1715 *
                      (0.0710)          (0.0709)          (0.0710)          (0.0710)
log(income)             5.9976 ***        5.9801 ***        5.9964 ***        5.9795 ***
                      (0.2163)          (0.2160)          (0.2164)          (0.2161)
fulltime                0.1275 ***        0.1280 ***        0.1275 ***        0.1280 ***
                      (0.0094)          (0.0094)          (0.0094)          (0.0094)
college                 0.0076            0.0072            0.0073            0.0070
                      (0.0119)          (0.0119)          (0.0119)          (0.0119)
single                 -0.0362 ***       -0.0362 ***       -0.0360 ***       -0.0361 ***
                      (0.0084)          (0.0084)          (0.0084)          (0.0084)
youth                  -0.1311 ***       -0.1312 ***       -0.1310 ***       -0.1311 ***
                      (0.0135)          (0.0135)          (0.0135)          (0.0135)
young_adults            0.0293 **         0.0288 **         0.0294 **         0.0289 **
                      (0.0103)          (0.0103)          (0.0103)          (0.0103)
seniors                 0.0403 **         0.0413 **         0.0408 **         0.0417 **
                      (0.0140)          (0.0139)          (0.0140)          (0.0139)
black                  -0.0499 ***       -0.0499 ***       -0.0498 ***       -0.0498 ***
                      (0.0045)          (0.0045)          (0.0045)          (0.0045)
asian                  -0.0767 ***       -0.0767 ***       -0.0768 ***       -0.0768 ***
                      (0.0054)          (0.0054)          (0.0054)          (0.0054)
hispanic               -0.1025 ***       -0.1025 ***       -0.1025 ***       -0.1025 ***
                      (0.0047)          (0.0046)          (0.0047)          (0.0047)
other                   0.0035            0.0025            0.0037            0.0027
                      (0.0452)          (0.0452)          (0.0452)          (0.0452)
log(lag.density)        1.0987 ***        1.1053 ***        1.1030 ***        1.1072 ***
                      (0.1954)          (0.1952)          (0.1956)          (0.1954)
log(lag.income)         1.9646 ***        1.9557 ***        1.9570 ***        1.9532 ***
                      (0.5243)          (0.5238)          (0.5249)          (0.5245)
lag.fulltime           -0.0396           -0.0378           -0.0388           -0.0373
                      (0.0241)          (0.0241)          (0.0241)          (0.0241)
lag.college            -0.0711 *         -0.0718 *         -0.0721 *         -0.0726 *
                      (0.0304)          (0.0303)          (0.0304)          (0.0304)
lag.single             -0.0085           -0.0088           -0.0077           -0.0080
                      (0.0215)          (0.0215)          (0.0215)          (0.0215)
lag.youth              -0.0111           -0.0109           -0.0101           -0.0099
                      (0.0352)          (0.0352)          (0.0353)          (0.0353)
lag.young_adults        0.0842 **         0.0832 **         0.0848 **         0.0840 **
                      (0.0259)          (0.0258)          (0.0259)          (0.0259)
lag.seniors             0.0589            0.0600            0.0603            0.0611
                      (0.0353)          (0.0353)          (0.0353)          (0.0353)
lag.black               0.0011            0.0012            0.0008            0.0010
                      (0.0082)          (0.0082)          (0.0082)          (0.0082)
lag.asian              -0.0398 ***       -0.0401 ***       -0.0401 ***       -0.0403 ***
                      (0.0095)          (0.0095)          (0.0095)          (0.0095)
lag.hispanic           -0.0009            0.0000           -0.0008            0.0000
                      (0.0088)          (0.0087)          (0.0088)          (0.0088)
lag.other               0.0715            0.0713            0.0710            0.0706
                      (0.1312)          (0.1312)          (0.1313)          (0.1312)
euc_access              0.1610 *
                      (0.0700)
lambda                  0.6906 ***        0.6898 ***        0.6917 ***        0.6909 ***
                      (0.0200)          (0.0200)          (0.0199)          (0.0199)
net_access                                0.1453 *
                                        (0.0621)
euc_multi                                                   0.1472 *
                                                          (0.0722)
net_multi                                                                     0.1352 *
                                                                            (0.0649)
---------------------------------------------------------------------------------------
Num. obs.                 2099              2099              2099              2099
Parameters                  28                28                28                28
Log Likelihood      -4435.5066        -4435.4363        -4436.0465        -4435.9701
AIC (Linear model)   9717.9489         9699.7602         9725.1486         9707.9035
AIC (Spatial model)  8927.0133         8926.8725         8928.0930         8927.9403
LR test: statistic    792.9356          774.8877          799.0556          781.9633
LR test: p-value        0.0000            0.0000            0.0000            0.0000
=======================================================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
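As a quick check on the fit statistics, the "AIC (Spatial model)" row appears to follow the usual definition AIC = 2k - 2 log L, with k taken from the "Parameters" row. The sketch below recomputes it from the reported log likelihoods; the dictionary layout and the `aic` helper are illustrative, and the values are copied from the table (so they agree only up to rounding in the fourth decimal).

```python
# (log likelihood, number of parameters) as reported in the table above.
models = {
    "Access Euclidean": (-4435.5066, 28),
    "Access Network": (-4435.4363, 28),
    "Multi Euclidean": (-4436.0465, 28),
    "Multi Network": (-4435.9701, 28),
}

# "AIC (Spatial model)" row as reported, for comparison.
reported_aic = {
    "Access Euclidean": 8927.0133,
    "Access Network": 8926.8725,
    "Multi Euclidean": 8928.0930,
    "Multi Network": 8927.9403,
}


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


computed_aic = {name: aic(ll, k) for name, (ll, k) in models.items()}

for name, value in computed_aic.items():
    gap = value - reported_aic[name]
    print(f"{name:<17} computed {value:.4f}  reported {reported_aic[name]:.4f}  gap {gap:+.4f}")
```

Because all four models have the same number of parameters, differences in AIC are just twice the differences in log likelihood, and every pairwise gap here is well under 2.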
For more context, the "Multi" models use more terms in the utility equation besides park size and distance.
TL;DR: the difference between network and Euclidean distance in tract-level accessibility is negligible. We will stick with Euclidean distance for simplicity.