Creating a Deprivation Index in R using Census estimates

This is an R function to extract census variables and calculate a deprivation index. The Census data fetched are American Community Survey (ACS) 5-year estimates. Manually extracting all of these estimates is a time-consuming process.

The index is based on methodology by Messer and colleagues, who identified that the principal component extracted from eight variables (below), calculated from Census estimates, best represents neighborhood-level deprivation.


Given the arguments State and County, ndi extracts Census estimates at the tract level, transforms the variables, and then performs a principal component analysis. Since this index has been previously validated, the function extracts only one component.
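To make the "extract only one component" step concrete, here is a minimal sketch. It is not the code inside ndi: it uses base R's `prcomp` in place of psych, and the eight columns are simulated stand-ins rather than real ACS estimates. The orientation and rescaling steps are also assumptions made for readability.

```r
# Simulated stand-in for eight tract-level proportions (NOT real ACS data).
set.seed(42)
n_tracts <- 50
latent <- rnorm(n_tracts)  # shared "deprivation" signal behind all eight columns
vars <- sapply(1:8, function(i) plogis(latent + rnorm(n_tracts, sd = 0.5)))
colnames(vars) <- paste0("var", 1:8)  # hypothetical column names

# Standardize the variables and run the PCA.
pca <- prcomp(vars, center = TRUE, scale. = TRUE)

# Keep only the first principal component as the index.
index <- pca$x[, 1]

# The sign of a principal component is arbitrary, so orient it so that
# higher values of the input variables give a higher index.
if (sum(pca$rotation[, 1]) < 0) index <- -index

# Rescale to unit variance so scores read as standard deviations.
index <- index / sd(index)

summary(index)
```

With real data, the simulated matrix would be replaced by the eight transformed Census variables, one row per tract.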


Before using the function, a Census API key is required. You can get one for free here:

To run this function, you’ll need a few of the core tidyverse packages, as well as tidycensus by Kyle Walker and psych by William Revelle.
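A setup check along these lines may help before the first run. The specific tidyverse packages listed (`dplyr`, `tidyr`) are an assumption, since the text only says "a few of the core tidyverse packages". The key is looked up in the `CENSUS_API_KEY` environment variable, which is where tidycensus stores it.

```r
# Packages named in the text; "dplyr" and "tidyr" are assumed stand-ins
# for "a few of the core tidyverse packages".
needed <- c("dplyr", "tidyr", "tidycensus", "psych")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "),
          ". Install them with install.packages().")
}

# tidycensus reads the key from the CENSUS_API_KEY environment variable.
key_is_set <- nzchar(Sys.getenv("CENSUS_API_KEY"))
if (!key_is_set) {
  message('No Census API key found. Run this once with your own key: ',
          'tidycensus::census_api_key("YOUR_API_KEY", install = TRUE)')
}
```

Passing `install = TRUE` to `census_api_key()` writes the key to `.Renviron`, so it only needs to be entered once per machine.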



## # A tibble: 140 x 13
##    Tract     County   State GEOID      NDI pct_poverty pct_noHS pct_FHH pct_mgmt
##    <chr>     <chr>    <chr> <chr>    <dbl>       <dbl>    <dbl>   <dbl>    <dbl>
##  1 Census T~ Onondag~ New ~ 36067~  0.684       0.318    0.262   0.184     0.344
##  2 Census T~ Onondag~ New ~ 36067~  1.36        0.334    0.303   0.301     0.188
##  3 Census T~ Onondag~ New ~ 36067~  2.34        0.578    0.342   0.582     0.222
##  4 Census T~ Onondag~ New ~ 36067~  0.304       0.399    0.134   0.274     0.692
##  5 Census T~ Onondag~ New ~ 36067~  2.14        0.370    0.457   0.560     0.245
##  6 Census T~ Onondag~ New ~ 36067~  1.53        0.508    0.276   0.318     0.201
##  7 Census T~ Onondag~ New ~ 36067~ -0.779       0.0559   0.0448  0         0.490
##  8 Census T~ Onondag~ New ~ 36067~  2.79        0.763    0.333   0.970     0.160
##  9 Census T~ Onondag~ New ~ 36067~  1.60        0.796    0.237   0         0.390
## 10 Census T~ Onondag~ New ~ 36067~ -0.0944      0.0955   0.137   0.0728    0.245
## # ... with 130 more rows, and 4 more variables: pct_crowd <dbl>,
## #   pct_pubassist <dbl>, pct_unempl <dbl>, pct_under30K <dbl>

The output variable NDI is the deprivation index score for each corresponding census tract (CT) in the analysis. Higher index scores represent higher deprivation. These scores can be explored on their own or exported for use in statistical models.
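As a sketch of the "explore or export" step, the snippet below takes the ten NDI values shown in the output above, groups them into tertiles, and writes them to a CSV file. The tertile cut-offs, the `NDI_level` column, and the file name are illustrative choices, not part of ndi.

```r
# The ten NDI scores displayed in the output above.
ndi_scores <- data.frame(
  row = 1:10,
  NDI = c(0.684, 1.36, 2.34, 0.304, 2.14, 1.53, -0.779, 2.79, 1.60, -0.0944)
)

# Illustrative grouping into tertiles (an assumption, not something ndi does).
breaks <- quantile(ndi_scores$NDI, probs = c(0, 1/3, 2/3, 1))
ndi_scores$NDI_level <- cut(
  ndi_scores$NDI,
  breaks = breaks,
  labels = c("low", "medium", "high"),
  include.lowest = TRUE
)

# Export for use elsewhere, e.g. as a covariate in a statistical model.
out_file <- file.path(tempdir(), "ndi_scores.csv")  # hypothetical file name
write.csv(ndi_scores, out_file, row.names = FALSE)

table(ndi_scores$NDI_level)
```

With the full result, the same steps would run on the tibble returned by ndi, keeping GEOID so the scores can be joined to other tract-level data.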

Here is the distribution of deprivation across tracts in Onondaga County, NY.

If we categorize census tracts as either Syracuse City tracts or County tracts, we can see that City tracts tend to be more deprived than County tracts.

Thematic Mapping

We can further explore the deprivation index by its spatial distribution.

By mapping deprivation scores, we can see that high levels of deprivation concentrate within the City of Syracuse. However, if we map deprivation for city tracts only, we can still see some variation in scores.

Deprivation Index function for an entire State

If you omit the county argument, the function performs the same analysis for the entire state named. Additionally, you can find a New York City index here.
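One way an optional county argument can work is sketched below. This is not ndi's actual code: `fetch_poverty` is a hypothetical helper, the default year is an assumption, and only one of the eight variables (poverty, from ACS table B17001) is shown. It relies on `tidycensus::get_acs()` returning every tract in the state when `county` is `NULL`.

```r
# Hypothetical helper: tract-level poverty share for a county or a whole state.
# The default year is an assumption; a Census API key is needed to run it.
fetch_poverty <- function(state, county = NULL, year = 2019) {
  acs <- tidycensus::get_acs(
    geography = "tract",
    variables = c("B17001_001", "B17001_002"),  # total, below poverty level
    state     = state,
    county    = county,   # NULL means all tracts in the state
    year      = year,
    survey    = "acs5",
    output    = "wide"
  )
  # In wide output, estimate columns carry an "E" suffix.
  acs$pct_poverty <- acs$B17001_002E / acs$B17001_001E
  acs
}

# Usage (requires network access and a key):
# onondaga <- fetch_poverty("NY", "Onondaga")
# new_york <- fetch_poverty("NY")
```

Because the principal component analysis is fitted to whichever tracts are fetched, scores from a state-wide run are relative to all tracts in the state and are not directly comparable to scores from a single-county run.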

Neighborhood deprivation across New York State

The 5 Boroughs

Additional examples using ndi

Broward County, FL

Virginia Beach, VA

New England

