John Harley 5/6/2020
Social vulnerability is a measure of how vulnerable a state, region, or community is to disease, environmental disaster, or other stressors on human health. The U.S. Centers for Disease Control and Prevention (CDC) calculates a Social Vulnerability Index (SVI) based on U.S. census variables.
The variables included in the SVI calculations are downloaded from the American Community Survey (ACS), a program run by the Census Bureau that provides detailed population and housing data. The ACS conducts yearly surveys (ACS 1-year) for most geographies, but data for smaller geographies (\<65,000 people) are summarized in 5-year reports (ACS 5-year). The most recently published ACS 5-year report aggregates data from 2014-2018 and was published in December of 2019.
Throughout the document I will be referring to various census geographies; the ACS hierarchy of these geographies is shown here (from census.gov).
The CDC calculates the SVI for a number of geographies down to the census tract (usually 2,500 - 8,000 people). In Alaska, these data are displayed across census tracts as a Health Equity Index (HEI) by the DHSS in an interactive storymap.
I’ve made a table of the various Alaska geographies below for reference.
The variables that are used in the SVI calculation fall into four broad domains (figure from ESRI).
These values are extracted from ACS tables and generally converted into a percentage format (e.g. percent of persons living below poverty). Percent rankings are then derived for each variable across GEOIDs, so GEOIDs with a higher percentage of people living in poverty will have a ranking closer to 1. Aggregate domain and overall rankings are calculated, and flags are given to GEOIDs in the 90th percentile (most vulnerable). A comprehensive description of the SVI methodology is available here.
The SVI documentation linked above describes in detail the methodology for accessing and calculating each variable estimate and its margin of error (MOE). These calculations generally reference specific ACS tables, which use nondescript variable names such as “S0101_C01_030E” (the number of persons aged 65 and older). The SVI renames each variable to a more interpretable shorthand; for this particular variable the shorthand is “E_AGE65”.
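The ranking and flagging steps described above can be sketched in a few lines of base R. This is a minimal illustration on made-up percentages, assuming the percentile rank is computed as (rank - 1) / (N - 1) with ties sharing the lowest rank, and that a flag is raised at or above 0.90. The helper name `percent_rank_base()` and the toy values are hypothetical, and the repo's `SVI_rankings()` and `SVI_flags()` functions may differ in detail.

```r
# Hypothetical toy data: percent of persons below poverty for five regions.
ep_pov <- c(A = 5.2, B = 12.8, C = 9.1, D = 30.4, E = 15.0)

# Percentile rank in [0, 1]: (rank - 1) / (N - 1).
# Ties share the lowest rank; missing values stay missing.
percent_rank_base <- function(x) {
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# EPL_* style ranking: higher poverty gives a ranking closer to 1.
epl_pov <- percent_rank_base(ep_pov)

# F_* style flag: 1 for regions at or above the 90th percentile.
f_pov <- as.integer(epl_pov >= 0.90)

print(data.frame(region = names(ep_pov), EP_POV = ep_pov,
                 EPL_POV = epl_pov, F_POV = f_pov, row.names = NULL))
```

Only region D, which has the highest poverty percentage in this toy data, receives a flag.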
More complex calculations occur, such as the number of minority persons which is:
(Minority = Total population - White, Non-Hispanic)
All of these calculations are well documented and are reproduced in the .csv file in the data folder called `SVI_Variables`. Formulae for calculating variables and MOEs are provided in their respective columns, in a format that can be interpreted by R using the `parse_exprs()` command from `rlang`.
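To illustrate how formula strings stored in a table can drive the calculations, here is a minimal sketch using `rlang::parse_exprs()`. The column names (`E_WHITENH`, `M_WHITENH`), the layout of the formula table, and the toy numbers are assumptions made for illustration, not the actual contents of `SVI_Variables`. The MOE formula shown is the usual root-sum-of-squares approximation for a difference of two estimates.

```r
library(rlang)

# Hypothetical formula table: one row per derived variable.
formulas <- data.frame(
  variable = c("E_MINRTY", "M_MINRTY"),
  formula  = c("E_TOTPOP - E_WHITENH",
               "sqrt(M_TOTPOP^2 + M_WHITENH^2)"),
  stringsAsFactors = FALSE
)

# Hypothetical raw estimates (E_) and margins of error (M_) for two regions.
acs <- data.frame(
  E_TOTPOP  = c(1000, 500),
  M_TOTPOP  = c(30, 40),
  E_WHITENH = c(400, 450),
  M_WHITENH = c(40, 30)
)

# Turn each formula string into an R expression and name it after its variable.
exprs <- parse_exprs(formulas$formula)
names(exprs) <- formulas$variable

# Evaluate each expression with the data frame's columns in scope.
for (nm in names(exprs)) {
  acs[[nm]] <- eval(exprs[[nm]], acs)
}

print(acs)
```

Keeping the formulas in a table means a new geography or a revised variable definition only requires editing the .csv, not the R code.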
In this way we can calculate SVIs for multiple geographies, including some not done by the CDC. In this repo you’ll find a set of functions designed to extract data from the ACS tables and calculate SVIs for a given state and geography. These functions are split into several steps: accessing the raw estimates (`get_SVI_values()`), accessing and calculating percentages (`get_SVI_percentages()`), and then calculating percent rankings and flags (`SVI_rankings()` and `SVI_flags()`).
For convenience, the workflow is also wrapped in the function `get_all_SVI()`, to which you can pass a state and geography. This function returns a table of 114 variables, including the GEOID (a unique ID for each region), the name, and the SVI variables. A small sample is shown below.
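library(dplyr)       # provides %>% and the data manipulation verbs used below
library(kableExtra)  # provides kable_styling()

AlaskaTract <- get_all_SVI(state = "AK", geography = "tract")
head(AlaskaTract) %>%
  knitr::kable(., caption = "Output from get_all_SVI()") %>%
  kable_styling(latex_options = "striped", position = "center", full_width = FALSE)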
GEOID | NAME | E\_AGE17 | M\_AGE17 | E\_POV | M\_POV | E\_PCI | M\_PCI | E\_NOHSDP | M\_NOHSDP | E\_GROUPQ | M\_GROUPQ | E\_AGE65 | M\_AGE65 | E\_TOTPOP | M\_TOTPOP | E\_DISABL | M\_DISABL | E\_UNEMP | M\_UNEMP | E\_HU | M\_HU | E\_HH | M\_HH | E\_MOBILE | M\_MOBILE | E\_NOVEH | M\_NOVEH | E\_SNGPNT | E\_MINRTY | E\_LIMENG | E\_MUNIT | E\_CROWD | M\_SNGPNT | M\_MINRTY | M\_LIMENG | M\_MUNIT | M\_CROWD | EP\_PCI | MP\_PCI | EP\_POV | MP\_POV | EP\_NOHSDP | MP\_NOHSDP | EP\_AGE65 | MP\_AGE65 | EP\_UNEMP | MP\_UNEMP | EP\_DISABL | MP\_DISABL | EP\_MOBILE | MP\_MOBILE | EP\_NOVEH | MP\_NOVEH | EP\_AGE17 | EP\_SNGPNT | EP\_MINRTY | EP\_LIMENG | EP\_MUNIT | EP\_CROWD | EP\_GROUPQ | MP\_AGE17 | MP\_SNGPNT | MP\_MINRTY | MP\_LIMENG | MP\_MUNIT | MP\_CROWD | MP\_GROUPQ | POP\_GROUP | EPL\_PCI | EPL\_POV | EPL\_NOHSDP | EPL\_AGE65 | EPL\_UNEMP | EPL\_DISABL | EPL\_MOBILE | EPL\_NOVEH | EPL\_AGE17 | EPL\_SNGPNT | EPL\_MINRTY | EPL\_LIMENG | EPL\_MUNIT | EPL\_CROWD | EPL\_GROUPQ | SPL\_THEME1 | RPL\_THEME1 | SPL\_THEME2 | RPL\_THEME2 | SPL\_THEME3 | RPL\_THEME3 | SPL\_THEME4 | RPL\_THEME4 | SPL\_THEMES | RPL\_THEMES | F\_PCI | F\_POV | F\_NOHSDP | F\_AGE65 | F\_UNEMP | F\_DISABL | F\_MOBILE | F\_NOVEH | F\_AGE17 | F\_SNGPNT | F\_MINRTY | F\_LIMENG | F\_MUNIT | F\_CROWD | F\_GROUPQ | F\_THEME1 | F\_THEME2 | F\_THEME3 | F\_THEME4 | F\_THEMES |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
02013000100 | Census Tract 1, Aleutians East Borough, Alaska | 488 | 78 | 525 | 95 | 32510 | 2080 | 358 | 83 | 1189 | 318 | 302 | 54 | 3425 | NA | 291 | 54 | 87 | 27 | 1106 | 142 | 860 | 141 | 48 | 15 | 198 | 39 | 121 | 2910 | 434 | 42 | 22 | 25.49510 | 3426.0546 | 116.24113 | 16.64332 | 15.00000 | 32510 | 2080 | 15.4 | 2.8 | 13.6 | 3.2 | 8.8 | 1.6 | 3.5 | 1.2 | 8.5 | 1.6 | 4.3 | 1.3 | 23.0 | 2.7 | 14.24818 | 14.069767 | 84.96350 | 13.0841121 | 3.797468 | 2.5581395 | 34.7153285 | NA | NA | 3.5031265 | 1.423647 | 1.423647 | 1.693008 | NA | More than 1,000 persons | 0.5870968 | 0.7612903 | 0.8967742 | 0.3032258 | 0.1483871 | 0.1677419 | 0.5974026 | 0.8766234 | 0.0387097 | 0.8129032 | 0.9290323 | 0.9935484 | 0.5290323 | 0.2193548 | 0.9935484 | 310.5935 | 0 | 309.9935 | 0 | 155.0129 | 0 | 382.4475 | 0 | 180655.418 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 3 |
02016000100 | Census Tract 1, Aleutians West Census Area, Alaska | 175 | 38 | 131 | 45 | 35418 | 7374 | 47 | 14 | 252 | 127 | 97 | 17 | 969 | 132 | 138 | 26 | 18 | 12 | 792 | 47 | 218 | 44 | 0 | 9 | 79 | 17 | 27 | 766 | 19 | 23 | 12 | 13.03840 | 973.3576 | 33.67492 | 12.72792 | 7.28011 | 35418 | 7374 | 16.6 | 5.4 | 7.2 | 2.6 | 10.0 | 2.2 | 3.1 | 2.4 | 17.4 | 4.1 | 0.0 | 2.5 | 36.2 | 6.6 | 18.05986 | 12.385321 | 79.05057 | 2.0585049 | 2.904040 | 5.5045872 | 26.0061920 | 3.053896 | 99.87082 | 3.6370589 | 1.597794 | 1.597794 | 3.149270 | 12.6184262 | 100-1,000 persons | 0.2500000 | 0.3750000 | 0.3750000 | 0.3750000 | 0.0000000 | 0.6250000 | 0.0000000 | 0.8750000 | 0.2500000 | 0.7500000 | 0.7500000 | 0.7500000 | 0.6250000 | 0.5000000 | 0.7500000 | 17.7500 | 0 | 17.6250 | 0 | 8.6250 | 0 | 22.1250 | 0 | 595.125 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
02016000200 | Census Tract 2, Aleutians West Census Area, Alaska | 788 | 90 | 341 | 85 | 37404 | 1996 | 412 | 80 | 1118 | 408 | 237 | 40 | 4781 | 132 | 290 | 72 | 61 | 23 | 1175 | 111 | 958 | 111 | 16 | 18 | 114 | 26 | 92 | 3595 | 438 | 304 | 113 | 24.69818 | 4781.8851 | 96.33795 | 35.51056 | 23.34524 | 37404 | 1996 | 7.2 | 1.9 | 11.9 | 2.4 | 5.0 | 0.8 | 1.8 | 0.7 | 6.1 | 1.5 | 1.4 | 1.6 | 11.9 | 2.2 | 16.48191 | 9.603340 | 75.19347 | 9.5487247 | 25.872340 | 11.7954071 | 23.3842292 | 1.826622 | 99.99696 | 2.0807854 | 1.777602 | 1.777602 | 2.017548 | 8.5093223 | More than 1,000 persons | 0.3354839 | 0.3161290 | 0.8258065 | 0.0516129 | 0.0451613 | 0.0580645 | 0.3571429 | 0.7467532 | 0.0838710 | 0.5870968 | 0.8903226 | 0.9870968 | 0.9225806 | 0.8645161 | 0.9741935 | 310.5935 | 0 | 309.9935 | 0 | 155.0129 | 0 | 382.4475 | 0 | 180655.418 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 2 | 3 |
02020000101 | Census Tract 1.01, Anchorage Municipality, Alaska | 1633 | 279 | 632 | 349 | 40939 | 4947 | 43 | 34 | 141 | 136 | 449 | 136 | 5709 | 443 | 483 | 157 | 279 | 145 | 2116 | 59 | 1927 | 110 | 44 | 40 | 26 | 35 | 177 | 1065 | 4 | 26 | 17 | 121.03718 | 5725.3969 | 50.98039 | 44.92215 | 23.76973 | 40939 | 4947 | 11.1 | 5.9 | 1.1 | 0.9 | 7.9 | 2.5 | 9.6 | 4.7 | 8.6 | 2.8 | 2.1 | 1.9 | 1.3 | 1.8 | 28.60396 | 9.185262 | 18.65476 | 0.0751456 | 1.228733 | 0.8822003 | 2.4697846 | 4.353901 | 100.27676 | 0.9577181 | 2.122699 | 2.122699 | 1.232481 | 2.3744820 | More than 1,000 persons | 0.1935484 | 0.5870968 | 0.0322581 | 0.2193548 | 0.7161290 | 0.1870968 | 0.4480519 | 0.1363636 | 0.7677419 | 0.5548387 | 0.1677419 | 0.1548387 | 0.3741935 | 0.0774194 | 0.6516129 | 310.5935 | 0 | 309.9935 | 0 | 155.0129 | 0 | 382.4475 | 0 | 180655.418 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
02020000102 | Census Tract 1.02, Anchorage Municipality, Alaska | 1324 | 312 | 431 | 380 | 40244 | 6884 | 106 | 78 | 18 | 14 | 354 | 114 | 5439 | 561 | 287 | 122 | 120 | 91 | 2134 | 98 | 1929 | 152 | 226 | 121 | 7 | 14 | 176 | 1177 | 47 | 176 | 56 | 116.66190 | 5473.7717 | 96.36389 | 83.52844 | 73.35530 | 40244 | 6884 | 7.9 | 7.0 | 2.9 | 2.0 | 6.5 | 2.1 | 3.9 | 3.0 | 5.4 | 2.4 | 10.6 | 5.6 | 0.4 | 0.7 | 24.34271 | 9.123898 | 21.64001 | 0.9268389 | 8.247423 | 2.9030586 | 0.3309432 | 5.157670 | 100.61455 | 1.8981475 | 3.895805 | 3.895805 | 3.795877 | 0.2551268 | More than 1,000 persons | 0.2387097 | 0.3870968 | 0.1290323 | 0.0903226 | 0.2000000 | 0.0322581 | 0.8766234 | 0.0194805 | 0.4838710 | 0.5483871 | 0.2580645 | 0.5354839 | 0.6903226 | 0.2838710 | 0.2645161 | 310.5935 | 0 | 309.9935 | 0 | 155.0129 | 0 | 382.4475 | 0 | 180655.418 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
02020000201 | Census Tract 2.01, Anchorage Municipality, Alaska | 1556 | 206 | 438 | 243 | 37971 | 4566 | 102 | 57 | 0 | 9 | 236 | 90 | 4780 | 341 | 259 | 109 | 113 | 54 | 1783 | 51 | 1654 | 93 | 0 | 9 | 49 | 37 | 175 | 1294 | 33 | 29 | 46 | 90.37699 | 4790.5661 | 64.27286 | 27.51363 | 51.86521 | 37971 | 4566 | 9.2 | 4.9 | 3.8 | 2.1 | 4.9 | 1.8 | 4.7 | 2.2 | 5.7 | 2.3 | 0.0 | 1.1 | 3.0 | 2.2 | 32.55230 | 10.580411 | 27.07113 | 0.7744661 | 1.626472 | 2.7811366 | 0.0000000 | 3.630431 | 100.20244 | 1.5070255 | 1.542407 | 1.542407 | 3.131843 | 0.1882845 | More than 1,000 persons | 0.2967742 | 0.4838710 | 0.2258065 | 0.0451613 | 0.2774194 | 0.0451613 | 0.0000000 | 0.2987013 | 0.9290323 | 0.6709677 | 0.3870968 | 0.4967742 | 0.4193548 | 0.2580645 | 0.0000000 | 310.5935 | 0 | 309.9935 | 0 | 155.0129 | 0 | 382.4475 | 0 | 180655.418 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
To ensure that our calculations are working properly, we can compare our calculated data with data provided by the CDC (and hosted by the DHSS). The CDC data are also included in the data folder as Alaska_TractsSVI. We went through several verifications to ensure that our recreated calculations give the same values as the CDC calculations for the geographies that the CDC provides: county and census tract.
In the case below we’ll compare our calculated estimates of total population (prefixed with “calculated”) for each Alaska census tract to the data directly from the CDC.
library(ggplot2)

# Read the CDC tract-level SVI and restore the leading zero of the GEOID,
# which is lost when FIPS is read as a number
CDC_Data <- read.csv("./data/Alaska_TractsSVI.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM") %>%
  mutate(GEOID = paste0("0", as.character(FIPS)))

# Prefix our numeric columns so they do not clash with the CDC column names
mergedData <- AlaskaTract %>%
  rename_if(is.numeric, function(x) paste0("calculated", x)) %>%
  inner_join(CDC_Data, by = "GEOID")

ggplot(mergedData, aes(x = E_TOTPOP, y = calculatedE_TOTPOP)) +
  geom_point() +
  geom_abline(slope = 1, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(title = "Comparison of two data sources", subtitle = "Comparing estimates of total population") +
  annotate(x = 3000, y = 9000, geom = "text", label = "red line represents 1:1 correspondence")
As you can see the values from the CDC are exactly the same as the values we’ve obtained from our calculations. This is also true for the other variables.
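Beyond inspecting a plot, the same comparison can be made numerically for several variables at once. The following is a minimal base R sketch on made-up values, with one deliberate mismatch to show how a discrepancy would surface. The GEOIDs, values, and object names are hypothetical and are not taken from the CDC file.

```r
# Hypothetical calculated values, keyed by an 11-character GEOID.
calculated <- data.frame(
  GEOID    = c("02000000001", "02000000002"),
  E_TOTPOP = c(3400, 950),
  E_AGE65  = c(300, 97),
  stringsAsFactors = FALSE
)

# Hypothetical reference values, where FIPS was read as a number
# and therefore lost its leading zero.
cdc <- data.frame(
  FIPS     = c(2000000001, 2000000002),
  E_TOTPOP = c(3400, 950),
  E_AGE65  = c(300, 98)   # deliberate mismatch for illustration
)
cdc$GEOID <- paste0("0", as.character(cdc$FIPS))

# Join on GEOID; shared variable names get a suffix per source.
merged <- merge(calculated, cdc, by = "GEOID", suffixes = c("_calc", "_cdc"))

# Largest absolute difference per variable; 0 means an exact match.
vars <- c("E_TOTPOP", "E_AGE65")
max_diff <- sapply(vars, function(v) {
  max(abs(merged[[paste0(v, "_calc")]] - merged[[paste0(v, "_cdc")]]))
})

print(max_diff)
```

A variable whose maximum difference is zero matches the reference exactly; any nonzero entry points to a formula worth rechecking.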
We’re confident now that we’ve replicated the CDC’s calculations for the traditional SVI geographies of county and census tract. Using these calculations, we can also calculate the SVI values for the census “place” designation, since all of the ACS tables are available at place level.
The second map may look a bit strange because most of it isn’t filled in. Unlike tracts, census places and census designated places (CDPs) do not tile the whole state with borders that neatly touch. Instead they reflect either city boundaries or somewhat arbitrary lines drawn around where people actually live (CDPs). The emptiness of this map reflects the fact that large swaths of the state of Alaska have no permanent residents.
The value of the place designation can be seen if we zoom in to a sparsely populated region, say the Yukon-Koyukuk Census Area.
By looking at census tract data, the only detail we can glean from this region is that there are approximately 239 persons over the age of 65. However, by examining the census place level, we can learn more about how those people are distributed within this region.
The place level does not, however, give us more resolution in large populated areas. In fact, the entire city of Anchorage is considered one census place, since places are based on city boundaries. To better dissect populated areas we need to get down to the census block group level.
The block group is the smallest unit for which the census and ACS publish summary data. Block groups are quite small, averaging approximately 1,300 people. In Alaska there are 534 census block groups.
At the block group level the Data Profile and Subject tables are no longer available to us. Since some of the SVI equations rely on those tables, we cannot precisely follow the equations outlined by the CDC. However, we can access many of these variables through the detailed tables; it just takes a bit of hunting and aggregating.
I’ve created equations to calculate SVIs for block groups. These equations and their descriptions are in the SVI_BlockGroup file. In almost every case, when we have the new equations calculate SVIs for tract-level geographies, they produce exactly the same data as the CDC generates.
CDC_Data <- read.csv("./data/Alaska_TractsSVI.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM") %>%
  mutate(GEOID = paste0("0", as.character(FIPS)))
SVI_BlockGroup <- read.csv("./data/SVI_BlockGroup.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

# Apply the block group equations (detailed tables only) at tract level
newTract <- get_SVI_values(SVI_BlockGroup, geography = "tract", state = "AK")

mergedData <- newTract %>%
  rename_if(is.numeric, function(x) paste0("BlockCalculated", x)) %>%
  inner_join(CDC_Data, by = "GEOID")

ggplot(mergedData, aes(x = E_TOTPOP, y = BlockCalculatedE_TOTPOP)) +
  geom_point() +
  geom_abline(slope = 1, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(title = "Comparison of two data sources", subtitle = "CDC data vs. only detailed tables") +
  annotate(x = 3000, y = 9000, geom = "text", label = "red line represents 1:1 correspondence")
There is one exception.
The CDC’s SVI has a variable, “Civilian noninstitutionalized population with a disability” or E_DISABL, that uses data from the Data Profile table variable DP02_0071. We can’t access the DP tables at block group level, but we can access disability information in the detailed tables, specifically table C21007, which tabulates age, disability, poverty, and veteran status. By adding up all permutations “with a disability” we can get a good estimate of the number of persons with a disability. The only difference is that this table covers only persons for whom poverty status is determined, which evidently misses a few persons: 5,845 to be precise, or approximately 9%. What’s important, though, is that the correlation is still very tight (r=0.99), meaning it’s still a good variable to include in our SVI calculations.
ggplot(mergedData, aes(x=E_DISABL, y=BlockCalculatedE_DISABL)) +
geom_point() +
geom_abline(slope=1, linetype="dashed", color="red") +
theme_minimal()
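The aggregation and the two summary numbers quoted above (the relative undercount and the correlation) can be sketched as follows. This is an illustration on made-up counts only. The column names stand in for the “with a disability” cells of a C21007-style table and are hypothetical, as are the reference values.

```r
# Hypothetical cells from a C21007-style table for four regions.
# Columns ending in "_dis" are the "with a disability" permutations.
c21007 <- data.frame(
  GEOID              = c("T1", "T2", "T3", "T4"),
  under65_pov_dis    = c(10, 25, 30, 45),
  under65_nopov_dis  = c(40, 80, 120, 160),
  over65_pov_dis     = c(5, 10, 20, 25),
  over65_nopov_dis   = c(35, 70, 100, 135),
  over65_nopov_nodis = c(200, 300, 250, 400)
)

# Sum every "with a disability" cell to estimate persons with a disability.
dis_cols <- grep("_dis$", names(c21007), value = TRUE)
estimated <- rowSums(c21007[, dis_cols])

# Hypothetical reference counts (standing in for the CDC's E_DISABL).
reference <- c(100, 200, 300, 400)

# Share of the reference total that the aggregated estimate misses.
undercount <- 1 - sum(estimated) / sum(reference)

# Pearson correlation between the two sets of counts.
r <- cor(reference, estimated)

print(c(undercount = undercount, r = r))
```

A consistent proportional undercount like this shifts the raw counts but leaves the ordering of regions largely intact, which is what matters for a rank-based index.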
Now that we know our SVI equations can work even without access to the DP and S tables, we can calculate our SVI for block groups. This is done using the same `get_all_SVI()` command, only now we use `geography="block group"`.
AlaskaBlocks <- get_SVI_values(SVI_BlockGroup, geography="block group", state="AK")
By default, the `get_all_SVI()` call also calculates percentile ranks, flags, and summaries. In this way we can plot overall SVI rankings for each block group with just a few commands.
AlaskaBlocks <- get_all_SVI(geography="block group", state="AK")
BlockShapes <- tigris::block_groups(state="AK")
BlockFort <- fortify(BlockShapes, region="GEOID")
BlockMerged <- left_join(BlockFort, AlaskaBlocks, by=c("id"="GEOID"))
BlockMerged %>%
mutate(long = ifelse(long > 0, long-360, long)) %>% # need to do this to prevent wrapping around the date line
ggplot() +
geom_polygon(aes(x=long, y=lat, group=group, fill=F_THEMES)) +
theme_void() +
scale_fill_viridis_b() +
labs(fill = "Number of SVI flags", title="Census block group level", subtitle = "Alaska")
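The longitude adjustment in the pipeline above deserves a short illustration. Parts of the Aleutian Islands lie west of the 180° meridian and therefore carry positive (eastern) longitudes, which would otherwise be drawn on the far right of the map. A minimal sketch with made-up coordinates follows; the helper name `shift_east_longitudes()` is hypothetical.

```r
# Subtract 360 from positive longitudes so points just west of the
# date line plot next to the rest of Alaska instead of wrapping around.
shift_east_longitudes <- function(long) {
  ifelse(long > 0, long - 360, long)
}

# Made-up longitudes: two western-hemisphere points and two points
# on the far side of the 180 degree meridian.
long <- c(-149.9, -176.6, 179.5, 172.4)
shifted <- shift_east_longitudes(long)

# Horizontal extent of the map before and after the shift.
extent_before <- diff(range(long))
extent_after  <- diff(range(shifted))

print(data.frame(long = long, shifted = shifted))
print(c(before = extent_before, after = extent_after))
```

After the shift the points span a few dozen degrees of longitude rather than nearly the whole globe, so the state is drawn as one contiguous shape.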
The block group is again quite useful for looking at large cities, since we get a higher resolution.
AnchorageBlock <- BlockMerged %>%
filter(EP_AGE65 < 50) %>%
mutate(long = ifelse(long > 0, long-360, long)) %>% # need to do this to prevent wrapping around the date line
ggplot() +
geom_polygon(aes(x=long, y=lat, group=group, fill=EP_AGE65), color="white") +
theme_void() +
coord_cartesian(xlim = c(-150, -149.8),ylim = c(61.1, 61.3)) +
scale_fill_viridis_c() +
labs(fill = "Percentage persons \n 65 or older", title="Census block group level", subtitle = "Anchorage, n=214 block groups")
AnchorageTract <- TractMerged %>%
filter(EP_AGE65 < 50) %>%
mutate(long = ifelse(long > 0, long-360, long)) %>% # need to do this to prevent wrapping around the date line
ggplot() +
geom_polygon(aes(x=long, y=lat, group=group, fill=EP_AGE65), color="white") +
theme_void() +
coord_cartesian(xlim = c(-150, -149.8),ylim = c(61.1, 61.3)) +
scale_fill_viridis_c() +
labs(fill = "Percentage persons \n 65 or older", title="Census tract level", subtitle = "Anchorage, n=55 tracts")
gridExtra::grid.arrange(AnchorageTract, AnchorageBlock, ncol=2)
I have put in considerable effort to ensure the accuracy of these data following the CDC’s calculations; however, it is possible that errors exist. Please let me know if you find or notice errors.
This work is experimental in nature and not intended to inform policy or public health efforts at this time. That being said, my hope is that these data can be vetted and incorporated by cities, counties, or states in the future.
Funding for this work was provided by a grant from the University of Alaska Fairbanks Center for Innovation, Commercialization, and Entrepreneurship (Center ICE) Immediate Innovation for Coronavirus Project (IICP).