johnrharley / alaskaSVI

Experimental calculations of CDC Social Vulnerability Indices for geometries in Alaska

SVI Data for Multiple Geographies

John Harley 5/6/2020

Background

Social vulnerability is a measure of a state's, region's, or community's vulnerability to disease, environmental disaster, or other stressors on human health. The U.S. Centers for Disease Control and Prevention (CDC) calculates a Social Vulnerability Index (SVI) based on U.S. census variables.

The variables included in the SVI calculations are downloaded from the American Community Survey (ACS), a program run by the Census Bureau that provides detailed population and housing data. The ACS conducts yearly surveys (ACS 1-year) for most geographies, but data for smaller geographies (fewer than 65,000 people) are summarized in 5-year reports (ACS 5-year). At the time of writing, the most recently published ACS 5-year report aggregates data from 2014-2018 and was published in December 2019.

Geographies

Throughout the document I will be referring to various census geographies; their hierarchy for the ACS is shown here (from census.gov).

**ACS Geographies** ![ACS Geographies](https://mcdc.missouri.edu/geography/sumlevs/censusgeochart.png)

The CDC calculates the SVI for a number of geographies down to census tract (usually 2,500 - 8,000 people). In Alaska, these data are displayed across census tracts as a Health Equity Index (HEI) by the DHSS in an interactive storymap.

I’ve made a table of the various Alaska geographies below for reference.

Variables

The variables used in the SVI calculation fall into four broad domains (figure from ESRI).

**SVI Variables** ![SVI](https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/CDC-SVI-Variables.jpg?_=81002)

These values are extracted from ACS tables and generally converted into percentages (e.g. percent of persons living below poverty). Percent rankings are then derived for each variable across GEOIDs, so GEOIDs with a higher percentage of people living in poverty will have a ranking closer to 1. Aggregate domain and overall rankings are calculated, and flags are given to GEOIDs in the 90th percentile (most vulnerable). A comprehensive description of the SVI methodology is available here.
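As a rough illustration of the ranking and flagging steps, here is a minimal base R sketch. It assumes the common percent rank definition, (rank - 1) / (n - 1), which is also what dplyr::percent_rank() computes, and a flag threshold of 0.9. The data frame, its values, and the two-variable "theme" are made up for illustration and are not the repo's implementation.

```r
# Illustrative percentages for five hypothetical GEOIDs (not real ACS data)
svi <- data.frame(
  GEOID    = c("A", "B", "C", "D", "E"),
  EP_POV   = c(5, 12, 30, 8, 20),
  EP_UNEMP = c(3, 9, 15, 4, 6),
  stringsAsFactors = FALSE
)

# Percent rank: (rank - 1) / (n - 1), so the highest percentage gets 1
pct_rank <- function(x) (rank(x, ties.method = "min") - 1) / (length(x) - 1)

svi$EPL_POV   <- pct_rank(svi$EP_POV)
svi$EPL_UNEMP <- pct_rank(svi$EP_UNEMP)

# Flag GEOIDs at or above the 90th percentile for each variable
svi$F_POV   <- as.integer(svi$EPL_POV >= 0.9)
svi$F_UNEMP <- as.integer(svi$EPL_UNEMP >= 0.9)

# Simplified theme: sum the variable rankings, then rank the sums
svi$SPL_THEME1 <- svi$EPL_POV + svi$EPL_UNEMP
svi$RPL_THEME1 <- pct_rank(svi$SPL_THEME1)

print(svi)
```

Note that the ranking is computed across all rows passed in, so the set of GEOIDs being compared determines every percentile.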

Reproducing SVI calculations

The SVI documentation linked above describes in detail the methodology for accessing and calculating each variable estimate and its margin of error (MOE). These calculations generally reference specific ACS tables, which use nondescript variable names such as “S0101_C01_030E” (the number of persons aged 65 and older). The SVI renames each variable to a more interpretable shorthand; for this variable it is “E_AGE65”.
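The renaming step can be pictured as a simple lookup from ACS codes to SVI shorthand. The sketch below is a base R illustration only: the lookup vector and the pairing of “S0101_C01_030M” with “M_AGE65” are assumptions, and the two rows reuse the E_AGE65 and M_AGE65 values from the sample output in this document.

```r
# Hypothetical lookup from ACS variable codes to SVI shorthand names
svi_names <- c(S0101_C01_030E = "E_AGE65",
               S0101_C01_030M = "M_AGE65")

# Two tracts with values copied from the sample output table
acs <- data.frame(
  GEOID          = c("02013000100", "02016000100"),
  S0101_C01_030E = c(302, 97),
  S0101_C01_030M = c(54, 17),
  stringsAsFactors = FALSE
)

# Rename only the columns that appear in the lookup
hits <- names(acs) %in% names(svi_names)
names(acs)[hits] <- unname(svi_names[names(acs)[hits]])

print(names(acs))
```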

More complex calculations occur, such as the number of minority persons which is:

(Minority = Total population - White, Non-Hispanic)

All of these calculations are well documented and are reproduced in the .csv file in the data folder called SVI_Variables. Formulae for calculating variables and MOEs are provided in their respective columns, in formats that can be interpreted by R using the parse_exprs() command from rlang.
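To show how formula strings like these can be turned into calculations, here is a small sketch using rlang::parse_exprs() and rlang::eval_tidy(). The formula table layout, the raw column names (total_pop, white_nonhisp, and their _moe counterparts), and the numbers are hypothetical and do not reflect the actual layout of SVI_Variables. The MOE formula is the standard square root of summed squares for a difference of two estimates.

```r
library(rlang)

# Hypothetical formula table (column names are assumptions, not the repo's CSV)
formulas <- data.frame(
  variable = c("E_MINRTY", "M_MINRTY"),
  formula  = c("total_pop - white_nonhisp",
               "sqrt(total_pop_moe^2 + white_nonhisp_moe^2)"),
  stringsAsFactors = FALSE
)

# Illustrative raw estimates and margins of error (not real ACS data)
raw <- data.frame(
  total_pop         = c(1000, 2500),
  total_pop_moe     = c(30, 40),
  white_nonhisp     = c(400, 2000),
  white_nonhisp_moe = c(40, 30)
)

# Parse each formula string into an R expression
exprs <- parse_exprs(formulas$formula)
names(exprs) <- formulas$variable

# Evaluate every expression against the raw data columns
result <- as.data.frame(lapply(exprs, function(e) eval_tidy(e, data = raw)))

print(result)
```

Keeping the formulas in a table means a new geography or table source only needs a new set of strings, not new R code.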

In this way we can calculate SVIs for multiple geographies, including some not done by the CDC. In this repo you’ll find a set of functions designed to extract data from the ACS tables and calculate SVIs for a given state and geometry. These functions are split into several steps: accessing the raw estimates (get_SVI_values()), accessing and calculating percentages (get_SVI_percentages()), and then calculating percent rankings and flags (SVI_rankings() and SVI_flags()).

For convenience, the workflow is also wrapped in the function get_all_SVI(), to which you can pass a state and geography. This function returns a table of 114 variables, including the GEOID (a unique ID for each region), the name, and the SVI variables. A small sample is shown below.

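library(dplyr)       # provides %>%
library(kableExtra)  # provides kable_styling()

# Calculate the full set of SVI variables for every Alaska census tract
AlaskaTract <- get_all_SVI(state = "AK", geography = "tract")

# Show the first few rows as a formatted table
head(AlaskaTract) %>%
  knitr::kable(caption = "Output from get_all_SVI()") %>%
  kable_styling(latex_options = "striped", position = "center", full_width = FALSE)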
Output from get\_all\_SVI()
GEOID NAME E\_AGE17 M\_AGE17 E\_POV M\_POV E\_PCI M\_PCI E\_NOHSDP M\_NOHSDP E\_GROUPQ M\_GROUPQ E\_AGE65 M\_AGE65 E\_TOTPOP M\_TOTPOP E\_DISABL M\_DISABL E\_UNEMP M\_UNEMP E\_HU M\_HU E\_HH M\_HH E\_MOBILE M\_MOBILE E\_NOVEH M\_NOVEH E\_SNGPNT E\_MINRTY E\_LIMENG E\_MUNIT E\_CROWD M\_SNGPNT M\_MINRTY M\_LIMENG M\_MUNIT M\_CROWD EP\_PCI MP\_PCI EP\_POV MP\_POV EP\_NOHSDP MP\_NOHSDP EP\_AGE65 MP\_AGE65 EP\_UNEMP MP\_UNEMP EP\_DISABL MP\_DISABL EP\_MOBILE MP\_MOBILE EP\_NOVEH MP\_NOVEH EP\_AGE17 EP\_SNGPNT EP\_MINRTY EP\_LIMENG EP\_MUNIT EP\_CROWD EP\_GROUPQ MP\_AGE17 MP\_SNGPNT MP\_MINRTY MP\_LIMENG MP\_MUNIT MP\_CROWD MP\_GROUPQ POP\_GROUP EPL\_PCI EPL\_POV EPL\_NOHSDP EPL\_AGE65 EPL\_UNEMP EPL\_DISABL EPL\_MOBILE EPL\_NOVEH EPL\_AGE17 EPL\_SNGPNT EPL\_MINRTY EPL\_LIMENG EPL\_MUNIT EPL\_CROWD EPL\_GROUPQ SPL\_THEME1 RPL\_THEME1 SPL\_THEME2 RPL\_THEME2 SPL\_THEME3 RPL\_THEME3 SPL\_THEME4 RPL\_THEME4 SPL\_THEMES RPL\_THEMES F\_PCI F\_POV F\_NOHSDP F\_AGE65 F\_UNEMP F\_DISABL F\_MOBILE F\_NOVEH F\_AGE17 F\_SNGPNT F\_MINRTY F\_LIMENG F\_MUNIT F\_CROWD F\_GROUPQ F\_THEME1 F\_THEME2 F\_THEME3 F\_THEME4 F\_THEMES
02013000100 Census Tract 1, Aleutians East Borough, Alaska 488 78 525 95 32510 2080 358 83 1189 318 302 54 3425 NA 291 54 87 27 1106 142 860 141 48 15 198 39 121 2910 434 42 22 25.49510 3426.0546 116.24113 16.64332 15.00000 32510 2080 15.4 2.8 13.6 3.2 8.8 1.6 3.5 1.2 8.5 1.6 4.3 1.3 23.0 2.7 14.24818 14.069767 84.96350 13.0841121 3.797468 2.5581395 34.7153285 NA NA 3.5031265 1.423647 1.423647 1.693008 NA More than 1,000 persons 0.5870968 0.7612903 0.8967742 0.3032258 0.1483871 0.1677419 0.5974026 0.8766234 0.0387097 0.8129032 0.9290323 0.9935484 0.5290323 0.2193548 0.9935484 310.5935 0 309.9935 0 155.0129 0 382.4475 0 180655.418 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 2 1 3
02016000100 Census Tract 1, Aleutians West Census Area, Alaska 175 38 131 45 35418 7374 47 14 252 127 97 17 969 132 138 26 18 12 792 47 218 44 0 9 79 17 27 766 19 23 12 13.03840 973.3576 33.67492 12.72792 7.28011 35418 7374 16.6 5.4 7.2 2.6 10.0 2.2 3.1 2.4 17.4 4.1 0.0 2.5 36.2 6.6 18.05986 12.385321 79.05057 2.0585049 2.904040 5.5045872 26.0061920 3.053896 99.87082 3.6370589 1.597794 1.597794 3.149270 12.6184262 100-1,000 persons 0.2500000 0.3750000 0.3750000 0.3750000 0.0000000 0.6250000 0.0000000 0.8750000 0.2500000 0.7500000 0.7500000 0.7500000 0.6250000 0.5000000 0.7500000 17.7500 0 17.6250 0 8.6250 0 22.1250 0 595.125 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02016000200 Census Tract 2, Aleutians West Census Area, Alaska 788 90 341 85 37404 1996 412 80 1118 408 237 40 4781 132 290 72 61 23 1175 111 958 111 16 18 114 26 92 3595 438 304 113 24.69818 4781.8851 96.33795 35.51056 23.34524 37404 1996 7.2 1.9 11.9 2.4 5.0 0.8 1.8 0.7 6.1 1.5 1.4 1.6 11.9 2.2 16.48191 9.603340 75.19347 9.5487247 25.872340 11.7954071 23.3842292 1.826622 99.99696 2.0807854 1.777602 1.777602 2.017548 8.5093223 More than 1,000 persons 0.3354839 0.3161290 0.8258065 0.0516129 0.0451613 0.0580645 0.3571429 0.7467532 0.0838710 0.5870968 0.8903226 0.9870968 0.9225806 0.8645161 0.9741935 310.5935 0 309.9935 0 155.0129 0 382.4475 0 180655.418 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 2 3
02020000101 Census Tract 1.01, Anchorage Municipality, Alaska 1633 279 632 349 40939 4947 43 34 141 136 449 136 5709 443 483 157 279 145 2116 59 1927 110 44 40 26 35 177 1065 4 26 17 121.03718 5725.3969 50.98039 44.92215 23.76973 40939 4947 11.1 5.9 1.1 0.9 7.9 2.5 9.6 4.7 8.6 2.8 2.1 1.9 1.3 1.8 28.60396 9.185262 18.65476 0.0751456 1.228733 0.8822003 2.4697846 4.353901 100.27676 0.9577181 2.122699 2.122699 1.232481 2.3744820 More than 1,000 persons 0.1935484 0.5870968 0.0322581 0.2193548 0.7161290 0.1870968 0.4480519 0.1363636 0.7677419 0.5548387 0.1677419 0.1548387 0.3741935 0.0774194 0.6516129 310.5935 0 309.9935 0 155.0129 0 382.4475 0 180655.418 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02020000102 Census Tract 1.02, Anchorage Municipality, Alaska 1324 312 431 380 40244 6884 106 78 18 14 354 114 5439 561 287 122 120 91 2134 98 1929 152 226 121 7 14 176 1177 47 176 56 116.66190 5473.7717 96.36389 83.52844 73.35530 40244 6884 7.9 7.0 2.9 2.0 6.5 2.1 3.9 3.0 5.4 2.4 10.6 5.6 0.4 0.7 24.34271 9.123898 21.64001 0.9268389 8.247423 2.9030586 0.3309432 5.157670 100.61455 1.8981475 3.895805 3.895805 3.795877 0.2551268 More than 1,000 persons 0.2387097 0.3870968 0.1290323 0.0903226 0.2000000 0.0322581 0.8766234 0.0194805 0.4838710 0.5483871 0.2580645 0.5354839 0.6903226 0.2838710 0.2645161 310.5935 0 309.9935 0 155.0129 0 382.4475 0 180655.418 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02020000201 Census Tract 2.01, Anchorage Municipality, Alaska 1556 206 438 243 37971 4566 102 57 0 9 236 90 4780 341 259 109 113 54 1783 51 1654 93 0 9 49 37 175 1294 33 29 46 90.37699 4790.5661 64.27286 27.51363 51.86521 37971 4566 9.2 4.9 3.8 2.1 4.9 1.8 4.7 2.2 5.7 2.3 0.0 1.1 3.0 2.2 32.55230 10.580411 27.07113 0.7744661 1.626472 2.7811366 0.0000000 3.630431 100.20244 1.5070255 1.542407 1.542407 3.131843 0.1882845 More than 1,000 persons 0.2967742 0.4838710 0.2258065 0.0451613 0.2774194 0.0451613 0.0000000 0.2987013 0.9290323 0.6709677 0.3870968 0.4967742 0.4193548 0.2580645 0.0000000 310.5935 0 309.9935 0 155.0129 0 382.4475 0 180655.418 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1

To ensure that our calculations are working properly, we can compare our calculated data with data provided by the CDC (and hosted by the DHSS). The CDC data are also included in the data folder as Alaska_TractsSVI. We went through several verifications to ensure that our recreated calculations give the same values as the CDC calculations for the geographies that the CDC provides: county and census tract.

In the case below we’ll compare our calculated estimates of total population (column prefix “calculated”) for each Alaska census tract to the data directly from the CDC.

library(dplyr)
library(ggplot2)

# Read the CDC tract-level SVI and rebuild the 11-digit GEOID
# (FIPS is read as a number, which drops Alaska's leading zero)
CDC_Data <- read.csv("./data/Alaska_TractsSVI.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM") %>%
  mutate(GEOID = paste0("0", as.character(FIPS)))

# Prefix our numeric columns so they do not collide with the CDC column names
mergedData <- AlaskaTract %>%
  rename_if(is.numeric, function(x) paste0("calculated", x)) %>%
  inner_join(CDC_Data, by = "GEOID")

# Points that fall on the dashed 1:1 line are identical in both sources
ggplot(mergedData, aes(x = E_TOTPOP, y = calculatedE_TOTPOP)) +
  geom_point() +
  geom_abline(slope = 1, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(title = "Comparison of two data sources", subtitle = "Comparing estimates of total population") +
  annotate(x = 3000, y = 9000, geom = "text", label = "red line represents 1:1 correspondence")

As you can see the values from the CDC are exactly the same as the values we’ve obtained from our calculations. This is also true for the other variables.
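A plot checks one variable at a time. To check many variables at once, one option is to compute the largest absolute difference per variable after merging on GEOID. The sketch below uses two small stand-in tables, with values borrowed from the sample output above and a CDC copy assumed identical for illustration. The object and suffix names are hypothetical and not part of the repo.

```r
# Toy stand-ins for the calculated and CDC tables
calculated <- data.frame(
  GEOID    = c("02013000100", "02016000100"),
  E_TOTPOP = c(3425, 969),
  E_AGE65  = c(302, 97),
  stringsAsFactors = FALSE
)
cdc <- data.frame(
  GEOID    = c("02016000100", "02013000100"),  # row order differs on purpose
  E_TOTPOP = c(969, 3425),
  E_AGE65  = c(97, 302),
  stringsAsFactors = FALSE
)

# Join on GEOID; shared column names get a suffix identifying their source
merged <- merge(calculated, cdc, by = "GEOID", suffixes = c("_calc", "_cdc"))

# Largest absolute difference for every shared variable
vars <- setdiff(names(calculated), "GEOID")
max_diff <- sapply(vars, function(v) {
  max(abs(merged[[paste0(v, "_calc")]] - merged[[paste0(v, "_cdc")]]))
})

print(max_diff)
```

A value of 0 for a variable means the two sources agree for every GEOID in the merged table.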

We’re confident now that we’ve replicated the CDC’s calculations for the traditional SVI geographies of county and census tract. Using these calculations, we can also calculate the SVI values for the census “place” designation, since all of the ACS tables are available at place level.

Place Level Data

The second map may look a bit strange, since most of it isn’t filled in. Census places and census designated places (CDPs) are not padded with empty space so that all their borders neatly touch. Instead they reflect either city boundaries or somewhat arbitrary lines drawn around where people actually live (CDPs). The emptiness of this map reflects the fact that large swaths of the state of Alaska have no permanent residents.

The value of the place designation can be seen if we zoom in to a sparsely populated region, say the Yukon-Koyukuk Census Area.

By looking at census tract data, the only detail we can glean from this region is that there are approximately 239 persons over the age of 65. However, by examining the census place level, we can learn more about how those people are distributed within this region.

The place level does not, however, give us more resolution into large populated areas. In fact the entire city of Anchorage is considered one census place, since places are based on city boundaries. To better dissect populated areas we need to get down to the census block group level.

Block Groups

The block group is the smallest unit for which the ACS publishes summary data. Block groups are quite small, averaging approximately 1,300 people. In Alaska there are 534 census block groups.

Calculations

At the block group level the Data Profile and Subject tables are no longer available to us. Since some of the SVI equations rely on those tables, we cannot precisely follow the equations outlined by the CDC. However, we can access many of these variables through the detailed tables; it just takes a bit of hunting and aggregating.

I’ve created equations to calculate SVIs for block groups. These equations and descriptions are in the SVI_BlockGroup file. In almost every case, these equations produce exactly the same data as the CDC generates when we use them to calculate SVIs for tract-level geographies.

# CDC tract-level SVI, with the leading zero restored to the GEOID
CDC_Data <- read.csv("./data/Alaska_TractsSVI.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM") %>%
  mutate(GEOID = paste0("0", as.character(FIPS)))

# Block group equations, which rely only on the detailed tables
SVI_BlockGroup <- read.csv("./data/SVI_BlockGroup.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

# Apply the block group equations at tract level so they can be checked against the CDC
newTract <- get_SVI_values(SVI_BlockGroup, geography = "tract", state = "AK")

# Prefix our numeric columns so they do not collide with the CDC column names
mergedData <- newTract %>%
  rename_if(is.numeric, function(x) paste0("BlockCalculated", x)) %>%
  inner_join(CDC_Data, by = "GEOID")

ggplot(mergedData, aes(x = E_TOTPOP, y = BlockCalculatedE_TOTPOP)) +
  geom_point() +
  geom_abline(slope = 1, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(title = "Comparison of two data sources", subtitle = "CDC data vs. only detailed tables") +
  annotate(x = 3000, y = 9000, geom = "text", label = "red line represents 1:1 correspondence")

There is one exception.

Exception 1 - Disability

The CDC’s SVI includes the variable “Civilian noninstitutionalized population with a disability” (E_DISABL), which uses data from the Data Profile variable DP02_0071. We can’t access the DP tables at block group level, but disability information is available in the detailed tables, specifically table C21007, which tabulates age, disability, poverty, and veteran status. By adding up all permutations “with a disability” we get a good estimate of the number of persons with a disability. The only difference is that this table covers only persons for whom poverty status is determined, which evidently misses a few persons: 5,845 to be precise, or approximately 9%. What’s important, though, is that the correlation is still very tight (r = 0.99), meaning it’s still a good variable to include in our SVI calculations.

ggplot(mergedData, aes(x=E_DISABL, y=BlockCalculatedE_DISABL)) +
  geom_point() +
  geom_abline(slope=1, linetype="dashed", color="red") +
  theme_minimal() 
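The two summary figures quoted above, the total shortfall and the correlation, can be computed along the following lines. This is a sketch with made-up counts for five tracts. With the real data the two vectors would be mergedData$E_DISABL and mergedData$BlockCalculatedE_DISABL.

```r
# Illustrative tract-level disability counts (not real data)
cdc_disabl   <- c(120, 340, 560, 210, 95)   # stand-in for the CDC estimate
block_disabl <- c(110, 310, 505, 195, 88)   # stand-in for the detailed-table estimate

# Persons missed by the detailed-table estimate, in total and as a percentage
shortfall     <- sum(cdc_disabl) - sum(block_disabl)
pct_shortfall <- 100 * shortfall / sum(cdc_disabl)

# Pearson correlation between the two estimates
r <- cor(cdc_disabl, block_disabl)

cat("Shortfall:", shortfall, "persons\n")
cat("Percent shortfall:", round(pct_shortfall, 1), "\n")
cat("Correlation:", round(r, 3), "\n")
```

A consistent proportional undercount lowers the totals but leaves the ordering of tracts largely intact, which is what matters for percent rankings.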

Block Group Data

Now that we know our SVI equations can work even without access to the DP and S tables, we can calculate the SVI variables for block groups. This is done using the same get_SVI_values() command, only now we use geography="block group".

AlaskaBlocks <- get_SVI_values(SVI_BlockGroup, geography="block group", state="AK")

The get_all_SVI() call by default also calculates percentile ranks, flags, and summaries as well. In this way we can plot overall SVI rankings for each block group with just a few commands.

AlaskaBlocks <- get_all_SVI(geography="block group", state="AK")

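# Download the block group boundaries for Alaska
BlockShapes <- tigris::block_groups(state="AK")

# Convert the polygons to a plain data frame keyed by GEOID.
# Note: fortify() with region= expects an sp object; recent tigris releases
# return sf objects by default, so this step may need adapting.
BlockFort <- fortify(BlockShapes, region="GEOID")

# Attach the SVI values to each polygon vertex
BlockMerged <- left_join(BlockFort, AlaskaBlocks, by=c("id"="GEOID"))

BlockMerged %>%
  mutate(long = ifelse(long > 0, long-360, long)) %>% # need to do this to prevent wrapping around the date line
  ggplot() +
  geom_polygon(aes(x=long, y=lat, group=group, fill=F_THEMES)) +
  theme_void() +
  scale_fill_viridis_b() +
  labs(fill = "Number of SVI flags", title="Census block group level", subtitle = "Alaska")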

The block group is again quite useful for looking at large cities, since we get a higher resolution.

# Block group map, cropped to the Anchorage area
AnchorageBlock <- BlockMerged %>%
  filter(EP_AGE65 < 50) %>%
  mutate(long = ifelse(long > 0, long-360, long)) %>% # need to do this to prevent wrapping around the date line
  ggplot() +
  geom_polygon(aes(x=long, y=lat, group=group, fill=EP_AGE65), color="white") +
  theme_void() +
  coord_cartesian(xlim = c(-150, -149.8),ylim = c(61.1, 61.3)) +
  scale_fill_viridis_c() +
  labs(fill = "Percentage persons \n 65 or older", title="Census block group level", subtitle = "Anchorage, n=214 block groups")

# Tract map of the same area; TractMerged is not defined in the code shown here
# and is assumed to be the tract-level counterpart of BlockMerged
AnchorageTract <- TractMerged %>%
  filter(EP_AGE65 < 50) %>%
  mutate(long = ifelse(long > 0, long-360, long)) %>% # need to do this to prevent wrapping around the date line
  ggplot() +
  geom_polygon(aes(x=long, y=lat, group=group, fill=EP_AGE65), color="white") +
  theme_void() +
  coord_cartesian(xlim = c(-150, -149.8),ylim = c(61.1, 61.3)) +
  scale_fill_viridis_c() +
  labs(fill = "Percentage persons \n 65 or older", title="Census tract level", subtitle = "Anchorage, n=55 tracts")

# Show the two maps side by side
gridExtra::grid.arrange(AnchorageTract, AnchorageBlock, ncol=2)

Disclaimer

I have put in considerable effort to ensure the accuracy of these data following the CDC’s calculations; however, it is possible that errors exist. Please let me know if you find or notice errors.

This work is experimental in nature and not intended to inform policy or public health efforts at this time. That being said, my hope is that these data can be vetted and incorporated by cities, counties, or states in the future.

Acknowledgements

Funding for this work was provided by a grant from the University of Alaska Fairbanks Center for Innovation, Commercialization, and Entrepreneurship (Center ICE) Immediate Innovation for Coronavirus Project (IICP).