walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Problem with pulling MSA-level data #494

Closed ericvmai closed 1 year ago

ericvmai commented 1 year ago

Trying to pull down data similar to the following:

B25119_PULL <- get_acs(
  geography = "metropolitan statistical area/micropolitan statistical area",
  state = "VA",
  table = "B25119",
  year = 2021,
  survey = "acs5"
)

Returns the following error for me:

Getting data from the 2017-2021 5-year ACS
Using FIPS code '51' for state 'VA'
Error: Your API call has errors. The API message returned is error: unknown/unsupported geography heirarchy.

Should this not work fine? I can see the data clearly on data.census.gov

szimmer commented 1 year ago

Two things are making your call fail. First, MSAs are not nested within states, so a state can't be included in the call. You can filter afterwards for areas that are in Virginia.

Second, the name for this geography changed in 2021. See: https://www2.census.gov/data/api-documentation/label-changes-to-2021-acs-5-year-geographies.pdf

We might want to handle this in tidycensus by translating the name in the R package before sending the request to the API.

A reprex to show working code where you can see some areas include multiple states:

library(tidycensus)
library(tidyverse)

B25119_20_PULL <- get_acs(
  geography = "metropolitan statistical area/micropolitan statistical area",
  table = "B25119",
  year = 2020,
  survey = "acs5"
)
#> Getting data from the 2016-2020 5-year ACS
#> Loading ACS5 variables for 2020 from table B25119. To cache this dataset for faster access to ACS tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per ACS dataset.

B25119_21_PULL <- get_acs(
  geography = "metropolitan/micropolitan statistical area",
  table = "B25119",
  year = 2021,
  survey = "acs5"
)
#> Getting data from the 2017-2021 5-year ACS
#> Loading ACS5 variables for 2021 from table B25119. To cache this dataset for faster access to ACS tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per ACS dataset.

B25119_20_PULL %>%
  filter(str_detect(NAME, "VA")) %>%
  distinct(GEOID, NAME)
#> # A tibble: 15 × 2
#>    GEOID NAME                                                   
#>    <chr> <chr>                                                  
#>  1 13720 Big Stone Gap, VA Micro Area                           
#>  2 13980 Blacksburg-Christiansburg, VA Metro Area               
#>  3 14140 Bluefield, WV-VA Micro Area                            
#>  4 16820 Charlottesville, VA Metro Area                         
#>  5 19260 Danville, VA Micro Area                                
#>  6 25500 Harrisonburg, VA Metro Area                            
#>  7 28700 Kingsport-Bristol, TN-VA Metro Area                    
#>  8 31340 Lynchburg, VA Metro Area                               
#>  9 32300 Martinsville, VA Micro Area                            
#> 10 40060 Richmond, VA Metro Area                                
#> 11 40220 Roanoke, VA Metro Area                                 
#> 12 44420 Staunton, VA Metro Area                                
#> 13 47260 Virginia Beach-Norfolk-Newport News, VA-NC Metro Area  
#> 14 47900 Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area
#> 15 49020 Winchester, VA-WV Metro Area

B25119_21_PULL %>%
  filter(str_detect(NAME, "VA")) %>%
  distinct(GEOID, NAME)
#> # A tibble: 15 × 2
#>    GEOID NAME                                                   
#>    <chr> <chr>                                                  
#>  1 13720 Big Stone Gap, VA Micro Area                           
#>  2 13980 Blacksburg-Christiansburg, VA Metro Area               
#>  3 14140 Bluefield, WV-VA Micro Area                            
#>  4 16820 Charlottesville, VA Metro Area                         
#>  5 19260 Danville, VA Micro Area                                
#>  6 25500 Harrisonburg, VA Metro Area                            
#>  7 28700 Kingsport-Bristol, TN-VA Metro Area                    
#>  8 31340 Lynchburg, VA Metro Area                               
#>  9 32300 Martinsville, VA Micro Area                            
#> 10 40060 Richmond, VA Metro Area                                
#> 11 40220 Roanoke, VA Metro Area                                 
#> 12 44420 Staunton, VA Metro Area                                
#> 13 47260 Virginia Beach-Norfolk-Newport News, VA-NC Metro Area  
#> 14 47900 Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area
#> 15 49020 Winchester, VA-WV Metro Area

Created on 2022-12-28 with reprex v2.0.2
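
The `str_detect(NAME, "VA")` filter above works for this data, but it matches the letters "VA" anywhere in the name. A stricter variant could parse the state abbreviations that follow the comma. This is only a sketch in base R: `cbsa_includes_state()` is a hypothetical helper, not a tidycensus function, and it assumes every name ends in "Metro Area" or "Micro Area" as in the output above.

```r
# Sample names copied from the output above.
cbsa_names <- c(
  "Richmond, VA Metro Area",
  "Bluefield, WV-VA Micro Area",
  "Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area"
)

# Hypothetical helper: TRUE when `state` is one of the hyphen-separated
# abbreviations that follow the comma in a CBSA name.
cbsa_includes_state <- function(name, state) {
  # Keep only the text between the last ", " and " Metro Area"/" Micro Area".
  states <- sub("^.*, (.*) (Metro|Micro) Area$", "\\1", name)
  # Split e.g. "DC-VA-MD-WV" into its parts and test for membership.
  vapply(
    strsplit(states, "-", fixed = TRUE),
    function(abbrs) state %in% abbrs,
    logical(1)
  )
}

cbsa_includes_state(cbsa_names, "VA")
```

In the pipeline above it would be used as `filter(cbsa_includes_state(NAME, "VA"))` in place of the `str_detect()` call.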

ericvmai commented 1 year ago

Thanks @szimmer! I was under the impression that you could also filter by state based on the "Available by" column at this page: https://walker-data.com/tidycensus/articles/basic-usage.html

But perhaps I was reading that incorrectly. Thank you though!

walkerke commented 1 year ago

Hi @ericvmai and thanks @szimmer!

I'll push a fix to remove state from "Available by" in the docs; @szimmer is correct. In the GitHub version of tidycensus, specifying geography = "cbsa" will translate to the correct year-specific notation for MSAs (see https://github.com/walkerke/tidycensus/issues/490). We should probably also translate "metropolitan statistical area/micropolitan statistical area" to the new 2021 version.

I'll push those corrections and submit to CRAN when submissions re-open after their winter break.
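
Until that release is available, the year-specific translation discussed in this thread could be approximated on the user side. The sketch below is not how tidycensus implements it; `cbsa_label_for_year()` is a hypothetical helper. It uses the two labels from the reprex above and assumes the new label applies to 2021 and every later year.

```r
# Hypothetical helper, not part of tidycensus: choose the geography label
# the Census API expects for metro/micro areas in a given ACS year.
cbsa_label_for_year <- function(year) {
  # Assumption: the renamed label applies to 2021 and all later years.
  if (year >= 2021) {
    "metropolitan/micropolitan statistical area"
  } else {
    "metropolitan statistical area/micropolitan statistical area"
  }
}

cbsa_label_for_year(2020)
cbsa_label_for_year(2021)
```

The result could then be passed along as `get_acs(geography = cbsa_label_for_year(2021), table = "B25119", year = 2021, survey = "acs5")`, with no `state` argument.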