walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Error for 2000 decennial census sf3 call #353

Closed yeonhwalee closed 3 years ago

yeonhwalee commented 3 years ago

Hello,

I am trying to run the following code to get median household income by core-based statistical area (CBSA) for 2000:

msa_medinc_00 <- get_decennial(
  geography = "cbsa",
  variables = c(med_hh_inc = "P053001"),
  year = 2000,
  sumfile = "sf3",
  output = "wide",
  show_call = TRUE,
  geometry = TRUE) 

I get the following:

Getting data from the 2000 decennial Census
Census API call: https://api.census.gov/data/2000/sf3?get=P053001%2CNAME&for=state%3A%2A
Error : Your API call has errors.  The API message returned is <html><head><title>Error report</title></head><body><h1>HTTP Status 404 - /data/2000/sf3</h1></body></html>.
Error in UseMethod("select_") : 
  no applicable method for 'select_' applied to an object of class "character"

I have another call that requests a different geography (census tract) and multiple variables, and it returns the same error. I have read through #293 and #343 and installed the GitHub version of tidycensus. Could you please let me know whether the problem is on my end?

Thank you very much for your help!

mfherman commented 3 years ago

Hi @yeonhwalee -- there have been a number of changes to the 2000 decennial API recently. Right now, it looks like the metro-area geography available for 2000 via the API is the consolidated metropolitan statistical area. You can view all the geographies available for 2000 SF3 here: https://api.census.gov/data/2000/dec/sf3/geography.html.

library(tidycensus)

get_decennial(
  geography = "consolidated metropolitan statistical area",
  variables = c(med_hh_inc = "P053001"),
  year = 2000,
  sumfile = "sf3"
  )
#> Getting data from the 2000 decennial Census
#> Using Census Summary File 3
#> # A tibble: 280 x 4
#>    GEOID NAME                                       variable   value
#>    <chr> <chr>                                      <chr>      <dbl>
#>  1 5280  Muncie, IN MSA                             med_hh_inc 34659
#>  2 7720  Sioux City, IA--NE MSA                     med_hh_inc 38563
#>  3 8240  Tallahassee, FL MSA                        med_hh_inc 36441
#>  4 5520  New London--Norwich, CT--RI MSA            med_hh_inc 49283
#>  5 5560  New Orleans, LA MSA                        med_hh_inc 35317
#>  6 7480  Santa Barbara--Santa Maria--Lompoc, CA MSA med_hh_inc 46677
#>  7 5360  Nashville, TN MSA                          med_hh_inc 44223
#>  8 6280  Pittsburgh, PA MSA                         med_hh_inc 37467
#>  9 6320  Pittsfield, MA MSA                         med_hh_inc 38515
#> 10 7362  San Francisco--Oakland--San Jose, CA CMSA  med_hh_inc 62024
#> # ... with 270 more rows

Created on 2021-03-11 by the reprex package (v1.0.0)

Do you have any examples of tracts or other geographies failing?

yeonhwalee commented 3 years ago

Thank you so much, @mfherman! That works for me as well.

Here is an example of tracts failing:

library(tidycensus)
library(dplyr)  # for %>%
library(purrr)
library(sf)

states <- c("Pennsylvania", "New Jersey", "New York")

get_income_00 <- function(state) {
  get_decennial(
    geography = "tract",
    # Household income brackets (table P052) plus two additional variables
    variables = c(total_hh = "P052001",
                  less_10k = "P052002",
                  bt_10k_15k = "P052003",
                  bt_15k_20k = "P052004",
                  bt_20k_25k = "P052005",
                  bt_25k_30k = "P052006",
                  bt_30k_35k = "P052007",
                  bt_35k_40k = "P052008",
                  bt_40k_45k = "P052009",
                  bt_45k_50k = "P052010",
                  bt_50k_60k = "P052011",
                  bt_60k_75k = "P052012",
                  bt_75k_100k = "P052013",
                  bt_100k_125k = "P052014",
                  bt_125k_150k = "P052015",
                  bt_150k_200k = "P052016",
                  more_200k = "P052017",
                  grp_qrters = "P095001",
                  total_pop = "P001001"),
    year = 2000,
    sumfile = "sf3",
    state = state,
    output = "wide",
    geometry = TRUE)
}

# Fetch each state, cast to a common geometry type, then stack into one sf object
income_00 <- map(states, get_income_00) %>%
  map(st_cast, "MULTIPOLYGON") %>%
  map_dfr(c) %>%
  st_sf()

I get the following error (it's not exactly the same error as above):

Getting data from the 2000 decennial Census
Using FIPS code '42' for state 'Pennsylvania'
Error : Your API call has errors.  The API message returned is There was an error while running your query.  We've logged the error and we'll correct it ASAP.  Sorry for the inconvenience..
Variables not found in Summary File 1. Trying Summary File 3...
 Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "character"

I can run the code without error if I remove either grp_qrters = "P095001" or total_pop = "P001001", but not with both included. I found all of the variables by checking v00_dec <- load_variables(2000, "sf3", cache = TRUE).

Thank you very much!

mfherman commented 3 years ago

So this looks like a Census API issue to me. There are a number of examples in #343 of variable combinations that cause the API call to fail. As @walkerke notes in https://github.com/walkerke/tidycensus/issues/343#issuecomment-781428751, one workaround is to iterate over the variables and make a separate API call for each. Or, since you've identified which variable(s) cause the error, you could write two separate calls to get_decennial() and combine the results. Unfortunately, I don't think there is much we can do on the tidycensus end to deal with this error.
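Both workarounds (one call per variable, or separate calls for the conflicting variables) can be sketched as below, assuming dplyr and purrr are installed. fetch_in_groups() and get_income_00_split() are hypothetical helpers written for this example, not tidycensus functions, and the split is a sketch that has not been verified against the live API.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (not part of tidycensus): request each group of
# variables in its own API call, then join the wide results.
# `fetch` must accept (variables, geometry) and return a wide data frame
# with GEOID and NAME columns. Only the first call requests geometry,
# because dplyr::left_join() does not accept an sf object as its second
# argument.
fetch_in_groups <- function(var_groups, fetch) {
  pieces <- map2(var_groups, seq_along(var_groups) == 1, fetch)
  reduce(pieces, left_join, by = c("GEOID", "NAME"))
}

# Hypothetical wrapper for the tract example: grp_qrters and total_pop go
# into separate requests, since the call failed only with both together.
# (Most income brackets are omitted here for brevity.)
get_income_00_split <- function(state) {
  fetch_in_groups(
    list(
      c(total_hh = "P052001", less_10k = "P052002", more_200k = "P052017",
        grp_qrters = "P095001"),
      c(total_pop = "P001001")
    ),
    function(vars, geometry) {
      tidycensus::get_decennial(
        geography = "tract", variables = vars, year = 2000,
        sumfile = "sf3", state = state, output = "wide",
        geometry = geometry
      )
    }
  )
}
```

For the one-variable-per-call approach, pass split(vars, seq_along(vars)) as var_groups; split() keeps each element's name, so the wide output still uses labels such as total_pop. The join assumes every call returns the same GEOID and NAME values for the same tracts.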

yeonhwalee commented 3 years ago

Thank you!