walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Could not fetch variables in microdata #292

Closed AlexLi-design closed 3 years ago

AlexLi-design commented 3 years ago

Thanks for the terrific package for getting microdata! I am hoping someone can help me solve a problem. I tried to pull PUMS data with tidycensus (installed from GitHub), but something goes wrong. The setup code runs fine, but when I call get_pums() to fetch the data, I get this error:

Error: Your API call has errors. The API message returned is Error report

HTTP Status 500 -

I do not know what is happening. I double-checked the variable names and my code, and everything looks correct. My code is below:

rm(list = ls())
install.packages("devtools")
remotes::install_github("walkerke/tidycensus", force = TRUE)
library(tidycensus)
library(tidyverse)
library(dplyr)
census_api_key("my key", install = TRUE, overwrite = TRUE)
readRenviron("~/.Renviron")

pums_vars_2018 <- pums_variables %>%
  filter(year == 2018, survey == "acs5")
View(pums_vars_2018)

var_housing <- c(
  "AGEP", "SEX", "RAC1P", "HISP", "HINCP", "SCHL",
  "DDRS", "DEAR", "DEYE", "DOUT", "DPHY", "MAR",
  "NPF", "FES", "MV", "WORKSTAT", "HHT", "OCPIP", "GRPIP",
  "DIVISION", "BLD", "TEN", "VEH", "YBL"
)

ACS <- get_pums(
  variables = c(
    "AGEP", "SEX", "RAC1P", "HISP", "HINCP", "SCHL",
    "DDRS", "DEAR", "DEYE", "DOUT", "DPHY", "MAR",
    "NPF", "FES", "MV", "WORKSTAT", "HHT", "OCPIP", "GRPIP",
    "DIVISION", "BLD", "TEN", "VEH", "YBL"
  ),
  state = "all",
  year = 2018,
  survey = "acs5",
  show_call = TRUE
)

Thanks!

mfherman commented 3 years ago

Hi -- I'm not totally sure why this is failing, but my guess is that the API is having trouble pulling the 15M+ records for all states at once. If you need PUMS data for all states, you might consider looping through the states one at a time and building in some error handling with tryCatch() or purrr::possibly().

Here is an example with your variables from just one (small!) state:

library(tidycensus)

var_housing <- c(
  "AGEP", "SEX", "RAC1P", "HISP", "HINCP","SCHL",
  "DDRS", "DEAR", "DEYE", "DOUT", "DPHY", "MAR",
  "NPF", "FES", "MV", "WORKSTAT", "HHT", "OCPIP", "GRPIP",
  "DIVISION", "BLD", "TEN", "VEH", "YBL"
  )

pums_data <- get_pums(
  variables = var_housing,
  state = "VT",
  year = 2018,
  survey = "acs5"
  )
#> Getting data from the 2014-2018 5-year ACS Public Use Microdata Sample
pums_data
#> # A tibble: 31,883 x 29
#>    SERIALNO SPORDER  WGTP PWGTP  AGEP  HINCP   NPF OCPIP GRPIP DIVISION ST   
#>    <chr>    <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <chr>    <chr>
#>  1 2014000~ 1           5     6    55  39000     1    26     0 1        50   
#>  2 2014000~ 2           5     7    56  39000     1    26     0 1        50   
#>  3 2014000~ 1          11    12    57 100000     2     7     0 1        50   
#>  4 2014000~ 2          11    13    61 100000     2     7     0 1        50   
#>  5 2014000~ 1           8     8    71  23200     1    25     0 1        50   
#>  6 2014000~ 1          12    12    56  57600     2    11     0 1        50   
#>  7 2014000~ 2          12    13    53  57600     2    11     0 1        50   
#>  8 2014000~ 1           5     5    67  25410     2    18     0 1        50   
#>  9 2014000~ 2           5     5    67  25410     2    18     0 1        50   
#> 10 2014000~ 1           7     7    55  94700     2     0    10 1        50   
#> # ... with 31,873 more rows, and 18 more variables: BLD <chr>, TEN <chr>,
#> #   VEH <chr>, YBL <chr>, FES <chr>, HHT <chr>, MV <chr>, WORKSTAT <chr>,
#> #   DDRS <chr>, DEAR <chr>, DEYE <chr>, DOUT <chr>, DPHY <chr>, MAR <chr>,
#> #   SCHL <chr>, SEX <chr>, HISP <chr>, RAC1P <chr>

Created on 2020-09-08 by the reprex package (v0.3.0)
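
For the state-by-state approach mentioned above, here is a minimal sketch using base R's tryCatch(). The helper names fetch_states() and fetch_pums_state() are hypothetical and not part of tidycensus, and the three-variable list is only a placeholder for your own. The idea is that one failed request gets recorded instead of stopping the whole loop.

```r
# Hypothetical helper (not part of tidycensus): call fetch_one() once per
# state and keep going when a single request fails.
fetch_states <- function(states, fetch_one) {
  results <- list()
  failed <- character(0)
  for (st in states) {
    out <- tryCatch(
      fetch_one(st),
      error = function(e) {
        message("Request failed for ", st, ": ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(out)) {
      failed <- c(failed, st)   # remember which states to retry
    } else {
      results[[st]] <- out      # keep the data frame under the state name
    }
  }
  list(data = results, failed = failed)
}

# Hypothetical wrapper around get_pums() for a single state.
# The variable list here is a shortened placeholder.
fetch_pums_state <- function(st) {
  tidycensus::get_pums(
    variables = c("AGEP", "SEX", "HINCP"),
    state = st,
    year = 2018,
    survey = "acs5"
  )
}

# Needs a Census API key and network access, so it is left commented out:
# res <- fetch_states(c(state.abb, "DC"), fetch_pums_state)
# pums_all <- dplyr::bind_rows(res$data)
# res$failed  # states whose request errored
```

Passing the fetch function as an argument keeps the loop easy to try with a stub before making 51 real API calls. Any states listed in res$failed can simply be run through fetch_states() again.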

mfherman commented 3 years ago

Alternatively, you could download the entire PUMS file from the Census FTP:

https://www2.census.gov/programs-surveys/acs/data/pums/2018/5-Year/

You would want the files with the us suffix, such as csv_hus.zip.
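
If you go the bulk download route, something like the following rough sketch could fetch the zip with base R. The helper names pums_zip_url() and download_pums_zip() are hypothetical, and I'm assuming the person-level file follows the same naming pattern (csv_pus.zip). The national files are large, so the sketch raises R's download timeout and only lists the archive contents rather than extracting anything.

```r
# Directory linked above; the helper names below are hypothetical.
pums_base_url <- "https://www2.census.gov/programs-surveys/acs/data/pums/2018/5-Year/"

# Build the URL for the national housing ("hus") or person ("pus") file.
# The "pus" name is an assumption based on the csv_hus.zip pattern.
pums_zip_url <- function(record_type = c("hus", "pus")) {
  record_type <- match.arg(record_type)
  paste0(pums_base_url, "csv_", record_type, ".zip")
}

# Download the zip and list its contents without extracting them.
download_pums_zip <- function(record_type = "hus", dest_dir = tempdir()) {
  url <- pums_zip_url(record_type)
  dest <- file.path(dest_dir, basename(url))
  # The default download timeout is short for a multi-gigabyte file.
  old <- options(timeout = max(3600, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  download.file(url, destfile = dest, mode = "wb")
  list(path = dest, contents = unzip(dest, list = TRUE))
}

# Large download, so it is left commented out:
# hus <- download_pums_zip("hus")
# hus$contents$Name  # CSV files inside the archive
```

Once you know the file names inside the archive, you can extract just the ones you need with unzip(hus$path, files = ...) and read them with your preferred CSV reader.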

AlexLi-design commented 3 years ago

Wow, it works Matt! You are right. If I pull single states out, that should work. Thanks!


walkerke commented 3 years ago

@mfherman's solution is the right one. My general advice to tidycensus users has always been to look to other sources if you need bulk Census data pulls, as large requests are going to inevitably put pressure on the API. I'd recommend IPUMS and its companion R package {ipumsr} for these sorts of bulk requests.