walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

missing variable values in get_decennial() #71

Closed RoyalTS closed 6 years ago

RoyalTS commented 6 years ago

I've hit a strange behavior where get_decennial() will either throw an error, return NA for some variables, or return the correct values, depending on which combination of variables is requested. Minimal-ish reproducible example:

library(tidycensus)

tidycensus::get_decennial(geography = 'tract',
                          variables = c('H0050001', 'PCO0030001'), 
                          year = 2010, county = '001', state = '01')

yields

Getting data from the 2010 decennial Census
Checking SF3 API for data...
Error in as.character(x) : 
  cannot coerce type 'closure' to vector of type 'character'
Error in gather.default(., key = variable, value = value, -GEOID, -NAME) : 
  object 'NAME' not found

whereas

tidycensus::get_decennial(geography = 'tract',
                          variables = c('H0050001', 'P0120003', 'P0120004', 'PCO0030001'), 
                          year = 2010, county = '001', state = '01')

yields

Getting data from the 2010 decennial Census
# A tibble: 48 x 4
   GEOID       NAME                variable value
   <chr>       <chr>               <chr>    <dbl>
 1 01001020100 Census Tract 201    H0050001    NA
 2 01001020200 Census Tract 202    H0050001    NA
 3 01001020300 Census Tract 203    H0050001    NA
 4 01001020400 Census Tract 204    H0050001    NA
 5 01001020500 Census Tract 205    H0050001    NA
 6 01001020600 Census Tract 206    H0050001    NA
 7 01001020700 Census Tract 207    H0050001    NA
 8 01001020801 Census Tract 208.01 H0050001    NA
 9 01001020802 Census Tract 208.02 H0050001    NA
10 01001020900 Census Tract 209    H0050001    NA
# ... with 38 more rows

whereas

tidycensus::get_decennial(geography = 'tract',
                          variables = c('H0050001', 'P0120003', 'P0120004'), 
                          year = 2010, county = '001', state = '01')

yields

Getting data from the 2010 decennial Census
# A tibble: 36 x 4
   GEOID       NAME                variable value
   <chr>       <chr>               <chr>    <dbl>
 1 01001020100 Census Tract 201    H0050001  59.0
 2 01001020200 Census Tract 202    H0050001  79.0
 3 01001020300 Census Tract 203    H0050001  70.0
 4 01001020400 Census Tract 204    H0050001 101  
 5 01001020500 Census Tract 205    H0050001 226  
 6 01001020600 Census Tract 206    H0050001 141  
 7 01001020700 Census Tract 207    H0050001 113  
 8 01001020801 Census Tract 208.01 H0050001  95.0
 9 01001020802 Census Tract 208.02 H0050001 309  
10 01001020900 Census Tract 209    H0050001 253  
# ... with 26 more rows

sessionInfo():

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidycensus_0.4.6.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16     pillar_1.1.0     compiler_3.4.3   bindr_0.1        class_7.3-14     tools_3.4.3      uuid_0.1-2       jsonlite_1.5    
 [9] tibble_1.4.2     lattice_0.20-35  pkgconfig_2.0.1  rlang_0.2.0.9001 cli_1.0.0        rstudioapi_0.7   DBI_0.7          curl_2.8.1      
[17] yaml_2.1.16      rgdal_1.2-18     bindrcpp_0.2     e1071_1.6-8      dplyr_0.7.4      httr_1.3.1       stringr_1.3.0    xml2_1.2.0      
[25] rappdirs_0.3.1   tigris_0.7       hms_0.3          tidyselect_0.2.4 rprojroot_1.3-2  classInt_0.1-24  grid_3.4.3       glue_1.2.0      
[33] here_0.0-6       sf_0.6-1         R6_2.2.2         foreign_0.8-69   sp_1.2-7         tidyr_0.8.0      purrr_0.2.4      readr_1.1.1     
[41] udunits2_0.13    magrittr_1.5     backports_1.1.2  units_0.5-1      maptools_0.9-2   assertthat_0.2.0 rvest_0.3.2      utf8_1.1.3      
[49] stringi_1.1.7    crayon_1.3.4    

walkerke commented 6 years ago

I've done some checking this morning and it appears that the variable PCO0030001 is not available at the Census tract level; httr::GET() returns a status code of 204 for "no content".

This has been a long-standing issue in tidycensus that I've been unsure how to satisfactorily resolve, as I haven't found a lookup table that states which variables are available (or unavailable) at which geographies. Improving error-handling has been a long-term goal of mine for the package, so perhaps a 204 status code could trigger a suggestion that the variable isn't available for the requested geography.
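A minimal sketch of how a 204 could be turned into a friendlier message. The helper `check_census_status()` is hypothetical and not part of tidycensus; it only assumes it is handed the numeric status code of the API response (for example from `httr::status_code()`).

```r
# Hypothetical helper, not part of tidycensus: turn the HTTP status code
# of a Census API response into an informative R error.
check_census_status <- function(status) {
  if (status == 204L) {
    # 204 = "No Content": the request was accepted but returned no data
    stop("One or more requested variables may not be available at the ",
         "requested geography (HTTP 204, no content).", call. = FALSE)
  }
  if (status >= 400L) {
    stop("Census API request failed with HTTP status ", status, ".",
         call. = FALSE)
  }
  # Anything else is passed through silently
  invisible(status)
}
```

With httr this could be called as `check_census_status(httr::status_code(httr::GET(url)))`, where `url` is the assembled API call.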

RoyalTS commented 6 years ago

Yeah, it'd be great in general if the package threw more informative error messages instead of the httr errors as I often have no idea what to do with those.

Have you talked to the people at the Census directly about such a lookup table? That'd be extremely useful to have!

NikKrieger commented 6 years ago

I've spent the better part of the day learning this the hard way while developing my sociome package. Glad I saw this thread.

bbrewington commented 6 years ago

Looks like there are variables and geographies listed here: https://api.census.gov/data.html

Give that a look, and if that doesn't provide the necessary info, let's ping Logan Powell for help.

walkerke commented 6 years ago

I've reached a semi-resolution to this in https://github.com/walkerke/tidycensus/commit/cb7f542f344a6e329a44898951c57ab12e268783. Errors now return the Census error messages rather than vague R error messages. I think I can iterate and improve on this in the future but this is enough to close the issue.

For example:

tidycensus::get_decennial(geography = 'tract',
                          variables = c('H0050001', 'PCO0030001'), 
                          year = 2010, county = '001', state = '01')

Getting data from the 2010 decennial Census
Error : One or more of your requested variables is likely not available at the requested geography.  Please refine your selection.

profLuna commented 6 years ago

It seems like this error may not be due solely to lack of availability at a given census unit. For example, the following throws the same error:

maBlkGrp2010 <- tidycensus::get_decennial(geography = "block group", year = 2010,
                                          variables = c(totalPop = "P009001", White = "P009005",
                                                        Black = "P009006", Asian = "P009008",
                                                        Hispanic = "P009002"),
                                          state = "MA")

Error : One or more of your requested variables is likely not available at the requested geography. Please refine your selection.
Error in gather.default(., key = variable, value = value, -GEOID, -NAME) : 
  object 'NAME' not found

But those variables are indeed available at block group level using American Fact Finder's Download Center.

It could be that the API has a different set of available geographies, but I don't know how to verify this.

walkerke commented 6 years ago

Thanks for the note @profLuna. The issue is with recent changes to the decennial Census API. The old decennial Census endpoint made block group data available by county; I then have internal logic to iterate through all counties in a state and stitch together a state-level block group dataset.

The new endpoint, however, only appears to make block group data available by Census tract for the decennial Census (you can still get it by county for the ACS). Tidycensus can't handle this at the moment.
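The county-by-county stitching described above can be sketched roughly as follows. This is an illustration only, not the package's actual internals: `stitch_counties()` and its `fetch_county` argument are hypothetical names, with `fetch_county` standing in for whatever function requests block group data for a single county and returns a data frame.

```r
# Hypothetical sketch, not tidycensus internals. `fetch_county` is a
# stand-in for a function that takes one county code and returns a
# data frame of block group data for that county.
stitch_counties <- function(county_codes, fetch_county) {
  pieces <- lapply(county_codes, function(code) {
    tryCatch(
      fetch_county(code),
      error = function(e) {
        # Report the failing county instead of aborting the whole pull
        warning("County ", code, " failed: ", conditionMessage(e),
                call. = FALSE)
        NULL
      }
    )
  })
  # Drop counties that failed, then row-bind the rest into one data frame
  pieces <- Filter(Negate(is.null), pieces)
  do.call(rbind, pieces)
}
```

The same pattern would extend to iterating over tracts within each county if the endpoint only serves block groups by tract, at the cost of many more requests.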

@loganpowell we ran into a similar issue with state-level Census tract pulls, which the API team fixed. Do you think they would be willing to restore block group pulls by county for the decennial API?

chihayakenji commented 3 years ago

Was this ever solved? I'm running into issues with:

get_decennial(year=2010,geography = "block", variables = c(pop="P001001"),
              product="population", state = "RI",show_call = TRUE,geometry=TRUE)

Yielding:

Getting data from the 2010 decennial Census
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set options(tigris_use_cache = TRUE).
Using FIPS code '44' for state 'RI'
Census API call: https://api.census.gov/data/2010/dec/sf1?get=P001001%2CNAME&for=block%3A%2A&in=state%3A44
Error : Your API call has errors. The API message returned is error: unknown/unsupported geography heirarchy.
Error in gather.default(., key = variable, value = value, -GEOID, -NAME) : 
  object 'NAME' not found

even though the block-level data is available.

mfherman commented 3 years ago

Thanks, @chihayakenji! I just opened #316 to address this. Will hopefully get around to implementing this in the next week or two now that the API can provide all blocks or block groups in a state with wildcards.
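For illustration, a wildcard request for every block in a state might be assembled like this. The helper `build_block_query()` is hypothetical and not tidycensus code, and the sketch assumes the API accepts wildcard `in=county:*` and `in=tract:*` clauses for block-level requests, which is how I read "with wildcards" above; the exact clause format should be checked against the API's geography listing.

```r
# Hypothetical helper, not tidycensus code: build a decennial SF1 request
# URL for every block in one state.
# ASSUMPTION: the API accepts wildcard `in=county:*` and `in=tract:*`
# clauses for the block geography.
build_block_query <- function(variables, state_fips, year = 2010) {
  base <- sprintf("https://api.census.gov/data/%s/dec/sf1", year)
  # NAME is requested alongside the variables, as in the call shown above
  get_part <- paste(c(variables, "NAME"), collapse = ",")
  paste0(base, "?get=", get_part,
         "&for=block:*",
         "&in=state:", state_fips,
         "&in=county:*&in=tract:*")
}

# Rhode Island has FIPS code 44
build_block_query("P001001", "44")
```

Compared with the failing call above, the only difference is that the county and tract levels of the hierarchy are spelled out with wildcards rather than omitted.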