walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Continued 2000 Decennial sf1 api call issues #343

Closed: kaseyzapatka closed this issue 3 years ago

kaseyzapatka commented 3 years ago

Hi @walkerke and @mfherman,

Thanks for all the work you do on tidycensus. I'm using it to download dissertation data and it's great. All the ACS calls seem to work, but I'm having trouble with the 2000 decennial sf1 and sf3 calls.

I saw the earlier issues about the Census moving the API endpoints (#293) and about combining sf1 and sf3 calls (#256). So I updated to tidycensus 0.11.4, called sf1 and sf3 separately, and double-checked the Census API documentation for sf1 to make sure I had the correct variable numbers. But I was still having problems.

Here's my code for just the sf1 call. I load tidycensus, assign the sf1 variables, and then run the call:

library(tidycensus)

my_vars00sf1 <- c(pop = "P001001", # population 
                # race-ethnicity
                racetot = "P007001", nhwht = "P007002", nhblk = "P007003", nhnat = "P007004", nhasian = "P007005",
                nhpac = "P007006", nhother = "P007007", nhtwo = "P007008", hisptot = "P008001", nothisp = "P008002", 
                hisp = "P008010",
                # sex 
                sex_tot = "P012001", sex_m = "P012002", sex_f = "P012026",
                # median age
                mage = "P013001", mage_m = "P013002", mage_f = "P013003", 
                # housing unit
                hu = "H003001", occ = "H003002", vac = "H003003", occ_tot = "H004001", owner = "H004002", 
                renter = "H004003", hhsize = "H012001", hhsizeo = "H012002", hhsizer = "H012003")

# request the SF1 variables for every tract in Connecticut
sf1 <- get_decennial(state = "CT",
                     geography = "tract",       # tract-level geography
                     year = 2000,               # 2000 decennial Census
                     variables = my_vars00sf1,  # named vector of variable IDs
                     sumfile = "sf1",           # Summary File 1
                     show_call = TRUE,          # print the API call for troubleshooting
                     geometry = FALSE)          # tabular data only, no geometry

# preview the first rows to verify the call worked
head(sf1)

However, it throws this error every time, even after restarting my RStudio session and my computer:

Getting data from the 2000 decennial Census
Using FIPS code '09' for state 'CT'
Census API call: https://api.census.gov/data/2000/dec/sf1?get=P001001%2CP007001%2CP007002%2CP007003%2CP007004%2CP007005%2CP007006%2CP007007%2CP007008%2CP008001%2CP008002%2CP008010%2CP012001%2CP012002%2CP012026%2CP013001%2CP013002%2CP013003%2CH003001%2CH003002%2CH003003%2CH004001%2CH004002%2CH004003%2CH012001%2CH012002%2CH012003%2CNAME&for=tract%3A%2A&in=state%3A09
Error in load_data_decennial(geography, variables, key, year, sumfile = "sf3",  : 
  The Census API has returned the error message error: error: unknown variable 'H003002'.
This may be due to mixing SF1 and SF3 variables. If so, separate your requests to SF1 and SF3 when using `get_decennial()` by using the `sumfile` argument in separate calls.
Variables not found in Summary File 1. Trying Summary File 3...
Error in gather.default(., key = variable, value = value, -GEOID, -NAME) : 
  object 'NAME' not found

I also tried isolating the problematic variables. When I took out the housing chunk (H003002-H012003), the median age chunk (P013001-P013003) caused problems instead. The most pared-down call that worked requested the pop (P001001) through sex_f (P012026) variables, but that leaves out a number of variables I need. I then tried only the problematic chunks on their own: the housing chunk failed, but the median age chunk worked. So the median age chunk does not throw an error when called by itself, but it does when called together with the pop-through-sex chunk. Could this be an issue with the order in which I request variables? If I call each variable one at a time the calls work, but that seems impractical: I need a lot of variables for multiple states, and writing a function that requests each variable separately seems much less efficient than requesting them in a batch as I'm doing now.

Is there something I'm missing here? I don't think it's my code. I know that if tidycensus can't find a variable in sf1, it looks in sf3, which is what the error message seems to indicate is happening. But H003002 is an sf1 variable, so I'm not sure why it's looking in sf3.

Thanks for your help.

Best, Kasey

walkerke commented 3 years ago

Hi @kaseyzapatka - thanks for the note. I do know that the Census website has been acting up today (calls that usually work have been failing for me today too) and the entire website was down this morning.

When I run your code with show_call = TRUE and paste the resulting URL into my browser, I get the error message "There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience." tidycensus is likely hitting this error and then triggering its internal logic to check SF3, which gives you a misleading error message in this instance.
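To see why the final error mentions SF3 and an "unknown variable", here is a minimal base-R sketch of an SF1-then-SF3 fallback. It is a hypothetical illustration, not tidycensus's actual source: `fetch_with_fallback()` and its `fetch` argument are invented names, and `fetch` stands in for whatever performs the API request for a given summary file. If the SF1 request fails for any reason, including a transient server error, the sketch retries SF3, and only the SF3 error reaches the user.

```r
# Hypothetical sketch of an SF1 -> SF3 fallback; NOT tidycensus's real code.
# `fetch` is any function(sumfile) that returns data or signals an error.
fetch_with_fallback <- function(fetch) {
  tryCatch(
    fetch("sf1"),
    error = function(e_sf1) {
      # The original SF1 error (e.g. a server-side failure) is discarded here...
      message("Variables not found in Summary File 1. Trying Summary File 3...")
      # ...so if SF3 also fails, only the SF3 error propagates to the caller.
      fetch("sf3")
    }
  )
}

# Simulate this thread: the SF1 request fails with a generic server error,
# and SF3 does not recognize the SF1 variable at all.
mock_fetch <- function(sumfile) {
  if (sumfile == "sf1") {
    stop("There was an error while running your query.")
  }
  stop("error: unknown variable 'H003002'")
}

err <- tryCatch(fetch_with_fallback(mock_fetch), error = function(e) e)
conditionMessage(err)
```

The server-side message never surfaces, which matches the misleading "unknown variable" error reported above.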

My hunch is that you should try again tomorrow and see if the API problems have been resolved. If not, we can explore further.

walkerke commented 3 years ago

So it seems like the 2000 SF1 API is maxing out at 15 variables. For example, this API call is working but this one with one additional variable is not.

I'm hesitant to patch this in tidycensus just yet as the typical limit is 50 variables per call so I wonder if this is temporary. You might consider asking about this on the Census Slack channel or filing an issue on the API's GitHub page.
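If the cap really is 15 variables per request, one workaround is to split the variable vector into chunks and make one `get_decennial()` call per chunk. The sketch below assumes that 15-variable limit, which is an observation from this thread rather than a documented API rule; `chunk_vars()` is a hypothetical helper, and the network call is shown only as a comment.

```r
# Hypothetical helper (not part of tidycensus): split a named character
# vector of variable IDs into consecutive chunks of at most `size` elements.
chunk_vars <- function(vars, size = 15) {
  stopifnot(size >= 1)
  # ceiling(i / size) is 1 for the first `size` elements, 2 for the next, ...
  unname(split(vars, ceiling(seq_along(vars) / size)))
}

# The 27 SF1 variables from the original request; names are kept per chunk.
my_vars00sf1 <- c(
  pop = "P001001", racetot = "P007001", nhwht = "P007002", nhblk = "P007003",
  nhnat = "P007004", nhasian = "P007005", nhpac = "P007006", nhother = "P007007",
  nhtwo = "P007008", hisptot = "P008001", nothisp = "P008002", hisp = "P008010",
  sex_tot = "P012001", sex_m = "P012002", sex_f = "P012026",
  mage = "P013001", mage_m = "P013002", mage_f = "P013003",
  hu = "H003001", occ = "H003002", vac = "H003003", occ_tot = "H004001",
  owner = "H004002", renter = "H004003", hhsize = "H012001",
  hhsizeo = "H012002", hhsizer = "H012003"
)

chunks <- chunk_vars(my_vars00sf1, size = 15)
lengths(chunks)

# With network access and a Census API key (not run here), one call per chunk:
# sf1 <- purrr::map_df(chunks, ~ tidycensus::get_decennial(
#   geography = "tract", state = "CT", year = 2000,
#   sumfile = "sf1", variables = .x))
```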

kaseyzapatka commented 3 years ago

Thanks for the quick reply. I wondered if that was the case and have since tried the call a few times, but I’m still having problems. I've also varied the number of variables and which variables are in the call. I ran both calls below. The first has 16 and the second 13. The first works but the second doesn't.

Call 1 (16 variables, works):

my_vars00sf1 <- c(pop = "P001001", # population 
                # race-ethnicity
                racetot = "P007001", nhwht = "P007002", nhblk = "P007003", nhnat = "P007004", nhasian = "P007005",
                nhpac = "P007006", nhother = "P007007", nhtwo = "P007008", hisptot = "P008001", nothisp = "P008002", 
                hisp = "P008010", mage = "P013001",
                # sex 
                sex_tot = "P012001", sex_m = "P012002", sex_f = "P012026")

# request the SF1 variables for every tract in Connecticut
sf1 <- get_decennial(state = "CT",
                     geography = "tract",       # tract-level geography
                     year = 2000,               # 2000 decennial Census
                     variables = my_vars00sf1,  # named vector of variable IDs
                     sumfile = "sf1",           # Summary File 1
                     show_call = TRUE,          # print the API call for troubleshooting
                     geometry = FALSE)          # tabular data only, no geometry

# preview the first rows to verify the call worked
head(sf1)

Call 2 (13 variables, fails):

my_vars00sf1 <- c(pop = "P001001", # population 
                # housing unit
                hu = "H003001", occ = "H003002", vac = "H003003", occ_tot = "H004001", owner = "H004002", 
                renter = "H004003", hhsize = "H012001", hhsizeo = "H012002", hhsizer = "H012003",
                # sex 
                sex_tot = "P012001", sex_m = "P012002", sex_f = "P012026")

# request the SF1 variables for every tract in Connecticut
sf1 <- get_decennial(state = "CT",
                     geography = "tract",       # tract-level geography
                     year = 2000,               # 2000 decennial Census
                     variables = my_vars00sf1,  # named vector of variable IDs
                     sumfile = "sf1",           # Summary File 1
                     show_call = TRUE,          # print the API call for troubleshooting
                     geometry = FALSE)          # tabular data only, no geometry

# preview the first rows to verify the call worked
head(sf1)

The call seems to trip over specific batches of variables. For example, "H003001-H003003" and "H004001-H004003" break the call unless I request each variable separately. I'll follow your advice and post on the US Census Slack channel, but do you have any other ideas about what might be going on?

Thanks again

kaseyzapatka commented 3 years ago

Hi @walkerke, it seems the 2000 sf1 endpoints are working, per a conversation with Logan Powell.

I can't figure out what the problem is: calls on these variables seem to work in very small batches (3 or 4 variables at a time) or one at a time, but when combined into a larger batch (any of the calls above), the call fails. It seems like tidycensus gets tripped up somewhere and looks for these variables in sf3, even though I only specified sf1. I'll likely just break this call into smaller batches or use NHGIS like you suggested on Stack Exchange. Anyway, I wanted to let you know.

Thanks for your help and for your continued maintenance of this package. I imagine it's a lot of work.

walkerke commented 3 years ago

Hi @kaseyzapatka - I'm going to post over on the Census page as well but this call is working for me, from SF1:

library(tidycensus)

test <- get_decennial(
  geography = "tract",
  variables = c("H003001", "H003002", "H003003"),
  state = "CT",
  year = 2000,
  show_call = TRUE
)

Is it failing for you? The larger variable pulls are still failing for me.

kaseyzapatka commented 3 years ago

Yes, your call works for me. But just adding four more variables breaks it:


test <- get_decennial(
  geography = "tract",
  variables = c("H003001", "H003002", "H003003", "H004003", "H012001", "H012002", "H012003"),
  state = "CT",
  year = 2000,
  show_call = TRUE
)

test

Error in load_data_decennial(geography, variables, key, year, sumfile = "sf3", : 
  The Census API has returned the error message error: error: unknown variable 'H003002'.
This may be due to mixing SF1 and SF3 variables. If so, separate your requests to SF1 and SF3 when using get_decennial() by using the sumfile argument in separate calls.
Error in gather.default(., key = variable, value = value, -GEOID, -NAME) : 
  object 'NAME' not found

It gets hung up on "H003002", which it just called successfully in your call.

plnnr commented 3 years ago

I'm also getting errors with SF3 when I string multiple variables together. I'm using build #afc18b1.

This works:

hhincome_api_query <- get_decennial(geography = "tract", 
                                    variables = "P151D017",
                                    year = 2000, 
                                    sumfile = "sf3", 
                                    state = "OR", 
                                    county = "Multnomah", 
                                    geometry = FALSE)

This does not:

hhincome_api_query <- get_decennial(geography = "tract", 
                                    variables = c("P151D017", "P151I011", "P151G010"), 
                                    year = 2000, 
                                    sumfile = "sf3", 
                                    state = "OR", 
                                    county = "Multnomah", 
                                    geometry = FALSE)

Error message:

Error : Your API call has errors.  The API message returned is There was an error while running your query.  We've logged the error and we'll correct it ASAP.  Sorry for the inconvenience..
Variables not found in Summary File 1. Trying Summary File 3...
Error in gather_(data, key_col = compat_as_lazy(enquo(key)), value_col = compat_as_lazy(enquo(value)),  : 
  unused argument (-NAME)

walkerke commented 3 years ago

Hi @plnnr - yeah, it seems like this is an ongoing issue on the API side. FWIW, as a workaround:

vars <- c("P151D017", "P151I011", "P151G010")

hhincome_api_query2 <- purrr::map_df(vars, ~{
  get_decennial(geography = "tract", 
                variables = .x, 
                year = 2000, 
                sumfile = "sf3", 
                state = "OR", 
                county = "Multnomah", 
                geometry = FALSE)
})
> hhincome_api_query2
# A tibble: 510 x 4
   GEOID       NAME                                          variable value
   <chr>       <chr>                                         <chr>    <dbl>
 1 41051010402 Census Tract 104.02, Multnomah County, Oregon P151D017     7
 2 41051010405 Census Tract 104.05, Multnomah County, Oregon P151D017     0
 3 41051010406 Census Tract 104.06, Multnomah County, Oregon P151D017     0
 4 41051010407 Census Tract 104.07, Multnomah County, Oregon P151D017     0
 5 41051010408 Census Tract 104.08, Multnomah County, Oregon P151D017     0
 6 41051010409 Census Tract 104.09, Multnomah County, Oregon P151D017     0
 7 41051010500 Census Tract 105, Multnomah County, Oregon    P151D017     0
 8 41051000100 Census Tract 1, Multnomah County, Oregon      P151D017     9
 9 41051000200 Census Tract 2, Multnomah County, Oregon      P151D017     0
10 41051000301 Census Tract 3.01, Multnomah County, Oregon   P151D017    19
# … with 500 more rows

While it isn't ideal to hit the API separately for each variable, it runs and it isn't too slow.

mfherman commented 3 years ago

An update from the Census Slack: this is a known issue that they are working to address, and their explanation clarifies when the behavior occurs:

It's happening any time variables from more than two groups/tables are requested, e.g. P052###, P095###, P001### (instead of a specific number of variables), for the Decennial SF1 and SF3 datasets.
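If that explanation holds, a less wasteful workaround than one request per variable is to group the variables by table and send at most two tables per call. Below is a minimal base-R sketch. It assumes the table ID is the variable ID minus its final three digits (e.g. `P151D017` belongs to `P151D`, `H003002` to `H003`); `batch_by_table()` and `max_tables` are hypothetical names, not tidycensus arguments.

```r
# Hypothetical helper: group variable IDs by table, then pack at most
# `max_tables` tables into each batch (one API request per batch).
batch_by_table <- function(vars, max_tables = 2) {
  # Assumption: the table ID is the variable ID with its last 3 digits removed.
  tables <- sub("[0-9]{3}$", "", vars)
  unique_tables <- unique(tables)
  # Tables 1-2 go to batch 1, tables 3-4 to batch 2, and so on.
  batch_of_table <- ceiling(seq_along(unique_tables) / max_tables)
  batch_ids <- batch_of_table[match(tables, unique_tables)]
  unname(split(vars, batch_ids))
}

# The failing 7-variable request from above spans three tables (H003, H004, H012).
vars <- c("H003001", "H003002", "H003003", "H004003",
          "H012001", "H012002", "H012003")
batches <- batch_by_table(vars)
batches

# With network access (not run here), one request per batch:
# result <- purrr::map_df(batches, ~ tidycensus::get_decennial(
#   geography = "tract", variables = .x, state = "CT",
#   year = 2000, sumfile = "sf1"))
```

Here the three tables become two requests instead of seven single-variable calls.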

cc @yeonhwalee

walkerke commented 3 years ago

Hi everyone - I've just run through all of the problem examples and they are all working this morning. I think we can consider this issue resolved! Thanks to @loganpowell and everyone on the Census API team!