Open dkyleward opened 8 years ago
Maybe this line needs reconstruction for calling from blocks?
I ran this on the dev version and get a different error. Tracking this down still.
call_census_api(
"H0100001", names = c("pop_hh"), geoids = "51775", allgeos = "bl",
data_source = "sf1")
## Error in call_api_once(variables_to_get, geoid, allgeos, data_source, :
## client error: (400) Bad Request
This seems to be an error on the Census API side. The URL it's building should work, but it returns an error.
http://api.census.gov/data/2010/sf1?get=H0100001&for=block:*&in=state:51+county:775
The block group version works:
http://api.census.gov/data/2010/sf1?get=H0100001&for=block+group:*&in=state:51+county:775
Based on the examples here, this should work.
This brings up another point, though. We should improve how error messages from the Census API are passed directly to R.
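A minimal sketch of what passing the API's message through could look like, assuming the request is made with httr. The helper names `format_api_error` and `get_census_text` are hypothetical and not part of censusr; the idea is only to put the response body into the R error instead of the bare status line.

```r
# Hypothetical helper (not part of censusr): build a readable error from
# the HTTP status and whatever text the Census API sent back.
format_api_error <- function(status, body) {
  body <- trimws(body)
  if (!nzchar(body)) body <- "(no message returned)"
  sprintf("Census API request failed (HTTP %d): %s", as.integer(status), body)
}

# Assumes the request goes through httr; the URL is supplied by the caller.
get_census_text <- function(url) {
  resp <- httr::GET(url)
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  if (httr::http_error(resp)) {
    # Surface the API's own message rather than only "(400) Bad Request".
    stop(format_api_error(httr::status_code(resp), body), call. = FALSE)
  }
  body
}
```

Keeping the formatting in its own function means it can be checked without hitting the network.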
The Developer forum seems to be wigging out right now. I've sent an email to the Census API contact I have. She is working to answer my question via email as there's a licensing issue with the Q&A site.
This is the message from my Census contact:
We restrict the use of wildcards to prevent very large data pools. In the case of blocks, there are over 100,000 blocks in some counties, and since we allow people to pull 50 variables at a time, we're talking about 5 million cells of data to be pulled in a single API call.
The best way to tell what kind of wildcards can be used right now is to look at the examples.html page in discovery for a given dataset (http://api.census.gov/data/2010/sf1/examples.html). The first example shows the broadest use of wildcards allowed for each hierarchy. Since you see tract in the example, you can know you must use tract.
With that, we need to decide how we'd like to proceed. Do we also want to restrict this behavior and produce an error, or do we want to query all the tracts in the county, submit each tract query separately, and compile/return?
I can see reasons for either approach. Playing nicely with the Census API would suggest that you trap it instead of working around it. If you decide to work around it, it would be nice to include a warning that downloading that many records takes a long time.
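For the work-around option, a rough sketch of splitting one county-wide block request into one request per tract, with the warning mentioned above. `block_urls_by_tract` is a hypothetical helper, not part of censusr. The `+tract:` clause is an assumption extrapolated from the URL pattern quoted earlier in this thread, and the tract codes in the usage lines are placeholders rather than real tracts.

```r
# Hypothetical helper (not part of censusr): one block-level URL per tract.
# Assumes the "+tract:" clause extends the URL pattern quoted in this thread.
block_urls_by_tract <- function(variable, state, county, tracts) {
  warning(
    "Requesting blocks for ", length(tracts),
    " tracts one at a time; this can take a long time.",
    call. = FALSE
  )
  # paste0 is vectorised over `tracts`, so this returns one URL per tract.
  paste0(
    "http://api.census.gov/data/2010/sf1?get=", variable,
    "&for=block:*&in=state:", state,
    "+county:", county,
    "+tract:", tracts
  )
}

# Placeholder tract codes, for illustration only.
urls <- suppressWarnings(
  block_urls_by_tract("H0100001", "51", "775", c("000100", "000200"))
)
```

The tract codes themselves would have to come from a separate tract-level query for the county, and the per-tract results would then be bound together before returning.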
Originally, I worked around it by submitting all the block IDs, but I took Greg's advice and just sampled 1500 blocks instead. I didn't really need that many blocks, and the time required was too long.
Does this mean the API never returns * blocks when at a county? If we know what their rules are then we can write good error handling.
You have to specify the tract to get * blocks to return. So yes to your question. If you request * blocks and only the county, you'll get an error.
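Given that rule, the trap-it option could be a pre-flight check along these lines. This is only a sketch: `check_block_wildcard` is a hypothetical name, not part of censusr, and it assumes geoids are concatenated FIPS codes (2-digit state, 3-digit county, 6-digit tract), so anything shorter than 11 characters is broader than a tract.

```r
# Hypothetical pre-flight check (not part of censusr). Assumes geoids are
# concatenated FIPS codes: 2-digit state + 3-digit county + 6-digit tract.
check_block_wildcard <- function(geoids, allgeos = NULL) {
  if (!is.null(allgeos) && allgeos == "bl") {
    too_broad <- geoids[nchar(geoids) < 11]
    if (length(too_broad) > 0) {
      stop(
        "The Census API only returns all blocks within a tract. ",
        "These geoids are broader than a tract: ",
        paste(too_broad, collapse = ", "),
        call. = FALSE
      )
    }
  }
  invisible(TRUE)
}
```

With this in place, the call from the top of the thread (`geoids = "51775"`, `allgeos = "bl"`) would fail before any request is sent, with a message that names the offending geoid.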
I remain immensely impressed with this package! However, I'm running into a problem. It may be related to this issue, so I'm putting it here. I can always move it if desired. Anyway:
Here is a code block that works. I'm pulling size data for three counties in Florida.
library(tidyverse)
library(censusr)
county_fips <- c(
"12099", # Palm Beach
"12011", # Broward
"12086" # Miami-Dade
)
size_vars <- paste0(
"B19019_", sprintf("%03d", 1:8), "E"
)
size_names <- c(
"total",
paste0("size_", sprintf("%d", 1:7))
)
size_data <- call_census_api(
variables = size_vars,
names = size_names,
geoids = county_fips,
data_source = "acs",
year = 2015,
period = 5
)
This one doesn't work. The only difference is the inclusion of the allgeos option. It's at a tract level, so we're not talking about 100k geographies.
library(tidyverse)
library(censusr)
county_fips <- c(
"12099", # Palm Beach
"12011", # Broward
"12086" # Miami-Dade
)
size_vars <- paste0(
"B19019_", sprintf("%03d", 1:8), "E"
)
size_names <- c(
"total",
paste0("size_", sprintf("%d", 1:7))
)
size_data <- call_census_api(
variables = size_vars,
names = size_names,
geoids = county_fips,
allgeos = "tr",
data_source = "acs",
year = 2015,
period = 5
)
The error message is:
Error in `[<-.data.frame`(`*tmp*`, , -1, value = numeric(0)) :
replacement has 0 items, need 7088
What makes this potentially different is that this code block also works. It has the allgeos option specified, but it is pulling a different table. This one is income, but pulling workers is also successful.
# get households by income
income_vars <- paste0(
"B19001_", sprintf("%03d", 1:17), "E"
)
inc_breaks <- c(10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 75, 100, 125, 150, 200)
income_names <- c(
"total",
paste0("under_", sprintf("%d", inc_breaks)),
"over_200"
)
income_data_orig <- call_census_api(
variables = income_vars,
names = income_names,
geoids = county_fips,
allgeos = "tr",
data_source = "acs",
year = 2015,
period = 5
)
Pretty much every ACS table should be available at the tract, so I don't think that's the problem. There are only 1219 tracts in the three counties, so that's not it.
Never mind my last comment. It looks as though I was simply trying to pull data from a table that was too stratified. Took me a while to figure it out. I guess that's a plug for #17.
Glad you figured it out. I skimmed your message and said “hmmmm” out loud. I was thinking we’d really have to dig to figure it out.
I cannot think of a programmatic way to error check this kind of thing. Can you, given that you spent the full day thinking about this?
Something that might help would be to check the data returned by the API. If all null/NA, throw a user-friendly error:
Error: The Census did not return any data for this combination of variables and geography.
This is probably easy to do. I'd have sent you a pull request if I understood the package/api at all.
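Roughly what I have in mind, as a sketch rather than a patch. `stop_if_no_data` is a hypothetical name, and it assumes the data frame built from the API response has the geoid in the first column and one column per requested variable after that.

```r
# Hypothetical check (not part of censusr). Assumes the first column holds
# the geoid and every remaining column holds a requested variable.
stop_if_no_data <- function(df) {
  values <- df[, -1, drop = FALSE]
  if (ncol(values) > 0 && all(is.na(values))) {
    stop(
      "The Census did not return any data for this combination of ",
      "variables and geography.",
      call. = FALSE
    )
  }
  # Return the input unchanged so the check can sit inside a pipeline.
  invisible(df)
}
```

It only fires when every value is missing, so a table with a few suppressed cells would still pass through untouched.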
The following code correctly returns a list of household population at the block group level.
The following code should return a list of household population at the block level, but instead it returns an error.