benjamin-chan closed this issue 2 years ago.
Can you try opening a partial URL in your browser? This looks like a temporary failure.
For example: https://api.census.gov/data/2010/dec/sf1
Do you get a JSON page?
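The same check can be run from R, which goes through the same connection code that fails in the error messages in this thread. Below is a minimal sketch that uses only base R. `endpoint_reachable()` is a hypothetical helper, not part of wru. It only reports whether the URL could be opened and read. It does not check that the response is valid JSON.

```r
# Hypothetical helper (not part of wru): TRUE if the URL can be opened and
# read, FALSE if opening it raises an error or a warning.
endpoint_reachable <- function(u) {
  tryCatch({
    # readLines() on a character string opens it with file(con, "r") and
    # closes it again, the same call that appears in the errors below
    readLines(u, n = 1L, warn = FALSE)
    TRUE
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)
}

# Usage (needs network access):
# endpoint_reachable("https://api.census.gov/data/2010/dec/sf1")
```

Suppose this returns FALSE from R while the browser shows JSON. That might point at the R session's network path, such as a proxy or firewall, rather than at the Census server.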
I'm unable to recreate this. Here's a reprex from my end:
library(wru)
data(voters) # part of the wru package
r <- predict_race(
  voter.file = voters[voters$state == "NJ", ],
  surname.only = FALSE,
  surname.year = 2020,
  census.geo = "tract",
  census.key = Sys.getenv("CENSUS_API_KEY"), # key stored in an environment variable
  age = FALSE,
  sex = FALSE)
Ah, I see the voters data has state, county, tract, and block as factors. predict_race() works when I convert my columns to factors.
I take back my comment about factor()ing. It seems like the census.gov API might be throttling requests based on the data frame size.
library(dplyr)
library(wru)
# df is my voter file; key is my Census API key
probabilities <-
  df %>%
  # dplyr::sample_n(1600) %>% # No errors
  dplyr::sample_n(1700) %>% # Results in Error in file(con, "r") : cannot open the connection
  predict_race(surname.only = FALSE,
               surname.year = 2020,
               census.geo = "tract",
               census.key = key)
Please close the issue if this is unrelated to the wru package.
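The 1600-versus-1700 experiment above can be automated by bisection, which narrows down the failing row count in a handful of runs. This is a sketch under stated assumptions. `find_failure_threshold()` is a hypothetical helper. It assumes that if n rows fail, then any larger prefix of the data also fails. In real use, every probe is a full predict_race() call against the API.

```r
# Hypothetical helper: given a predicate fails(n) that is FALSE at lo and
# TRUE at hi, return the smallest n in (lo, hi] for which fails(n) is TRUE.
# Assumes failures are monotone in n (more rows never "un-fails").
find_failure_threshold <- function(fails, lo, hi) {
  stopifnot(!fails(lo), fails(hi))  # the bracket must straddle the failure
  while (hi - lo > 1) {
    mid <- (lo + hi) %/% 2
    if (fails(mid)) hi <- mid else lo <- mid
  }
  hi
}

# Usage with predict_race(); head() gives a fixed prefix for each n:
# fails <- function(n) {
#   res <- tryCatch(predict_race(head(df, n), surname.only = FALSE,
#                                surname.year = 2020, census.geo = "tract",
#                                census.key = key),
#                   error = function(e) e)
#   inherits(res, "error")
# }
# find_failure_threshold(fails, 1600, 1700)
```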
That's strange. Is your census key quite old, or shared? For comparison, I don't get rate-limited when running in parallel on a full voter file (200M+ records) over the course of a day.
My key was originally generated in July 2020. When I tried to generate a new key using the same email address as my previous key, it gave me the same one. I'll keep troubleshooting the API, but I'd welcome any ideas.
I've got nothing, mainly because I hammer the API and have never received a timeout. Can you try using use.counties = TRUE to see if that helps? It will limit the census data pull to just those tracts that are within the counties in your voter file.
Are you currently in the US?
Same issue with use.counties = TRUE. I'm in Oregon and filter my data frame with filter(state == "OR"). I also filter out invalid geocodes (zip5 only, intersection, etc.) so it's not passing junk census tracts. I wonder if there's rate limiting on my side (I'm at a state government agency). I'll try to play around on an Azure VM.
With respect to the census tracts, are you sure they are 2010 tracts? I know that sounds like a strange question, but tracts changed with the decennial census. Although some are named the same, they are not necessarily the same places, and some may not exist. This could be why you're getting strange results: you may be submitting tracts that don't exist in the census year you're pulling from.
edit: 2010*, that's the function default for the year argument.
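Checking the vintage of each tract requires the census data itself. A cheap first pass is to confirm that the geocodes have the fixed widths that census codes use: 3-digit county, 6-digit tract, and 4-digit block. Below is a minimal sketch. `check_geo_widths()` is hypothetical and only checks format, so a well-formed 2020 tract that did not exist in 2010 will still pass.

```r
# Hypothetical helper: return the rows whose county/tract/block codes are not
# all-digit strings of the fixed census widths. Missing values are flagged too.
check_geo_widths <- function(df) {
  widths <- c(county = 3L, tract = 6L, block = 4L)
  cols <- intersect(names(widths), names(df))
  bad <- rep(FALSE, nrow(df))
  for (col in cols) {
    # as.character() also handles factor columns like those in wru's voters data
    vals <- as.character(df[[col]])
    ok <- grepl(sprintf("^[0-9]{%d}$", widths[[col]]), vals)
    bad <- bad | !ok
  }
  df[bad, , drop = FALSE]
}

# Usage: check_geo_widths(df) returns zero rows when every code is well-formed.
```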
I tried restricting to records I geocoded in May 2022, with no luck. I also tried year = "2010" and year = "2020" and got the same issue. I also tried bumping up retry, with no luck.
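If a failure is transient, you can also retry the whole call with growing pauses, in addition to using wru's own retry argument. This is only a sketch. `with_backoff()` is a hypothetical wrapper and is not how wru implements retry. The results above (more retries, same error) suggest that this particular failure was not transient.

```r
# Hypothetical wrapper (not wru's retry logic): call f() up to `tries` times,
# sleeping base_delay * 2^(attempt - 1) seconds after each failed attempt.
with_backoff <- function(f, tries = 5L, base_delay = 1) {
  stopifnot(tries >= 1)
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < tries) Sys.sleep(base_delay * 2^(attempt - 1))
  }
  stop("all ", tries, " attempts failed; last error: ", conditionMessage(result))
}

# Usage:
# with_backoff(function() predict_race(test, census.geo = "tract",
#                                      census.key = key), tries = 3)
```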
FWIW, I used RedPoint to generate the geocoding, and most of the data was geocoded in late November or early December 2021. My data is from 2019-2021 records.
I should also add that I didn't have an issue with version wru_0.1-12.
What's strange to me is that you have a subset that leads to a failure. Here's what I might try: run each row separately and see which ones fail, then show us those rows. There must be something about them that the Census API dislikes. My suspicion is mismatched census tracts, but you've ruled that out. Here's some sample code to help with the investigation:
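library(wru)
library(dplyr)
library(purrr)
set.seed(42) # let's make sure we can reproduce; if no failures, adjust the sample size up/down
df <- load_your_df # pseudo-code! replace with your own voter file
key <- Sys.getenv("CENSUS_API_KEY") # your Census API key
df <- dplyr::sample_n(df, 1700)
# run predict_race() on one row at a time, keeping any error instead of stopping
error_scanner <- purrr::map(seq_len(nrow(df)), function(i) {
  tryCatch({
    predict_race(
      voter.file = df[i, ],
      surname.only = FALSE,
      surname.year = 2020,
      census.geo = "tract",
      census.key = key)
  }, error = function(e) e)
})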
Now we can run over our list to see if error_scanner has anything in it that inherits from "error":
rows <- purrr::map(error_scanner, function(x) inherits(x, "error")) %>% unlist() %>% which()
df[rows, ] # will output the problematic rows
error_scanner[rows] # will output a list of errors, hopefully all the same!
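The same scan can be wrapped in a reusable function that returns both the failing row indices and their error objects. This is a sketch, and `scan_row_errors()` is hypothetical. Keep in mind that a single-row call builds a much smaller census request than the full batch. With use.counties = TRUE, for example, it only covers that row's county. As a result, a per-row scan can come back clean even when the batch call fails.

```r
# Hypothetical helper: apply f to each row of df, capturing errors instead of
# stopping, and return the failing row indices plus their condition objects.
scan_row_errors <- function(df, f) {
  results <- lapply(seq_len(nrow(df)), function(i) {
    tryCatch(f(df[i, , drop = FALSE]), error = function(e) e)
  })
  failed <- which(vapply(results, inherits, logical(1), what = "error"))
  list(rows = failed, errors = results[failed])
}

# Usage:
# out <- scan_row_errors(df, function(row)
#   predict_race(row, census.geo = "tract", census.key = key))
# df[out$rows, ]
```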
The error scanner didn't pick up any errors when operating on one row at a time. Here's my code. I don't do package development or debugging, so maybe I'm error scanning wrong.
f <- file("error.txt", open = "wt")
sink(f, type = "message") # divert messages (errors, warnings) to error.txt
test <- df %>% head(1700) # First 1700 rows results in Error in file(con, "r") : cannot open the connection
predict_race(test, census.geo = "tract", census.key = key, use.counties = TRUE) # Verify error message
sink(type = "message") # a bare sink() only removes an output sink ("no sink to remove")
close(f)
# run one row at a time, returning the error object instead of stopping
error_scanner <-
  purrr::map(1:nrow(test),
    function(i) {
      tryCatch({predict_race(test[i, ], census.geo = "tract", census.key = key, use.counties = TRUE)},
        error = function(e) e)
    })
Output
> rows <- map(error_scanner, function(x) inherits(x, "error")) %>% unlist() %>$
> length(rows)
[1] 0
> df[rows, ] # will output the problematic rows
[1] surname geo_result_category race
[4] ethnicity age record_id
[7] state county tract
[10] block sex
<0 rows> (or 0-length row.names)
> error_scanner[rows] %>% unique() # will output a list of errors hopefully a$
list()
Contents of error.txt (captured with sink so you can see the call to the API):
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://api.census.gov/data/2010/dec/sf1?key=f1eed76ebb3f906330f30e4521c55dbebe54094a&get=P005003,P005004,P005005,P005006,P005007,P005008,P005009,P005010&for=county:005,017,051,005,047,049,071,051,019,051,047,067,033,053,019,005,005,033,029,039,065,047,017,029,051,005,051,039,017,033,003,051,067,059,005,051,033,071,039,029,019,053,005,051,067,067,005,061,047,053,017,067,051,043,051,067,039,051,051,047,043,051,053,053,051,067,065,051,039,047,051,033,047,067,047,005,005,019,051,017,047,033,067,059,029,005,033,039,065,051,051,067,019,067,051,051,067,071,051,043,005,005,029,039,053,051,039,039,071,067,041,047,005,067,051,029,047,051,067,071,043,019,047,057,067,067,047,005,067,005,051,007,053,007,065,033,051,005,051,051,047,029,033,051,047,005,047,005,051,043,005,005,051,017,005,011,019,041,067,067,059,039,067,047,047,051,047,047,051,011,047,035,029,051,039,051,067,051,051,033,065,047,039,017,039,019,043,001,029,019,033,051,005,029,039,059,051,009,051,071,029,047,043,053,00 [... truncated]
Try census server again: https://api.census.gov/data/2010/dec/sf1?
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://api.census.gov/data/2010/dec/sf1?key=f1eed76ebb3f906330f30e4521c55dbebe54094a&get=P005003,P005004,P005005,P005006,P005007,P005008,P005009,P005010&for=county:005,017,051,005,047,049,071,051,019,051,047,067,033,053,019,005,005,033,029,039,065,047,017,029,051,005,051,039,017,033,003,051,067,059,005,051,033,071,039,029,019,053,005,051,067,067,005,061,047,053,017,067,051,043,051,067,039,051,051,047,043,051,053,053,051,067,065,051,039,047,051,033,047,067,047,005,005,019,051,017,047,033,067,059,029,005,033,039,065,051,051,067,019,067,051,051,067,071,051,043,005,005,029,039,053,051,039,039,071,067,041,047,005,067,051,029,047,051,067,071,043,019,047,057,067,067,047,005,067,005,051,007,053,007,065,033,051,005,051,051,047,029,033,051,047,005,047,005,051,043,005,005,051,017,005,011,019,041,067,067,059,039,067,047,047,051,047,047,051,011,047,035,029,051,039,051,067,051,051,033,065,047,039,017,039,019,043,001,029,019,033,051,005,029,039,059,051,009,051,071,029,047,043,053,00 [... truncated]
Try census server again: https://api.census.gov/data/2010/dec/sf1?
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://api.census.gov/data/2010/dec/sf1?key=f1eed76ebb3f906330f30e4521c55dbebe54094a&get=P005003,P005004,P005005,P005006,P005007,P005008,P005009,P005010&for=county:005,017,051,005,047,049,071,051,019,051,047,067,033,053,019,005,005,033,029,039,065,047,017,029,051,005,051,039,017,033,003,051,067,059,005,051,033,071,039,029,019,053,005,051,067,067,005,061,047,053,017,067,051,043,051,067,039,051,051,047,043,051,053,053,051,067,065,051,039,047,051,033,047,067,047,005,005,019,051,017,047,033,067,059,029,005,033,039,065,051,051,067,019,067,051,051,067,071,051,043,005,005,029,039,053,051,039,039,071,067,041,047,005,067,051,029,047,051,067,071,043,019,047,057,067,067,047,005,067,005,051,007,053,007,065,033,051,005,051,051,047,029,033,051,047,005,047,005,051,043,005,005,051,017,005,011,019,041,067,067,059,039,067,047,047,051,047,047,051,011,047,035,029,051,039,051,067,051,051,033,065,047,039,017,039,019,043,001,029,019,033,051,005,029,039,059,051,009,051,071,029,047,043,053,00 [... truncated]
Try census server again: https://api.census.gov/data/2010/dec/sf1?
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://api.census.gov/data/2010/dec/sf1?key=f1eed76ebb3f906330f30e4521c55dbebe54094a&get=P005003,P005004,P005005,P005006,P005007,P005008,P005009,P005010&for=county:005,017,051,005,047,049,071,051,019,051,047,067,033,053,019,005,005,033,029,039,065,047,017,029,051,005,051,039,017,033,003,051,067,059,005,051,033,071,039,029,019,053,005,051,067,067,005,061,047,053,017,067,051,043,051,067,039,051,051,047,043,051,053,053,051,067,065,051,039,047,051,033,047,067,047,005,005,019,051,017,047,033,067,059,029,005,033,039,065,051,051,067,019,067,051,051,067,071,051,043,005,005,029,039,053,051,039,039,071,067,041,047,005,067,051,029,047,051,067,071,043,019,047,057,067,067,047,005,067,005,051,007,053,007,065,033,051,005,051,051,047,029,033,051,047,005,047,005,051,043,005,005,051,017,005,011,019,041,067,067,059,039,067,047,047,051,047,047,051,011,047,035,029,051,039,051,067,051,051,033,065,047,039,017,039,019,043,001,029,019,033,051,005,029,039,059,051,009,051,071,029,047,043,053,00 [... truncated]
Data access failure at the census website, please try again by re-run the previous command
https://api.census.gov/data/2010/dec/sf1?key=f1eed76ebb3f906330f30e4521c55dbebe54094a&get=P005003,P005004,P005005,P005006,P005007,P005008,P005009,P005010&for=county:005,017,051,005,047,049,071,051,019,051,047,067,033,053,019,005,005,033,029,039,065,047,017,029,051,005,051,039,017,033,003,051,067,059,005,051,033,071,039,029,019,053,005,051,067,067,005,061,047,053,017,067,051,043,051,067,039,051,051,047,043,051,053,053,051,067,065,051,039,047,051,033,047,067,047,005,005,019,051,017,047,033,067,059,029,005,033,039,065,051,051,067,019,067,051,051,067,071,051,043,005,005,029,039,053,051,039,039,071,067,041,047,005,067,051,029,047,051,067,071,043,019,047,057,067,067,047,005,067,005,051,007,053,007,065,033,051,005,051,051,047,029,033,051,047,005,047,005,051,043,005,005,051,017,005,011,019,041,067,067,059,039,067,047,047,051,047,047,051,011,047,035,029,051,039,051,067,051,051,033,065,047,039,017,039,019,043,001,029,019,033,051,005,029,039,059,051,009,051,071,029,047,043,053,005,019,017,067,017,051,005,005,051,039,051,029,053,005,005,029,067,007,005,047,051,039,051,037,059,005,039,067,017,009,051,051,005,029,051,029,047,033,059,067,029,033,037,051,051,005,005,051,047,053,011,051,051,067,067,005,033,047,051,035,005,005,017,029,019,029,029,027,071,047,037,039,019,009,019,043,029,033,047,051,047,005,043,005,051,001,029,051,029,047,067,039,047,039,051,065,051,017,039,051,033,067,041,005,047,067,029,033,067,005,051,047,035,043,051,039,039,007,067,051,067,067,051,029,067,041,053,005,067,003,039,047,005,047,005,051,033,071,071,047,029,051,033,029,059,029,033,047,061,033,051,051,051,029,067,005,067,005,039,051,017,051,059,067,067,033,029,017,067,067,007,029,005,005,051,029,029,047,045,039,051,047,067,067,029,051,051,029,029,039,051,051,051,029,029,005,071,051,051,051,067,051,051,051,047,067,005,033,029,017,067,005,039,005,051,051,059,005,023,011,011,043,065,019,005,029,051,067,041,047,051,011,051,029,005,071,057,029,009,039,053,039,051,039,039,039,047,043,037,029,051,067,033,039,033
,049,005,047,039,047,039,067,017,005,029,067,051,051,005,051,067,051,051,051,029,029,071,029,029,029,039,051,051,039,005,071,011,047,071,049,029,047,019,051,039,061,051,051,051,005,029,005,051,005,051,051,005,005,051,051,051,071,007,029,029,039,039,039,039,029,019,005,041,067,051,039,051,057,039,051,035,049,051,019,047,051,071,047,067,005,051,005,039,071,029,047,051,039,039,039,051,067,067,029,029,067,071,071,005,067,051,071,039,051,029,051,051,067,071,051,067,067,005,005,067,067,005,071,067,067,067,029,067,067,067,065,067,067,005,067,067,029,005,067,057,067,067,067,005,039,039,033,039,051,067,037,015,071,047,051,005,067,067,005,071,005,005,067,065,051,067,067,067,005,051,067,067,051,051,043,003,047,005,051,051,051,051,051,067,067,051,067,067,051,051,067,071,051,051,005,067,067,067,067,067,067,067,005,067,071,051,051,067,007,005,051,051,005,067,039,029,041,059,059,071,071,043,071,023,005,071,071,023,047,019,023,021,039,043,071,051,039,067,043,039,039,071,005,047,017,033,013,039,037,047,029,009,051,039,051,039,067,043,059,011,011,067,067,051,067,005,051,067,067,071,065,071,067,029,051,005,067,005,005,067,029,067,005,067,005,051,029,029,039,039,067,007,051,039,039,011,011,043,019,023,023,047,029,023,049,059,039,051,037,005,029,051,047,051,023,021,059,015,067,067,071,023,071,015,059,057,059,003,051,017,051,067,059,005,029,059,059,023,023,059,019,067,011,069,005,071,047,047,019,019,051,071,067,059,059,071,015,023,059,051,051,039,029,039,039,039,039,047,017,019,005,053,047,005,039,039,051,067,005,051,051,033,051,051,051,051,051,051,005,039,039,051,051,029,051,051,051,005,067,039,039,051,051,051,051,039,009,009,067,009,039,051,039,039,039,071,051,005,005,051,009,043,051,051,041,005,051,067,005,067,051,067,051,067,067,005,051,005,005,047,051,005,005,051,051,067,005,005,005,051,005,051,005,051,005,067,067,067,005,051,039,067,067,017,051,051,067,051,051,005,067,067,067,029,005,067,067,067,005,067,067,067,051,067,067,067,051,039,005,007,051,005,051,039,033,067,029,051,051,051
,051,051,051,067,039,029,051,051,039,009,051,039,051,051,039,005,039,051,005,005,005,067,005,005,067,009,051,051,051,005,051,005,051,067,005,005,051,005,051,067,067,005,067,005,005,051,051,067,005,051,051,051,071,005,005,051,053,007,067,067,007,005,059,039,033,039,039,047,071,035,005,039,051,051,039,071,005,051,005,019,039,047,005,005,039,039,005,039,039,033,005,029,027,051,005,047,047,057,039,015,051,009,007,051,067,047,019,013,003,011,051,051,005,071,067,039,039,033,005,071,029,043,005,051,053,057,067,067,015,011,039,067,051,051,039,051,067,047,009,047,015,047,039,003,017,015,029,043,011,039,053,047,011,029,043,043,015,029,041,067,047,067,047,015,015,067,029,071,029,067,011,067,053,039,051,005,005,017,051,067,051,039,067,051,039,039,009,011,051,051,029,039,051,039,001,051,043,005,005,005,005,051,047,043,043,067,051,051,005,071,067,053,051,067,005,005,067,067,005,051,053,051,005,067,067,005,067,047,047,005,067,067,067,005,051,005,051,067,051,067,067,067,051,067,067,051,067,067,041,067,067,029,067,067,067,067,005,067,051,005,067,067,005,067,047,067,029,067,005,047,005,013,067,051,005,005,047,051,051,035,067,067,047,051,033,033,051,067,017,015,047,051,041,005,005,051,051,051,029,071,043,067,067,051,051,051,067,067,067,067,005,005,067,067,067,005,067,067,067,051,067,029,067,005,067,067,067,051,067,067,005,067,051,067,051,051,039,039,017,007,029,039,039,039,039,067,039,039,051,067,051,067,039,067,009,009,009,067,051,039,051,017,067,051,005,005,029,039,051,051,051,039,067,051,039,051,067,051,051,009,051,071,051,067,005,051,067,051,005,047,005,005,005,051,005,005,051,005,067,051,005,051,005,051,005,005,005,005,051,005,005,071,067,067,051,067,039,051,033,005,051,039,039,039,067,005,007,005,015,047,051,059,011,005,067,027,047,051,005,011,051,067,005,003,053,053,051,005,005,051,067,067,001,033,033,029,029,029,005,005,047,059,057,005,067,067,047,015,017,047,047,005,017,067,051,015,067,067,067,067,039,039,051,039,051,005,047,007,047,067,005,051,051,051,015,013,011,071,067,039
,011,031,029,015,067,067,067,051,005,067,067,005,051,005,067,067,067,067,067,051,015,039,011,067,067,011,011,051,005,005,005,005,067,007,047,047,039,039,003,019,047,033,067,047,015,015,005,051,005,019,017,039,003,005,067,047,047,011,053,003,011,067,051,051,005,067,047,033,039,011,005,057,011,065,051,051,015,065,065,047,051,067,053,051,011,011,029,071,067,067,051,051,053,029,071,067,047,029,053,017,059,005,041,039,039,067,001,059,057,039,033,067,005,051,015,019,019,071,005,005,051,051,011,053,029,033,005,029,053,067,015,067,029,043,015,015,033,033,047,047,005,005,015,059,059,067,047,033,039,033,029,051,067,067,067,005,067,067,067,051,051,067,067,067,005,051,051,067,051,067,067,029,071,011,011,011,011,065,051,067,011,011,005,011,067,011,051,067,039,011,029,011,043,067,031,001,051,003,029,067,051,033,067,067,067,067,051,011,011,039,039,071,029,005,053,053,053,015,023,049,043,005,005,067,009,009,067,051,051,067,067,067,005,051,029,005,005,051,051,067,067&in=state:41
Error in get_census_api_2(data_url, key, get, region, retry) :
Warning message:
In sink() : no sink to remove
And here are the counties I have in test:
> test %>% pull(county) %>% levels()
[1] "001" "003" "005" "007" "009" "011" "013" "015" "017" "019" "021" "023"
[13] "025" "027" "029" "031" "033" "035" "037" "039" "041" "043" "045" "047"
[25] "049" "051" "053" "055" "057" "059" "061" "063" "065" "067" "069" "071"
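The URL in error.txt appears to repeat county codes, with 005 and 051 occurring many times. By contrast, test contains at most the 36 county levels listed above. That pattern suggests the request grows with the number of rows rather than the number of counties. One way to check this is to compare the length of the county clause with and without duplicates. This is only a sketch. `county_clause()` is hypothetical and is not how wru builds its query.

```r
# Hypothetical helper: build the "for=county:..." part of a Census API query
# from a vector of county codes, optionally dropping duplicates first.
county_clause <- function(counties, dedupe = TRUE) {
  codes <- as.character(counties)  # factor levels become their labels
  if (dedupe) codes <- sort(unique(codes))
  paste0("for=county:", paste(codes, collapse = ","))
}

# Usage:
# nchar(county_clause(test$county, dedupe = FALSE))  # one code per row
# nchar(county_clause(test$county))                  # one code per county
```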
Can you send me a failing sample of your voter.file data to brandon@bertelsen.ca? I'll try it out.
@benjamin-chan I think we have it sorted. It's a sneaky little issue: your counties aren't zero-padded. This is based on the test data you sent me.
library(wru)
predict_race(
voter.file = read.csv('~/Downloads/test.csv'), # reading your file in straight
census.geo = "county"
)
# resulting error message
"
Error in census_helper_new(key = census.key, voter.file = voter.file, :
The following locations in the voter.file are not available in the census
data (listed as state-county): OR-1, OR-3, OR-5, OR-7, OR-9, OR-11, OR-13,
OR-15, OR-17, OR-19, OR-21, OR-23, OR-27, OR-29, OR-33, OR-35, OR-37,
OR-39, OR-41, OR-43, OR-45, OR-47, OR-49, OR-51, OR-53, OR-57, OR-59,
OR-61, OR-65, OR-67, OR-69, OR-71
"
vf <- read.csv("~/Downloads/test.csv")
vf$county <- formatC(vf$county, width = 3, flag = "0") # zero-pad county codes to 3 digits
predict_race(
voter.file = vf,
census.geo = "county"
)
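The same fix applies to the other geocode columns, because census tract codes are 6 digits and block codes are 4 digits. For example, read.csv() turns a tract like 002102 into 2102. Below is a minimal sketch that assumes the columns are named county, tract, and block, as in wru's voters data. `pad_geocodes()` is hypothetical and not part of wru.

```r
# Hypothetical helper: left-pad numeric geocodes with zeros to the fixed
# census widths (county = 3, tract = 6, block = 4), e.g. 39 -> "039".
pad_geocodes <- function(df) {
  widths <- c(county = 3L, tract = 6L, block = 4L)
  for (col in intersect(names(widths), names(df))) {
    # as.character() first so factor columns convert by label, not level index
    vals <- as.integer(as.character(df[[col]]))
    out <- rep(NA_character_, length(vals))  # missing codes stay NA
    ok <- !is.na(vals)
    out[ok] <- formatC(vals[ok], width = widths[[col]], format = "d", flag = "0")
    df[[col]] <- out
  }
  df
}

# Usage: vf <- pad_geocodes(read.csv("~/Downloads/test.csv"))
```

If the CSV itself stores the leading zeros, reading it with read.csv(..., colClasses = "character") keeps them and makes padding unnecessary.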
# successful output!
surname state county tract pred.whi pred.bla pred.his pred.asi pred.oth
545 SMITH OR 039 2102 0.9154447 0.016074746 0.010523511 0.0027175710 0.05523950
329 SMITH OR 017 1400 0.9481373 0.006005211 0.010422092 0.0010769548 0.03435848
523 SMITH OR 033 360500 0.9385888 0.006368911 0.008798114 0.0009972384 0.04524691
5 SMITH OR 003 900 0.9216835 0.015526467 0.009213403 0.0058890789 0.04768756
478 SMITH OR 029 800 0.9260089 0.011265711 0.015494511 0.0015503510 0.04568055
...
I should also note that there was no need to convert anything in the vf object to factors.
I found it. Submitting a PR shortly.
Hi @benjamin-chan, thank you for working with me on this. Can you try again after installing the dev branch:
remotes::install_github("kosukeimai/wru", ref = "issue_72")
I was able to reproduce your issue.
The dev version wru_1.0.0010 solved the issue. I tested it on test.csv and on my full 800K-row data set. Thanks for working on this issue.
I don't have any issues with census.geo = "county", but when I switch to census.geo = "tract" I get a data access failure error. Here's what head() of df returns: