kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation

Problem with get_census_data when year = "2020" #142

Status: closed (kathybi closed this issue 6 months ago)

kathybi commented 6 months ago

Progress is extremely slow (possibly completely stalled) when I run the following code:

```r
census2020 <- get_census_data(key = Sys.getenv("CENSUS_API_KEY"), state = "PA", age = FALSE, sex = FALSE, census.geo = "block")
```

I wonder whether it's related to incorrect variable names here: https://github.com/kosukeimai/wru/blob/main/R/census_geo_api_names.R

Shouldn't the prefix for the 2020 variables be P9, not P2? See here: https://api.census.gov/data/2020/dec/dhc/variables.html

kathybi commented 6 months ago

Sorry, I now realize that what I said above was not the issue. After letting the code run longer, I get the following error for some tracts but not others:

```
Error in file(con, "r"): cannot open the connection to 'https://api.census.gov/data/2020/dec/dhc?key=<CENSUS_API_KEY>&get=P12I_001N,P12B_001N,P12H_001N,P12D_001N,P12E_001N,P12C_001N,P12F_001N,P12G_001N&for=block:*&in=state:42+county:003+tract:426300'
```
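The URL in this error suggests that each request asks for block-level counts inside a single tract, so a statewide block pull appears to issue one request per tract. A minimal sketch, assuming you want to re-issue a single failing request by hand to see whether the API itself is the problem: `build_block_query()` is a hypothetical helper (not part of wru), and its base URL and parameter layout are copied from the error message; `YOUR_KEY` is a placeholder.

```r
# Hypothetical helper (not part of wru): rebuild one Census API request for
# block-level counts within a single tract, using the same layout as the
# URL in the error message.
build_block_query <- function(vars, state, county, tract, key,
                              base = "https://api.census.gov/data/2020/dec/dhc") {
  paste0(
    base,
    "?key=", key,
    "&get=", paste(vars, collapse = ","),  # comma-separated variable list
    "&for=block:*",                        # every block ...
    "&in=state:", state,                   # ... inside this state,
    "+county:", county,                    # county,
    "+tract:", tract                       # and tract
  )
}

url <- build_block_query(
  vars   = c("P12I_001N", "P12B_001N"),
  state  = "42", county = "003", tract = "426300",
  key    = "YOUR_KEY"  # placeholder; substitute your own key
)
```

Passing the resulting `url` to `readLines()` (or opening it in a browser) shows whether that one tract succeeds on its own.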

What is causing this? Any help would be greatly appreciated! Thanks!

1beb commented 6 months ago

Block-level data for a whole state is a lot of data, so the requests may be timing out. wru doesn't have much error handling or retry logic for the Census pulls. You may want to break up your pulls locally, requesting smaller sections and then combining them into the census file. If you need a lot of data from the Census API, it may also be rate-limiting you.
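A minimal sketch of the split-and-retry approach described above, in base R. `with_retry()` and `pull_in_pieces()` are hypothetical helpers, not part of wru. `pull_piece` stands in for whatever call fetches one piece of the state (for example a `get_census_data()` call restricted to a few counties; check the wru documentation for how to restrict the geography). The demo uses a simulated flaky function rather than the real API.

```r
# Hypothetical helpers (not part of wru), sketched in base R.

# Call f() up to max_tries times. After a failed attempt, wait
# base_delay * 2^(attempt - 1) seconds (exponential backoff) before retrying.
with_retry <- function(f, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}

# Fetch each piece (e.g. a county code) with retries, then stack the
# resulting data frames into one table.
pull_in_pieces <- function(pieces, pull_piece, max_tries = 3, base_delay = 1) {
  parts <- lapply(pieces, function(p) {
    with_retry(function() pull_piece(p),
               max_tries = max_tries, base_delay = base_delay)
  })
  do.call(rbind, parts)
}

# Demo with a simulated endpoint that fails on every other call.
calls <- 0
flaky <- function(p) {
  calls <<- calls + 1
  if (calls %% 2 == 1) stop("simulated timeout")
  data.frame(piece = p, n = nchar(p))
}
out <- pull_in_pieces(c("001", "003"), flaky, base_delay = 0)
```

With the default `base_delay = 1`, the waits between attempts are 1 s and then 2 s, which gives a rate-limited endpoint some time to recover before the next request.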

1beb commented 6 months ago

Closing this, since it appears to be an API timeout rather than a problem with wru. Noting the need for retries as a future improvement.