kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation

Type error when reading census data retrieved with the get_census_data function #84

Closed hirsch-sw closed 2 years ago

hirsch-sw commented 2 years ago

Hi,

I have been able to retrieve the census data and format a voter file to run through the predict_race function. However, I keep getting: Error in sum(census[paste("P012", eth.let[i], "001", sep = "")]) : invalid 'type' (list) of argument

I traced it back to the census_helper function, where it first appears in ## Calculate Pr(Geolocation, Sex | Race). Looking at the data as it came in via the get_census_data function, it seems that the code is indeed trying to sum() a set of lists. Have you found a fix for this? Is it possible that my data did not load properly?
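For readers who hit the same message, here is a minimal base-R sketch of why sum() can fail this way. It uses toy data only: the column name "P012I001" and both objects are made up for illustration and are not taken from the package or from real census output. It shows that `[` subsetting of a data frame with numeric columns can be summed, while the same subsetting of a plain list returns a list, which sum() rejects.

```r
# Toy stand-ins for one census table. "P012I001" is a hypothetical column
# name, built with paste() the same way as in the error message.
col <- paste("P012", "I", "001", sep = "")

census_df <- data.frame(P012I001 = c(10, 20, 30))
census_list <- list(P012I001 = c(10, 20, 30))

# A data frame with numeric columns can be summed after `[` subsetting.
total_df <- sum(census_df[col])

# A plain list subset with `[` is still a list, so sum() raises an error.
# tryCatch() captures the message instead of stopping the script.
list_error <- tryCatch(
  sum(census_list[col]),
  error = function(e) conditionMessage(e)
)

print(total_df)
print(list_error)

# Quick diagnostic: is the object being summed a data frame at all?
print(is.data.frame(census_df))
print(is.data.frame(census_list))
```

Running is.data.frame() or str() on the element that census_helper receives is one way to see whether the data arrived in the expected shape.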

I pulled block-level data for California from 2010 with sex and age using this code:

d_census <- get_census_data(states = "CA", key = mykey, sex = TRUE, age = TRUE, census.geo = "block", retry = 3)

The sublists that appear under $CA are: $state $age $sex $tract $county

To predict race, I am using: d_predicted <- predict_race(voters, census.geo = "block", census.key = mykey, census.data = d_census, age = TRUE, sex = TRUE, party = "PID", retry = 3)

I have fiddled around with the parameters a little bit by making default parameters explicit (census.surname = TRUE; year = "2010", surname.year = 2010, etc.), but it hasn't helped. Any thoughts?

1beb commented 2 years ago

Hi Sarah (congrats on your first issue!)

Newer versions of the package (1.0+) do not yet support age and sex. Could you start by telling me what version of the package you are using (see or post sessionInfo()) and creating a reproducible example? Also, could you verify that the tracts in your data are 2010 and not 2020?
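A short sketch of how the requested version information could be collected, using only base R and utils. The "1.0.0" threshold simply mirrors the "1.0+" wording above and is an assumption about how the releases are numbered.

```r
# Print the R version, platform, and attached packages for the bug report.
print(sessionInfo())

# Report the installed wru version only if the package is available,
# so the script also runs on machines without wru.
if (requireNamespace("wru", quietly = TRUE)) {
  wru_version <- utils::packageVersion("wru")
  print(wru_version)
  # TRUE here would indicate a 1.0+ release (assumed threshold).
  print(wru_version >= "1.0.0")
} else {
  message("wru is not installed in this library")
}
```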

My recommendation: if you want to use age and sex and your target is 2010 census data, you may be better off using an older version of the package (https://cran.r-project.org/src/contrib/Archive/wru/).

hirsch-sw commented 2 years ago

Hi Brandon,

Thanks! It turns out that the data either hadn't loaded properly or I had accidentally loaded the wrong object. More specifically, the list I had been using was 37 MB, but when I pulled the data again, the list was 3.2 GB--so something changed in there. Either way, after downloading the data again, the problem was resolved and I was able to generate race predictions. Thank you for your response!
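A rough sanity check along these lines can be run before calling predict_race. It is only a sketch: check_census is a hypothetical helper, the toy object merely mimics the sublists listed earlier in the thread, and the idea that a block-level pull should contain an element named after the requested geography is an assumption rather than something the thread confirms.

```r
# Hypothetical helper: summarise a downloaded census object before use.
check_census <- function(census, state, geo) {
  parts <- names(census[[state]])
  list(
    size = format(object.size(census), units = "MB"),  # in-memory size
    parts = parts,                                      # sublist names
    has_geo = geo %in% parts                            # assumed requirement
  )
}

# Toy object that mimics the sublists listed earlier in the thread.
toy_census <- list(
  CA = list(
    state = "CA", age = TRUE, sex = TRUE,
    tract = data.frame(), county = data.frame()
  )
)

report <- check_census(toy_census, state = "CA", geo = "block")
print(report)
```

An unexpectedly small size, or a missing element for the requested geography, would be a hint to download the data again.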