kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation

Error in predict_race #56

Closed kkprinceton closed 2 years ago

kkprinceton commented 2 years ago

I get the following error when trying to run predict_race() using a pre-saved Census object. I'm sure this can be fixed easily and was likely my fault when I wrote the original code!

Error in if ((toDownload) || (is.null(census.data[[state]])) || (census.data[[state]]$age !=  : 
  missing value where TRUE/FALSE needed

kkprinceton commented 2 years ago

Not 100% sure, but the issue could be that my Census object doesn't have everything the function expects, e.g., age and sex.
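
A minimal sketch of how an error like this can arise, assuming the failing condition compares a list element that does not exist against a logical flag. The toy list `census_state` and the variable `age` are hypothetical stand-ins, not the package's actual internals:

```r
# Hypothetical stand-in for one state's entry in a census object
# that has no `age` or `sex` element.
census_state <- list(state = "AL")
age <- FALSE  # assumed value of the flag being compared against

# A missing list element is NULL, and comparing NULL with a logical
# yields a zero-length logical vector rather than TRUE or FALSE.
cmp <- census_state$age != age
length(cmp)

# `||` with a zero-length right-hand side evaluates to NA ...
cond <- FALSE || cmp
is.na(cond)

# ... and `if` cannot branch on NA, so it raises an error.
result <- tryCatch(
  if (cond) "download" else "reuse",
  error = function(e) conditionMessage(e)
)
print(result)
```

If this is what is happening, supplying the missing elements on the census object should be enough for the condition to evaluate to a proper TRUE or FALSE.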

1beb commented 2 years ago

Hey Kabir! Could you show:

 str(census.dat, max.level = 2)

and:

The call that you used (predict_race(...))?

kkprinceton commented 2 years ago

> str(CensusObj, max.levels = 2)
List of 1
 $ AL:List of 4
  ..$ state : chr "AL"
  ..$ county:'data.frame':  67 obs. of  16 variables:
  .. ..$ state  : chr [1:67] "AL" "AL" "AL" "AL" ...
  .. ..$ county : chr [1:67] "039" "045" "067" "051" ...
  .. ..$ P4_001N: num [1:67] 29387 38048 13641 69005 81121 ...
  .. ..$ P4_002N: num [1:67] 414 2101 215 1786 2916 ...
  .. ..$ P4_005N: num [1:67] 24420 26220 9523 50273 62803 ...
  .. ..$ P4_006N: num [1:67] 3458 7429 3421 13972 11399 ...
  .. ..$ P4_007N: num [1:67] 122 186 39 212 287 ...
  .. ..$ P4_008N: num [1:67] 185 561 52 528 716 ...
  .. ..$ P4_009N: num [1:67] 0 37 0 13 23 7 210 25 49 316 ...
  .. ..$ P4_010N: num [1:67] 43 104 31 163 152 ...
  .. ..$ P4_011N: num [1:67] 745 1410 360 2058 2825 ...
  .. ..$ r_whi  : num [1:67] 0.00952 0.01022 0.00371 0.0196 0.02449 ...
  .. ..$ r_bla  : num [1:67] 0.00354 0.00761 0.0035 0.0143 0.01167 ...
  .. ..$ r_his  : num [1:67] 0.00248 0.01259 0.00129 0.0107 0.01748 ...
  .. ..$ r_asi  : num [1:67] 0.00299 0.00967 0.00084 0.00874 0.01194 ...
  .. ..$ r_oth  : num [1:67] 0.00618 0.01155 0.00292 0.01653 0.02218 ...
  ..$ tract :'data.frame':  1437 obs. of  17 variables:
  .. ..$ state  : chr [1:1437] "AL" "AL" "AL" "AL" ...
  .. ..$ county : chr [1:1437] "001" "001" "001" "001" ...
  .. ..$ tract  : chr [1:1437] "020100" "020200" "020300" "020400" ...
  .. ..$ P4_001N: num [1:1437] 1370 1584 2485 3344 3369 ...
  .. ..$ P4_002N: num [1:1437] 62 34 60 100 100 106 93 170 74 45 ...
  .. ..$ P4_005N: num [1:1437] 1093 662 1779 2835 2636 ...
  .. ..$ P4_006N: num [1:1437] 147 834 537 237 416 614 488 546 538 226 ...
  .. ..$ P4_007N: num [1:1437] 3 2 7 14 15 9 2 5 4 10 ...
  .. ..$ P4_008N: num [1:1437] 2 12 11 22 65 146 82 9 15 5 ...
  .. ..$ P4_009N: num [1:1437] 0 0 4 1 2 3 2 1 2 0 ...
  .. ..$ P4_010N: num [1:1437] 5 5 3 5 5 25 10 6 11 0 ...
  .. ..$ P4_011N: num [1:1437] 58 35 84 130 130 73 102 85 106 74 ...
  .. ..$ r_whi  : num [1:1437] 0.0337 0.0204 0.0549 0.0875 0.0813 ...
  .. ..$ r_bla  : num [1:1437] 0.0177 0.1003 0.0646 0.0285 0.05 ...
  .. ..$ r_his  : num [1:1437] 0.0463 0.0254 0.0448 0.0747 0.0747 ...
  .. ..$ r_asi  : num [1:1437] 0.00308 0.01846 0.02308 0.03538 0.10308 ...
  .. ..$ r_oth  : num [1:1437] 0.0365 0.0232 0.0519 0.0823 0.0829 ...
  ..$ block :'data.frame':  185976 obs. of  18 variables:
  .. ..$ state  : chr [1:185976] "AL" "AL" "AL" "AL" ...
  .. ..$ county : chr [1:185976] "001" "001" "001" "001" ...
  .. ..$ tract  : chr [1:185976] "020100" "020100" "020100" "020100" ...
  .. ..$ block  : chr [1:185976] "1011" "1014" "1017" "1021" ...
  .. ..$ P4_001N: num [1:185976] 0 78 42 102 10 4 22 15 39 2 ...
  .. ..$ P4_002N: num [1:185976] 0 3 1 5 2 2 2 0 3 0 ...
  .. ..$ P4_005N: num [1:185976] 0 69 39 80 6 2 14 13 33 2 ...
  .. ..$ P4_006N: num [1:185976] 0 5 0 10 0 0 3 0 3 0 ...
  .. ..$ P4_007N: num [1:185976] 0 0 0 0 0 0 0 1 0 0 ...
  .. ..$ P4_008N: num [1:185976] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ P4_009N: num [1:185976] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ P4_010N: num [1:185976] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ P4_011N: num [1:185976] 0 1 2 7 2 0 3 1 0 0 ...
  .. ..$ r_whi  : num [1:185976] 0 0.06313 0.03568 0.07319 0.00549 ...
  .. ..$ r_bla  : num [1:185976] 0 0.034 0 0.068 0 ...
  .. ..$ r_his  : num [1:185976] 0 0.0484 0.0161 0.0806 0.0323 ...
  .. ..$ r_asi  : num [1:185976] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ r_oth  : num [1:185976] 0 0.0152 0.0303 0.1061 0.0303 ...

predict_race(voter.file = df[c("LALVOTERID", "surname", "state", "county", "tract", "block")],
             census.geo = "tract", census.data = CensusObj)

1beb commented 2 years ago

I think all you need to do is set the age and sex elements of CensusObj[["AL"]] to FALSE.

CensusObj$AL$age <- FALSE
CensusObj$AL$sex <- FALSE

Then your data should match the format shown in the README (see the second-to-last code block for an example), and predict_race() should run. I can't say whether you have all the right Census variables because I don't know them offhand!
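
For a pre-saved object covering several states, the same fix could be applied in a loop. This is only a sketch: `set_census_flags` is a hypothetical helper, not a wru function, and the toy list below stands in for a real census object.

```r
# Hypothetical helper (not part of wru): add `age` and `sex` flags to every
# state entry of a pre-saved census object that lacks them.
set_census_flags <- function(census_obj, age = FALSE, sex = FALSE) {
  for (st in names(census_obj)) {
    # Only fill in flags that are missing; existing values are left alone.
    if (is.null(census_obj[[st]]$age)) census_obj[[st]]$age <- age
    if (is.null(census_obj[[st]]$sex)) census_obj[[st]]$sex <- sex
  }
  census_obj
}

# Toy object standing in for the real one (structure assumed from the
# str() output in this thread, with the data frames omitted).
toy <- list(
  AL = list(state = "AL"),
  GA = list(state = "GA", age = TRUE, sex = FALSE)
)
toy <- set_census_flags(toy)
str(toy, max.level = 2)
```

Flags that are already present are kept as they are, so an object that was downloaded with age or sex breakdowns is not silently relabeled.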

kkprinceton commented 2 years ago

Thanks @1beb, that's exactly what I did. Maybe this constraint on the Census object should be relaxed in future versions, because not everyone will care about incorporating these demographics.

1beb commented 2 years ago

The new version (on hwru branch) doesn't currently support sex or gender, so the check is not performed unless you explicitly use the old version of the function.