kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation
probability estimates conditional on other rows in dataset #68

Closed bthowe closed 2 years ago

bthowe commented 2 years ago

These all yield different estimates for the same surname, which is contrary to my expectation.

library(wru)
future.callr::callr

surname <- c("SULLIVAN")
predict_race(voter.file=data.frame(surname), surname.only=TRUE)

surname <- c("SULLIVAN", "SULLIVAN")
predict_race(voter.file=data.frame(surname), surname.only=TRUE)

surname <- c("SULLIVAN", "SULLIVAN", "SULLIVAN")
predict_race(voter.file=data.frame(surname), surname.only=TRUE)

1beb commented 2 years ago

@solivella will need to look at this more

BISG isn't designed to take a very small sample of the voter file with surname.only = TRUE, because it tries to use a fixed race marginal to distribute predictions. If you have an exceedingly small voter file sample, you may want to feed the rows individually (row by row) to predict_race.
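A rough sketch of the row-by-row approach, in case it helps. The helper name predict_rowwise and its predictor argument are made up for illustration and are not part of wru. It only assumes that predict_race accepts a one-row data frame through voter.file, as in the first call above.

```r
# Hypothetical helper (not part of wru): call the predictor once per row so
# that no row's estimate can depend on the other rows in the file.
# `predictor` defaults to wru::predict_race and is only looked up when the
# helper is actually called without an explicit predictor.
predict_rowwise <- function(voter.file, predictor = wru::predict_race, ...) {
  per_row <- lapply(seq_len(nrow(voter.file)), function(i) {
    # drop = FALSE keeps a one-column voter file as a data frame
    predictor(voter.file = voter.file[i, , drop = FALSE], ...)
  })
  # Stack the one-row results back into a single data frame
  do.call(rbind, per_row)
}
```

With wru installed, this could be called as predict_rowwise(data.frame(surname = c("SULLIVAN", "SULLIVAN")), surname.only = TRUE). It will be slow on a large file, so it is probably only worth it for a handful of rows.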

Also, for surname-only prediction with no census data, you don't need to specify a parallel plan. If you do want to set one, the correct form would be something like:

library(wru)
library(future)
plan(future.callr::callr)
...

Note: @solivella, we should probably add a warning when we see a small voter.file sample, or a race-biased sample of last names, with surname only.

kkprinceton commented 2 years ago

FWIW I tried this code and couldn't replicate the problem

bthowe commented 2 years ago

Ok. What version of R and wru are you using? I'll try again in another environment. I first encountered screwy results using a dataset of 50k+ observations, so I don't think it's a 'very small' sample issue.

1beb commented 2 years ago

@bthowe This has been fixed in the most recent release available on GitHub. It will be a while before this gets pushed to CRAN, so in the meantime you can install the development version from GitHub using the remotes package:

remotes::install_github("kosukeimai/wru")

Thank you @solivella for the fix!