kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation

Fix surname.only bug #73

Closed by solivella 2 years ago

solivella commented 2 years ago

Resolve issue reported in #68.

1beb commented 2 years ago

For surname only, could you comment on why we think it's reasonable to redistribute using a fixed race marginal? Does this not defeat the intuition that race will be distributed based on the most likely race by surname when surname.only = TRUE?

solivella commented 2 years ago

surname.only = TRUE computes Pr(race | surname), but our current name dictionaries return Pr(surname | race). To "invert" the probability, we need to multiply Pr(surname | race) by Pr(race) (and then normalize), as per Bayes' rule.
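The inversion described above can be sketched in Python as follows. This is an illustrative sketch, not wru's implementation, which is written in R. The function name `invert_surname_probs`, the race labels, and every probability value are hypothetical and chosen only to make the arithmetic easy to follow. The code multiplies each Pr(surname | race) by a fixed Pr(race) and normalizes over race categories, following Bayes' rule.

```python
# Illustrative sketch of Bayes' rule inversion; NOT wru's actual code.
# Race labels and all probabilities below are hypothetical.
RACES = ["whi", "bla", "his", "asi", "oth"]


def invert_surname_probs(p_surname_given_race, p_race):
    """Return Pr(race | surname) from Pr(surname | race) and a fixed Pr(race)."""
    # Unnormalized posterior: Pr(surname | race) * Pr(race)
    joint = {r: p_surname_given_race[r] * p_race[r] for r in RACES}
    total = sum(joint.values())  # equals Pr(surname)
    if total == 0:
        raise ValueError("surname has zero probability under every race")
    # Normalize so the posterior sums to 1 over race categories
    return {r: v / total for r, v in joint.items()}


# Hypothetical Pr(surname | race) for a single surname
p_surname_given_race = {"whi": 0.1, "bla": 0.2, "his": 0.5, "asi": 0.1, "oth": 0.1}

# Hypothetical fixed race marginal Pr(race)
p_race = {"whi": 0.6, "bla": 0.1, "his": 0.2, "asi": 0.05, "oth": 0.05}

posterior = invert_surname_probs(p_surname_given_race, p_race)

# With a uniform marginal, the posterior is just the normalized likelihood
uniform = {r: 1 / len(RACES) for r in RACES}
uniform_posterior = invert_surname_probs(p_surname_given_race, uniform)

print({r: round(p, 3) for r, p in posterior.items()})
print({r: round(p, 3) for r, p in uniform_posterior.items()})
```

With these made-up numbers, the surname's likelihood still drives the result: "his" remains the most probable category. The marginal shifts how much mass the other categories receive. A uniform Pr(race) reduces the posterior to Pr(surname | race) normalized over races. This is one way to see what the fixed marginal contributes to the inversion.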