Determining which names are not matched

kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation

You can download the name files directly and compare them against your dataset. Problems we have seen in the past include data containing names with special characters, or more than one name per name part.

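# Requires the piggyback package; the file is saved to the working directory
piggyback::pb_download("wru-data-census_first_c.rds", repo = "kosukeimai/wru")
r <- readRDS("wru-data-census_first_c.rds")
# Inspect the name column (run names(r) first if this file uses a different column name)
r$last_name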
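
To make the comparison concrete, here is a minimal sketch of how the unmatched names could be listed once a reference column has been loaded. Everything in it is illustrative: `reference_names` stands in for the downloaded column, `voters` and its `surname` column are hypothetical, and the upper-casing step is an assumption about how the names should be normalized, not a description of what the package does internally.

```r
# Stand-in for the downloaded reference column (e.g. r$last_name); values are illustrative only.
reference_names <- c("SMITH", "GARCIA", "NGUYEN")

# Hypothetical input data; `surname` is an assumed column name.
voters <- data.frame(
  surname = c("Smith", "O'Brien", "Garcia Lopez", "nguyen"),
  stringsAsFactors = FALSE
)

# Assumed normalization: trim whitespace and upper-case before comparing.
cleaned <- toupper(trimws(voters$surname))
is_matched <- cleaned %in% reference_names

# Rows whose cleaned surname does not appear in the reference list.
unmatched <- voters[!is_matched, , drop = FALSE]

# Share of rows that failed to match.
unmatched_rate <- mean(!is_matched)

# Flag the two likely causes mentioned above:
# special characters, or more than one name in a single name part.
unmatched$has_special <- grepl("[^A-Za-z ]", unmatched$surname)
unmatched$multi_part <- grepl(" ", trimws(unmatched$surname))

print(unmatched)
print(unmatched_rate)
```

Swapping `reference_names` for the real column and `voters$surname` for your own data should give a quick view of which rows are failing and whether special characters or multi-part names explain them.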

This has come up before, so we may add an option to "save out" the list of names that aren't matched. 10% unmatched is pretty reasonable: in my experience with this package the typical range is around 10%, and much more than that would be concerning.

Another alternative is to clone the repository and set a debug point where the message is emitted, allowing you to inspect the unmatched names in the environment.
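
As a rough illustration of the debugging idea, the sketch below uses base R's `trace()` to copy a local variable out of a function when it exits. `match_names_demo` is a hypothetical stand-in written for this example, not a function from the package, and the local variable name `unmatched` is likewise an assumption; in the real source you would need to find the corresponding function and variable yourself.

```r
# Hypothetical stand-in for a package function that reports unmatched names.
match_names_demo <- function(names, reference) {
  matched <- toupper(names) %in% reference
  unmatched <- names[!matched]
  message(length(unmatched), " name(s) were not matched.")
  invisible(matched)
}

# When the function exits, copy its local `unmatched` vector
# into the global environment so it can be inspected afterwards.
trace(
  "match_names_demo",
  exit = quote(assign("captured_unmatched", unmatched, envir = globalenv())),
  print = FALSE
)

match_names_demo(c("Smith", "Zzyzx"), reference = c("SMITH", "GARCIA"))

# Remove the tracer again and look at what was captured.
untrace("match_names_demo")
print(captured_unmatched)
```

When working interactively in a cloned copy, `debugonce()` on the relevant function or a temporary `browser()` call placed near the message would serve the same purpose.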

Determining which names are not matched #85