Closed NightSmile96 closed 2 years ago
You can download the name files directly for analysis and compare them against your dataset. Problems we have seen in the past are that some people's data contain names with special characters, or more than one name per name part.
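# Download the name file from the release assets of the wru GitHub repository
piggyback::pb_download("wru-data-census_first_c.rds", repo = "kosukeimai/wru")
# Read the downloaded table into memory
r <- readRDS("wru-data-census_first_c.rds")
# Pull out the name column to compare against your own data
r$last_name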
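Once a name table is loaded, one way to list the names in your own data that have no entry in it is a plain set-membership check. The following is only a minimal sketch, not how wru does its matching internally: `find_unmatched`, `donors`, `surname`, and the toy `last_names` vector are all hypothetical stand-ins, and it assumes the downloaded names are stored in upper case.

```r
# Hypothetical helper: return the distinct values of `x` that do not
# appear in `dictionary` after trimming whitespace and upper-casing.
# Assumption: the reference names are upper case; wru's own cleaning
# steps may differ from this simple normalization.
find_unmatched <- function(x, dictionary) {
  cleaned <- toupper(trimws(x))
  unique(x[!(cleaned %in% toupper(dictionary))])
}

# Toy stand-ins for your data and for the downloaded name column
donors <- data.frame(
  surname = c("Smith", "O'Neil", "garcia lopez", "Nguyen "),
  stringsAsFactors = FALSE
)
last_names <- c("SMITH", "GARCIA", "NGUYEN")

unmatched <- find_unmatched(donors$surname, last_names)
print(unmatched)
```

In this toy data the two names that fail the lookup are the one with an apostrophe and the one with two surnames in a single field, which are the kinds of problems mentioned above. With real data you would pass the name column of the downloaded table as `dictionary`.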
This has come up before, so we may add an option to "save out" the list of names that aren't matched. 10% unmatched is pretty reasonable: in my experience with this package the typical range is around 10%, and much more than that would be concerning.
Another alternative is to clone the repository and set a breakpoint (for example with debug() or browser()) at the point where the message is emitted, which lets you inspect the unmatched names in the environment.
I'm not sure if this is the right place to post this, so apologies in advance if it isn't. I have been using the wru package to predict the race of campaign donors and have a dataset with the geographic information, first name, and last name. There are 14,091 observations in my dataset, but when I run the race predict command, 1,349 last names and 98 first names are not matched. The message I get is the following: "1349 (9.6%) individuals' last names were not matched. 98 (0.7%) individuals' first names were not matched." Is there a way to view which names are not matched? I want to see whether there is an issue in the name field.