slu-openGIS / postmastr

R package for Processing and Parsing Untidy Street Addresses
https://slu-opengis.github.io/postmastr/
GNU General Public License v3.0

Dealing with Country Abbreviations at End of Address #5

Open chris-prener opened 5 years ago

chris-prener commented 5 years ago

Per @alankjackson - Google's geocoder returns addresses with "USA" appended to the end. We don't currently have functionality to remove countries, so as a workaround, the following code strips it before parsing, using the sushi1 data as an example:

# dependencies
library(postmastr)
library(dplyr)
library(stringr)

# add USA to the address data ala Google geocoder results
sushi <- mutate(sushi1, address = str_c(address, "USA", sep = " "))

# remove USA
sushi <- sushi %>%
  mutate(address = str_replace(string = address, 
                               pattern = "\\bUSA\\b$",
                               replacement = "")) %>%
  mutate(address = str_trim(address))

# create dictionaries
mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
cities <- pm_append(type = "city",
                    input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood", 
                              "St. Louis", "SAINT LOUIS", "Webster Groves"),
                    output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))

# parse
sushi %>%
  filter(name != "Drunken Fish - Ballpark Village") %>%
  pm_parse(input = "full", address = address, output = "short", 
           keep_parsed = "no", city_dict = cities, state_dict = mo)

We'll have to figure out a long-term solution for these types of edits. pm_mutate doesn't make sense since it's designed to edit observation by observation.
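In the meantime, the workaround above could be wrapped into a small reusable helper. This is only a sketch: strip_trailing_country() is a hypothetical name, not a postmastr function, and it assumes the country always appears as the last token of the address, optionally preceded by a comma. It uses base R only, and the entries in countries are treated as regular expressions.

```r
# Hypothetical helper, not part of postmastr: remove a country name
# from the end of each address string.
strip_trailing_country <- function(x, countries = c("USA", "United States")) {
  # Put longer names first so they are tried before shorter ones
  alt <- paste(countries[order(-nchar(countries))], collapse = "|")

  # Optional comma/whitespace, a word boundary, the country, an optional
  # trailing period, then the end of the string
  pattern <- paste0("[,\\s]*\\b(", alt, ")\\.?\\s*$")

  # Drop the match and tidy any remaining whitespace
  trimws(sub(pattern, "", x, perl = TRUE, ignore.case = TRUE))
}

addresses <- c("3407 Olive St, St. Louis, MO 63103 USA",
               "3407 Olive St, St. Louis, MO 63103, USA",
               "3407 Olive St, St. Louis, MO 63103")
strip_trailing_country(addresses)
```

The word boundary keeps the pattern from clipping words that merely end in the same letters, and addresses without a trailing country pass through unchanged. Inside a pipeline it would be called as mutate(address = strip_trailing_country(address)).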

chris-prener commented 5 years ago

@alankjackson - we've added country functionality but it isn't on the master branch yet. There is the same functionality for countries as other address elements. You'll want to deal with them before you move on to postal codes and down through the order of operations. You can either parse with pm_country_parse() or just remove the USA from your addresses with pm_country_trim().

Do you want to test it with your data? If so, this will install the branch with the country functionality:

remotes::install_github("slu-openGIS/postmastr", ref = "countries")

Here is a minimal reprex for how this works:

library(postmastr)

postmastr::sushi1 %>%
  dplyr::filter(name != "Drunken Fish - Ballpark Village") %>%
  dplyr::mutate(address = stringr::str_c(address, "USA", sep = " ")) %>%
  pm_identify(var = address) %>%
  pm_prep(var = "address") %>%
  pm_country_trim()

alankjackson commented 5 years ago

Thanks. Out of town right now but I'll try it when I get back

chris-prener commented 5 years ago

no worries! Thanks @alankjackson, and safe travels.