slu-openGIS / postmastr

R package for Processing and Parsing Untidy Street Addresses
https://slu-opengis.github.io/postmastr/
GNU General Public License v3.0

Street names like "three pines" get turned into "3Rd Pines" when normalized #12

Closed: alankjackson closed this issue 4 years ago

alankjackson commented 5 years ago

It seems like the street dictionary should handle this. Maybe I am not using it correctly, but passing it to `pm_street_parse()` seems to make no difference:

```r
library(postmastr)
library(tibble)  # for tribble()

reprexdata <- tribble(
  ~address,
  "5330 THREE OAKS CIR, HOUSTON, TX, 77069",
  "3240 THREE PINES DR, HUMBLE, TX, 77339"
)

# Dictionaries for states, cities, directionals and street names
TX_dict <- pm_dictionary(type = "state", filter = "TX",
                         case = "title", locale = "us")
cityDict <- pm_append(type = "city",
                      input =
                        c("Houston", "Katy", "Pasadena", "Bellaire",
                          "Humble", "Meadows Place", "Sugar Land",
                          "HOUSTON", "KATY", "PASADENA", "BELLAIRE",
                          "HUMBLE", "MEADOWS PLACE", "SUGAR LAND"
                          ))

dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")

# Street names that begin with a spelled-out number
streetDict <- pm_append(type="street",
                        input=c("THREE OAKS", "THREE PINES", 
                                "FOUR PINES", "FOUR RIVERS", 
                                "FOUR WINDS", "SEVEN MAPLES", 
                                "SEVEN MILE", "SEVEN OAKS", 
                                "EIGHT WILLOWS"),
                        output=c("Three Oaks", "THREE PINES", 
                                "FOUR PINES", "FOUR RIVERS", 
                                "FOUR WINDS", "SEVEN MAPLES", 
                                "SEVEN MILE", "SEVEN OAKS", 
                                "EIGHT WILLOWS"))

# Identify and prepare the addresses, then parse each component in turn
dftest <- pm_identify(reprexdata, var = "address")

dftest <- pm_prep(dftest, var = "address")

dftest <- pm_postal_parse(dftest)

dftest <- pm_state_parse(dftest)

dftest <- pm_city_parse(dftest, dictionary=cityDict)

dftest <- pm_house_parse(dftest)

dftest <- pm_streetDir_parse(dftest, dictionary=dirs)

dftest <- pm_streetSuf_parse(dftest) 

dftest <- pm_street_parse(dftest, dictionary = streetDict)

dftest
```

```
1 5330 3rd Oaks Cir HOUSTON TX 77069
2 3240 3rd Pines Dr HUMBLE TX 77339
```

chris-prener commented 5 years ago

Ah, it's always assuming an ordinal. Can you add "3rd Pines" to `input` and "Third Pines" to `output` and see if that helps?
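To make the "always ordinal" diagnosis concrete, here is a toy model in base R: a leading spelled-out number is unconditionally rewritten as an ordinal, which is how "THREE PINES" ends up as "3rd Pines". This is a sketch under stated assumptions, not postmastr's internal code; `word_to_ordinal`, `title_case()`, `normalize_street()` and its `protected` argument are all hypothetical names. The `protected` guard shows one way a street dictionary could short-circuit the conversion.

```r
# Hypothetical toy model, NOT postmastr's internal implementation.
word_to_ordinal <- c(
  one = "1st", two = "2nd", three = "3rd", four = "4th", five = "5th",
  six = "6th", seven = "7th", eight = "8th", nine = "9th", ten = "10th"
)

# Capitalize the first character of each space-separated word ("3rd" stays "3rd")
title_case <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  paste(paste0(toupper(substr(words, 1, 1)), substr(words, 2, nchar(words))),
        collapse = " ")
}

normalize_street <- function(street, protected = character(0)) {
  vapply(street, function(s) {
    # Hypothetical guard: names listed in `protected` are only re-cased
    if (toupper(s) %in% toupper(protected)) {
      return(title_case(tolower(s)))
    }
    words <- strsplit(tolower(s), " ", fixed = TRUE)[[1]]
    # "Always ordinal" rule: a leading spelled-out number becomes an ordinal
    if (length(words) > 0 && words[1] %in% names(word_to_ordinal)) {
      words[1] <- word_to_ordinal[[words[1]]]
    }
    title_case(paste(words, collapse = " "))
  }, character(1), USE.NAMES = FALSE)
}

normalize_street(c("THREE OAKS", "THREE PINES"))
# returns c("3rd Oaks", "3rd Pines")

normalize_street(c("THREE OAKS", "THREE PINES"),
                 protected = c("THREE OAKS", "THREE PINES"))
# returns c("Three Oaks", "Three Pines")
```

The guard is checked before the ordinal rule runs, so a dictionary hit wins; checking it afterwards would be too late, because the spelled-out word has already been replaced.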

alankjackson commented 5 years ago

Nope. Same result.
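Since the extra dictionary entries don't change the result, one possible stopgap (a workaround not proposed in this thread) is to correct the known names after parsing with a lookup table. `restore_street_names()` and `street_fixes` are hypothetical names; you would apply the function to whichever column of `dftest` holds the parsed street name. Only list names you know are spelled out, since a genuine "3rd Oaks" elsewhere in the data would also be rewritten.

```r
# Hypothetical stopgap, not part of postmastr.
# `lookup` maps the wrongly normalized form (names) to the intended form (values).
restore_street_names <- function(street, lookup) {
  hit <- !is.na(street) & street %in% names(lookup)
  street[hit] <- unname(lookup[street[hit]])
  street
}

street_fixes <- c("3rd Oaks" = "Three Oaks", "3rd Pines" = "Three Pines")

restore_street_names(c("3rd Oaks", "3rd Pines", "Main"), street_fixes)
# returns c("Three Oaks", "Three Pines", "Main")
```

On the parsed data frame this would look like `dftest$street_col <- restore_street_names(dftest$street_col, street_fixes)`, where `street_col` is a placeholder for the actual street column name in your output.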

alankjackson commented 5 years ago

Addresses have soooo many corner cases!

chris-prener commented 5 years ago

OK - I'll check out the workflow and see how we can address this. Yes - lots of corner cases. The goal is to accommodate as many as possible!

chris-prener commented 4 years ago

Addressed while fixing another issue in this part of the workflow!