Open aaronmams opened 2 years ago
Describe the bug pm_streetSuf_parse() does not identify many street suffixes as illustrated in the vignette.
I suspect this failure is related to the package's current inability to identify unit numbers.
Specific example: pm_streetSuf_parse() does not identify the street suffix "Drive" in the address "310 Westline Drive, APT. 201B".
Expected Behavior I expected the street suffixes "DRIVE", "HWY", "RD", and "ROAD" from the example below to be identified and parsed.
I have verified that these string values are present in the street Suffix dictionary.
To Reproduce
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(postmastr)
library(stringr)

eidl_addresses <- c("98-199 KAMEHAMEHA HWY. E1","8928 S. LACLEDE STATION RD.","9785 MACKENZIE ROAD, SUITE 100","29805 MARLIS ST", "310 WESTLINE DRIVE, APT. 201B")
addresses <- data.frame(eidl_addresses)
addresses <- addresses %>% pm_identify(var = eidl_addresses)
addresses <- addresses %>% pm_prep(var=eidl_addresses,type="short")
addresses <- addresses %>% pm_house_parse()
addresses
#> # A tibble: 5 x 3
#>   pm.uid pm.address               pm.house
#>    <int> <chr>                    <chr>
#> 1      1 KAMEHAMEHA HWY. E1       98-199
#> 2      2 S. LACLEDE STATION RD.   8928
#> 3      3 MACKENZIE ROAD SUITE 100 9785
#> 4      4 MARLIS ST                29805
#> 5      5 WESTLINE DRIVE APT. 201B 310

addresses <- addresses %>% pm_streetDir_parse()
addresses
#> # A tibble: 5 x 4
#>   pm.uid pm.address               pm.house pm.preDir
#>    <int> <chr>                    <chr>    <chr>
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>
#> 2      2 LACLEDE STATION RD.      8928     S
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>
#> 4      4 MARLIS ST                29805    <NA>
#> 5      5 WESTLINE DRIVE APT. 201B 310      <NA>

addresses <- addresses %>% pm_streetSuf_parse()
addresses
#> # A tibble: 5 x 5
#>   pm.uid pm.address               pm.house pm.preDir pm.streetSuf
#>    <int> <chr>                    <chr>    <chr>     <chr>
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>      <NA>
#> 2      2 LACLEDE STATION RD.      8928     S         <NA>
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>      <NA>
#> 4      4 MARLIS                   29805    <NA>      St
#> 5      5 WESTLINE DRIVE APT. 201B 310      <NA>      <NA>

addresses <- addresses %>% pm_street_parse()
#> Error in get(genname, envir = envir) : object 'testthat_print' not found
addresses
#> # A tibble: 5 x 5
#>   pm.uid pm.house pm.preDir pm.street                pm.streetSuf
#>    <int> <chr>    <chr>     <chr>                    <chr>
#> 1      1 98-199   <NA>      Kamehameha Hwy E1        <NA>
#> 2      2 8928     S         Laclede Station Rd       <NA>
#> 3      3 9785     <NA>      Mackenzie Road Suite 100 <NA>
#> 4      4 29805    <NA>      Marlis                   St
#> 5      5 310      <NA>      Westline Drive Apt 201b  <NA>

pm_dictionary(type="suffix")[str_detect("HWY",pm_dictionary(type="suffix")$suf.input),]
#> # A tibble: 2 x 3
#>   suf.type suf.input suf.output
#>   <chr>    <chr>     <chr>
#> 1 Highway  HWY       Hwy
#> 2 Way      WY        Way
Note that if the Apartment Number is removed from the 5th entry, the street suffix is identified:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(postmastr)
library(stringr)

eidl_addresses <- c("98-199 KAMEHAMEHA HWY. E1","8928 S. LACLEDE STATION RD.","9785 MACKENZIE ROAD, SUITE 100","29805 MARLIS ST", "310 WESTLINE DRIVE")
addresses <- data.frame(eidl_addresses)
addresses <- addresses %>% pm_identify(var = eidl_addresses)
addresses <- addresses %>% pm_prep(var=eidl_addresses,type="short")
addresses <- addresses %>% pm_house_parse()
addresses
#> # A tibble: 5 x 3
#>   pm.uid pm.address               pm.house
#>    <int> <chr>                    <chr>
#> 1      1 KAMEHAMEHA HWY. E1       98-199
#> 2      2 S. LACLEDE STATION RD.   8928
#> 3      3 MACKENZIE ROAD SUITE 100 9785
#> 4      4 MARLIS ST                29805
#> 5      5 WESTLINE DRIVE           310

addresses <- addresses %>% pm_streetDir_parse()
addresses
#> # A tibble: 5 x 4
#>   pm.uid pm.address               pm.house pm.preDir
#>    <int> <chr>                    <chr>    <chr>
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>
#> 2      2 LACLEDE STATION RD.      8928     S
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>
#> 4      4 MARLIS ST                29805    <NA>
#> 5      5 WESTLINE DRIVE           310      <NA>

addresses <- addresses %>% pm_streetSuf_parse()
addresses
#> # A tibble: 5 x 5
#>   pm.uid pm.address               pm.house pm.preDir pm.streetSuf
#>    <int> <chr>                    <chr>    <chr>     <chr>
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>      <NA>
#> 2      2 LACLEDE STATION RD.      8928     S         <NA>
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>      <NA>
#> 4      4 MARLIS                   29805    <NA>      St
#> 5      5 WESTLINE                 310      <NA>      Dr
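Since removing the unit lets the suffix parse, one possible stopgap is to strip trailing unit designators before calling pm_prep(). The sketch below uses base R only. The helper strip_unit() and its keyword list are hypothetical, not part of postmastr, and the list is deliberately non-exhaustive. It only targets the unit case, so entries such as "HWY. E1" or "RD." are left unchanged and may still fail to parse.

```r
# Hypothetical helper (not part of postmastr): drop a trailing unit
# designator such as ", APT. 201B" or ", SUITE 100" from each address.
strip_unit <- function(x) {
  # Assumed, non-exhaustive list of unit keywords; extend as needed.
  unit_pattern <- ",?\\s+(APT|APARTMENT|SUITE|STE|UNIT)\\.?\\s+\\S+$"
  trimws(sub(unit_pattern, "", x, ignore.case = TRUE))
}

eidl_addresses <- c(
  "98-199 KAMEHAMEHA HWY. E1",
  "8928 S. LACLEDE STATION RD.",
  "9785 MACKENZIE ROAD, SUITE 100",
  "29805 MARLIS ST",
  "310 WESTLINE DRIVE, APT. 201B"
)

# Addresses with the trailing unit removed; others pass through untouched.
cleaned <- strip_unit(eidl_addresses)
```

The cleaned vector could then be placed in a data frame and passed through pm_identify() and pm_prep() in place of the raw addresses. The stripped unit text is discarded here, so keep the original column if the unit is needed later.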
Desktop (please complete the following information):
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.2    magrittr_2.0.1    tools_4.0.2       htmltools_0.5.1.1
#>  [5] yaml_2.2.1        stringi_1.5.3     rmarkdown_2.4     highr_0.8
#>  [9] knitr_1.30        stringr_1.4.0     xfun_0.18         digest_0.6.27
#> [13] rlang_0.4.10      evaluate_0.14
Thanks for reaching out - I can confirm that the incomplete units workflow would address this. Unfortunately, I don't have a development timeline - this project got back-burnered due to the pandemic.