runapp-aus / strayr

A catalogue of ready-to-use ABS coding structures. Package documentation can be found here: https://runapp-aus.github.io/strayr/

Unknown becomes NSW and Auckland becomes Aus #111

Closed by surgissant 7 months ago

surgissant commented 7 months ago

image

Had to turn off fuzzy matching :(

MattCowgill commented 7 months ago

Hmm that's pretty bad @surgissant

I think maybe I should turn down the threshold for a fuzzy match (i.e. be a bit stricter)

MattCowgill commented 7 months ago

Turning down max_dist (the threshold for fuzzy matching) does the trick here

library(tidyverse)
tibble(State = c("VIC",
                 "Unknown",
                 "WA",
                 "NSW",
                 "SA",
                 "QLD",
                 "ACT",
                 "TAS",
                 "Hong Kong",
                 "Auckland")) |> 
  mutate(State_ = strayr::clean_state(State,
                                      max_dist = 0.2))
#> # A tibble: 10 × 2
#>    State     State_
#>    <chr>     <chr> 
#>  1 VIC       Vic   
#>  2 Unknown   <NA>  
#>  3 WA        WA    
#>  4 NSW       NSW   
#>  5 SA        SA    
#>  6 QLD       Qld   
#>  7 ACT       ACT   
#>  8 TAS       Tas   
#>  9 Hong Kong <NA>  
#> 10 Auckland  <NA>

Created on 2024-01-31 with reprex v2.0.2

I'll reduce the default max_dist to 0.2 from 0.4
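
For anyone wondering why a lower max_dist is stricter, here is a rough base-R sketch of a distance-thresholded match. It is not strayr's implementation: fuzzy_pick is a hypothetical helper, it uses plain Levenshtein distance from utils::adist() scaled to 0-1, and its numbers are not comparable to strayr's max_dist scale. It only shows how a looser threshold lets an unrelated input slip through.

```r
# Hypothetical helper, not part of strayr: match each input to its closest
# candidate and return NA when the normalised edit distance exceeds max_dist.
fuzzy_pick <- function(x, candidates, max_dist) {
  # Levenshtein distances: inputs in rows, candidates in columns
  d <- utils::adist(tolower(x), tolower(candidates))
  # Divide by the longer string's length so every distance falls in [0, 1]
  d <- d / outer(nchar(x), nchar(candidates), pmax)
  # Index and distance of the closest candidate for each input
  best <- apply(d, 1, which.min)
  best_dist <- d[cbind(seq_along(x), best)]
  ifelse(best_dist <= max_dist, candidates[best], NA_character_)
}

states <- c("New South Wales", "Victoria", "Queensland", "Tasmania")
inputs <- c("Victoria", "Victora", "Tasmania", "Auckland")

# Strict threshold: the typo is still fixed, Auckland is left unmatched
fuzzy_pick(inputs, states, max_dist = 0.2)
#> [1] "Victoria" "Victoria" "Tasmania" NA

# Loose threshold: Auckland is close enough to Queensland to be matched
fuzzy_pick(inputs, states, max_dist = 0.6)
#> [1] "Victoria"   "Victoria"   "Tasmania"   "Queensland"
```

In this toy version "Auckland" sits 5 edits from "Queensland" (0.5 after scaling), so it matches under a threshold of 0.6 but not under 0.2, which is the same kind of false positive reported above.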

surgissant commented 7 months ago

Legend. Thank you, I should have read the docs!