runapp-aus / strayr

A catalogue of ready-to-use ABS coding structures. Package documentation can be found here: https://runapp-aus.github.io/strayr/

Edge cases not captured by `clean_anzsco` #84

Open MattCowgill opened 2 years ago

MattCowgill commented 2 years ago

Even with fuzzy_match = TRUE, the singular and plural forms of the same title do not clean to the same result:

strayr::clean_anzsco("Middle School Teacher \\ Intermediate School Teacher", fuzzy_match = TRUE) ==
  strayr::clean_anzsco("Middle School Teachers \\ Intermediate School Teachers", fuzzy_match = TRUE)
#> [1] FALSE

Created on 2022-08-05 by the reprex package (v2.0.1)

daviddiviny commented 2 years ago

At a minimum, we should include:
[image]

daviddiviny commented 2 years ago

So @MattCowgill, I was just looking into your reprex in more detail. The issue is that the ANZSCO unit group title is the plural version, "Middle School Teachers \ Intermediate School Teachers", while the ANZSCO occupation title is the singular version, "Middle School Teacher \ Intermediate School Teacher".

Any ideas on what the proper behaviour of 'clean_anzsco' should be?
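To make the mismatch concrete, here is a minimal sketch of one option: compare titles on a normalised key that ignores a trailing "s" on each word. `anzsco_match_key()` is a hypothetical helper, not part of strayr, and the "drop a trailing s" rule is a crude assumption. It would still fail on pairs like "Class"/"Classes", and it also alters words such as "Analysis", although it does so consistently on both sides of the comparison.

```r
# Hypothetical helper (not part of strayr): builds a comparison key that
# ignores a single trailing "s" on each word, so singular and plural
# ANZSCO titles map to the same key. Crude by design; see caveats above.
anzsco_match_key <- function(x) {
  x <- tolower(trimws(x))
  words <- strsplit(x, "\\s+")
  vapply(words, function(w) {
    # Only strip from words longer than 3 characters, leaving short tokens
    # (and the "\" separator) untouched
    w <- ifelse(nchar(w) > 3, sub("s$", "", w), w)
    paste(w, collapse = " ")
  }, character(1))
}

singular <- "Middle School Teacher \\ Intermediate School Teacher"
plural   <- "Middle School Teachers \\ Intermediate School Teachers"

identical(anzsco_match_key(singular), anzsco_match_key(plural))
#> [1] TRUE
```

A key like this would only be used for matching. The cleaned output would still need a rule for choosing between the unit group and the occupation, which is the open question here.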

MattCowgill commented 2 years ago

Argh, how annoying. Not sure tbh, let's think about it...

daviddiviny commented 2 years ago

Also, just putting this here for reference. The ATO has their own custom version of ANZSCO6 with additional occupations. See here.