zkamvar closed this issue 5 years ago
The table is from the ICU Project Guide: http://userguide.icu-project.org/transforms/general, which I got from the stringi manual: http://www.gagolewski.com/software/stringi/manual/?manpage=stri_trans_general
In fact, it can be done in one command, since transliterators can be combined:
Note that transliterators are often combined in sequence to achieve a desired transformation. This is analogous to the composition of mathematical functions. For example, given a script that converts lowercase ASCII characters from Latin script to Katakana script, it is convenient to first (1) separate input base characters and accents, and then (2) convert uppercase to lowercase. To achieve this, a compound transform can be specified as follows: NFKD; Lower; Latin-Katakana;
print(y <- stringi::stri_trans_general(x$Source, "ANY-Latin; Latin-ASCII"))
#> [1] "gim, gugsam" "gim, myeonghui"
#> [3] "jeong, byeongho" "..."
#> [5] "takeda, masayuki" "masuda, yoshihiko"
#> [7] "yamamoto, noboru" "..."
#> [9] "Routse, Anna" "Kaloudes, Chrestos"
#> [11] "Theodoratou, Elene > Ezra"
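A minimal sketch of the composition described in the quoted ICU passage, assuming stringi is installed. The input string is made up for illustration and is not from the original data. Applying the three transforms one at a time should give the same result as the single compound ID:

```r
library(stringi)

x <- "TOKYO"  # illustrative input, not from the original data

# Apply the transforms one at a time, in the order given in the ICU guide
step <- stri_trans_general(x, "NFKD")
step <- stri_trans_general(step, "Lower")
step <- stri_trans_general(step, "Latin-Katakana")

# Apply the same transforms as a single compound ID
compound <- stri_trans_general(x, "NFKD; Lower; Latin-Katakana")

# Both routes are expected to agree
identical(step, compound)
```

The compound form is what makes the one-command version above possible: the IDs are separated by semicolons and applied left to right.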
I feared this would come and bite us at some point. Thanks for finding the monster's lair and slaying it. This is awesome. :) :)
To tie this in with #12, we could add the de-ASCII in there before Latin-ASCII
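Roughly what that could look like, as a sketch only. The input is a made-up example, and the code assumes the installed ICU ships a `de-ASCII` transform, which is why it checks `stri_trans_list()` first and falls back to the two-step ID otherwise:

```r
library(stringi)

x <- "Müller"  # made-up example, not from the original data

# Assumption: "de-ASCII" is available in this ICU build; fall back if not
id <- if ("de-ASCII" %in% stri_trans_list()) {
  "Any-Latin; de-ASCII; Latin-ASCII"
} else {
  "Any-Latin; Latin-ASCII"
}

out <- stri_trans_general(x, id)
out
```

Putting `de-ASCII` ahead of `Latin-ASCII` matters because `Latin-ASCII` would otherwise strip the umlaut before the German-specific rule gets a chance to see it.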
Yup
Currently, clean_labels() doesn't handle non-Latin characters:
This is because the parser in clean_labels() transliterates any text with Latin characters to ASCII but ignores non-Latin symbols. The solution is to first transliterate all symbols into Latin and then transliterate the result into ASCII.
Created on 2019-05-02 by the reprex package (v0.2.1)
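The two-step fix described above could look something like the sketch below. `to_ascii()` is a hypothetical helper written for illustration, not the actual clean_labels() code, and the labels are made-up examples:

```r
library(stringi)

# Hypothetical helper, not the actual clean_labels() implementation
to_ascii <- function(x) {
  # Step 1: any script -> Latin; step 2: Latin -> plain ASCII
  stri_trans_general(x, "Any-Latin; Latin-ASCII")
}

labels <- c("Ελένη", "タケダ", "Crème brûlée")  # made-up examples
out <- to_ascii(labels)

# Every label should now consist of ASCII characters only
all(stri_enc_isascii(out))
```

Running `Any-Latin` first means the Greek and Katakana labels are converted to Latin script before `Latin-ASCII` strips the remaining diacritics, rather than being passed through untouched.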