twalen closed 4 years ago
Thanks for a nice contribution. If I got it right, this PR (among other things) overcomes deficiencies in out-of-the-box Unicode normalization by handling troublesome characters manually?
Regarding removing dots, commas and dashes from terms, why is that necessary? Are we removing those from names but not from terms? Or...?
Yes, this handles some characters manually (according to the map in NON_NFKD_MAP).
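For readers of this thread, here is a rough sketch of what such manual handling could look like. The map entries and the `to_ascii` helper below are assumptions for illustration only; the actual contents of NON_NFKD_MAP in this PR may differ. The idea is that letters such as `ł`, `ø`, `đ` and `ß` have no NFKD decomposition, so they need an explicit replacement before combining marks are stripped.

```python
import unicodedata

# Hypothetical entries: the real NON_NFKD_MAP in the PR may contain
# different characters. These letters have no NFKD decomposition.
NON_NFKD_MAP = {
    "\u0141": "L",   # Ł
    "\u0142": "l",   # ł
    "\u00d8": "O",   # Ø
    "\u00f8": "o",   # ø
    "\u0110": "D",   # Đ
    "\u0111": "d",   # đ
    "\u00df": "ss",  # ß
}


def to_ascii(text):
    """Hypothetical helper: fold accented text to plain base letters."""
    # Step 1: replace characters that NFKD would leave untouched.
    text = "".join(NON_NFKD_MAP.get(ch, ch) for ch in text)
    # Step 2: decompose the rest (e.g. "ó" becomes "o" + combining acute).
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 3: drop the combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_ascii("Łódź"))
```

Without step 1, `Ł` would pass through NFKD unchanged, which is the kind of deficiency the question above refers to.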
Dots, commas, etc:
Seems good to me.