tchwork / utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP
Apache License 2.0

Allow choosing the Unicode normalization form used by the Patchwork\Utf8::toAscii method #8

Closed (peaceman closed this 10 years ago)

nicolas-grekas commented 10 years ago

Thanks for your interest in Patchwork UTF-8! Why would this be useful? NFC and NFKC are not usable here because they keep accented letters precomposed, and those are not ASCII characters, so the conversion would fail on them. That leaves only NFD, but NFKD decomposes more characters than NFD, so switching to NFD would only produce more conversion failures. Don't you think?
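To make the comparison of the four normalization forms concrete, here is a minimal sketch in Python (not the library's PHP code). It assumes the accent-stripping step removes characters of general category Mn, which may not match exactly what toAscii does; `strip_marks` is a hypothetical helper introduced only for this illustration.

```python
import unicodedata


def strip_marks(text: str, form: str) -> str:
    """Normalize `text` to `form`, then drop combining marks (category Mn).

    Hypothetical helper: the Mn filter is an assumption about the stripping
    step, not a copy of the library's implementation.
    """
    normalized = unicodedata.normalize(form, text)
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Mn")


# U+00E9 is "e with acute"; U+FB01 is the "fi" ligature, which only has a
# compatibility decomposition.
samples = ("\u00e9", "\ufb01")

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for sample in samples:
        result = strip_marks(sample, form)
        print(form, f"U+{ord(sample):04X}", repr(result), result.isascii())
```

Under these assumptions, only NFD and NFKD reduce the accented letter to plain ASCII, and only the compatibility forms (NFKC, NFKD) expand the ligature, so NFKD is the one form that handles both samples.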

peaceman commented 10 years ago

This pull request grew out of a transliteration problem with German umlauts such as ä, ö and ü. For example, if you normalize ä (U+00E4, UTF-8 bytes C3 A4) with NFKD, the preg_replace call strips the accent and the result is a (U+0061). When normalizing with NFKC, you get the expected result ae (U+0061 and U+0065).
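The code points involved can be checked with a short sketch, again in Python rather than PHP. It only shows what normalization does to ä; the final step to "ae" depends on iconv and the locale and is not reproduced here. `describe` is a hypothetical helper for printing code points.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


a_umlaut = "\u00e4"  # precomposed "a with diaeresis"

# The code point is U+00E4; C3 A4 is its UTF-8 byte sequence.
print(describe(a_umlaut))               # ['U+00E4']
print(a_umlaut.encode("utf-8").hex())   # c3a4

# NFKD splits it into a base letter plus a combining diaeresis,
# which an accent-stripping step would then remove, leaving "a".
decomposed = unicodedata.normalize("NFKD", a_umlaut)
print(describe(decomposed))             # ['U+0061', 'U+0308']

# NFKC keeps (or restores) the single precomposed code point.
composed = unicodedata.normalize("NFKC", decomposed)
print(describe(composed))               # ['U+00E4']
```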

nicolas-grekas commented 10 years ago

Hmm, this is a side effect of the implementation: with NFKC, "ä" stays as "ä" after normalization and is then converted to "ae" by iconv. This works because the locale on your server must be set to de_DE.UTF-8. I can't merge your patch, but I will think about the problem and try to make use of the current locale when possible. Could you please open another issue with your last message copy-pasted into it?

peaceman commented 10 years ago

Sure, the locale has to be explicitly set to de_DE.UTF-8 for iconv to convert ä to ae (that is expected and not an issue in itself), but iconv won't transliterate the decomposed equivalent (a followed by the combining diaeresis U+0308).