tchwork / utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP
Apache License 2.0

Normalization side effects #9

Closed (peaceman closed 10 years ago)

peaceman commented 10 years ago

Based on #8

Assuming the locale is set to de_DE.UTF-8

If you normalize ä (U+00E4, UTF-8 bytes C3 A4) with NFKD, it decomposes to a (U+0061) followed by the combining diaeresis (U+0308). The preg_replace call in Utf8::toAscii then strips the combining mark, so the result is a (U+0061). When normalizing with NFKC, ä stays precomposed and you get the expected result ae (U+0061 and U+0065).
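To make the normalization difference concrete, here is a minimal sketch in Python rather than the library's PHP. It only illustrates standard Unicode behaviour: `strip_marks` and `code_points` are hypothetical helpers, with `strip_marks` standing in for the accent-stripping step, and none of this is the actual Utf8::toAscii code. The locale-dependent transliteration of ä to ae is not reproduced here.

```python
import unicodedata


def strip_marks(text):
    """Hypothetical stand-in for the accent-stripping step: drop combining marks."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def code_points(text):
    """Format each character as a U+XXXX code point for inspection."""
    return ["U+%04X" % ord(ch) for ch in text]


original = "\u00e4"  # ä as a single precomposed code point

nfkd = unicodedata.normalize("NFKD", original)  # decomposed form
nfkc = unicodedata.normalize("NFKC", original)  # precomposed form

print(code_points(nfkd))              # ['U+0061', 'U+0308']
print(code_points(nfkc))              # ['U+00E4']
print(strip_marks(nfkd))              # a
print(strip_marks(nfkc) == original)  # True
```

After NFKD the diaeresis is a separate combining character, so stripping marks leaves a bare a and nothing remains to transliterate. After NFKC the precomposed ä survives the stripping step and is still available to a later transliteration step.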

nicolas-grekas commented 10 years ago

Could you please try master?

peaceman commented 10 years ago

Works only with the glibc implementation of iconv, but that is another problem.

Thx for the fix.

nicolas-grekas commented 10 years ago

> Works only with the glibc implementation of iconv, but that is another problem.

Right! But does any other implementation work the same way as glibc? I don't know of any.

nicolas-grekas commented 10 years ago

Fixed and tested in master.