tchwork / utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP
Apache License 2.0

Using PatchWork - UTF8 for a new framework #11

Closed CMCDragonkai closed 10 years ago

CMCDragonkai commented 10 years ago

I'm developing a new framework and was researching how to provide portable and strict UTF-8 functionality.

My question is about the Usage documentation, which says to just use the Bootup class and run those three functions on startup. Is that all I have to do? Is there nothing else that needs to be configured? I noticed that you provide shims for various mbstring functions in case the extension isn't available.

Furthermore, regarding setlocale(): if I wanted to create a Chinese website, would I then call setlocale() again, but with a Chinese locale? How does setlocale() work in the context of Patchwork-UTF8?

nicolas-grekas commented 10 years ago

Hi,

Yes, those 3 functions are all that is required to ensure 1. PHP is properly configured, 2. your input is normalized and 3. the portability fallbacks are enabled if required.

After that, it's your job to use the mbstring/iconv/grapheme/patchwork functions correctly, of course. But at least you know you can rely on them.

Concerning the locale, it's used in only two places in the code. First, a UTF-8 locale should be chosen to ensure that basename() / pathinfo() work correctly. Which one doesn't matter here; zh_CN.UTF-8 should be fine for you, if I understand correctly.

The other location is in the Patchwork\Utf8::toAscii() transliteration function. But I don't expect this function to work well with a Chinese locale. Tell me what you get...

Nothing more :)
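For anyone who wants to check which locale is active before relying on basename() / pathinfo() or toAscii(), here is a minimal sketch in plain PHP. It only uses the built-in setlocale() and preg_match(). The helper name isUtf8LocaleName() is made up for illustration and is not part of patchwork/utf8, and it assumes locale names follow the usual language_TERRITORY.codeset form, which may not hold on every platform.

```php
<?php
// Hypothetical helper (NOT part of patchwork/utf8): reports whether a locale
// name such as "zh_CN.UTF-8" or "en_US.utf8" declares a UTF-8 codeset.
// Assumption: the codeset follows a dot, optionally followed by "@modifier".
function isUtf8LocaleName(string $name): bool
{
    return preg_match('/\.utf-?8(@.*)?$/i', $name) === 1;
}

// Passing "0" as the locale only queries the current setting; nothing changes.
// LC_CTYPE is queried because it is the category that governs character handling.
$current = setlocale(LC_CTYPE, '0');

// True only when a locale name came back and it declares a UTF-8 codeset.
$ctypeIsUtf8 = is_string($current) && isUtf8LocaleName($current);
```

If $ctypeIsUtf8 ends up false, that would be the point to configure a UTF-8 locale or to avoid the locale-dependent functions mentioned above.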

CMCDragonkai commented 10 years ago

Thanks for the information. Just to clarify, regarding setlocale():

\Patchwork\Utf8\Bootup::initAll(); // Enables the portability layer and configures PHP for UTF-8
\Patchwork\Utf8\Bootup::filterRequestUri(); // Redirects to a UTF-8 encoded URL if it's not already the case
\Patchwork\Utf8\Bootup::filterRequestInputs(); // Normalizes HTTP inputs to UTF-8 NFC

setlocale(LC_ALL, 'zh_CN');

Would the above code be valid? I am just confused as to how my own setlocale() call would affect the setlocale() call inside your library. Does one overwrite the other, or do they combine, and what are the effects?

nicolas-grekas commented 10 years ago

If you can call setlocale(LC_ALL, 'zh_CN.UTF-8'); then I'd say it would be more consistent, i.e. you'll have a 100% UTF-8 context. But as far as patchwork/utf8 is concerned, only the Patchwork\Utf8::toAscii() function relies on the current locale. Don't use it and you'll be safe. Or configure a UTF-8 locale.
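Since locale names and availability vary between systems, one option is to pass setlocale() a list of candidates and check what it returns. Below is a rough sketch, assuming it runs after the Bootup calls. The helper name applyFirstAvailableLocale() and the candidate list are illustrative assumptions, not something the library provides.

```php
<?php
// Hypothetical helper (NOT part of patchwork/utf8): tries each candidate in
// order and returns the locale name that setlocale() actually applied, or
// null when none of the candidates is available on this system.
function applyFirstAvailableLocale(array $candidates): ?string
{
    // setlocale() accepts an array and uses the first entry it can apply;
    // it returns false when every entry fails.
    $applied = setlocale(LC_ALL, $candidates);

    return $applied === false ? null : $applied;
}

// Assumed candidate names; the spelling of the codeset differs between platforms.
$locale = applyFirstAvailableLocale(['zh_CN.UTF-8', 'zh_CN.utf8', 'en_US.UTF-8', 'C.UTF-8']);

if ($locale === null) {
    // No UTF-8 locale could be set: per the discussion above, avoiding
    // Patchwork\Utf8::toAscii() is the safe choice in that case.
}
```

As far as I know, setlocale() changes the setting for the whole process, so a later call replaces the earlier value for the categories it names rather than merging with it.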