sylae / ligrev

XMPP MUC Utility bot
GNU General Public License v3.0

Audit Unicode support #25

Closed: sylae closed this 7 years ago

sylae commented 8 years ago

(Blocks #4)

I haven't dealt with it much myself, but PHP apparently has historically poor Unicode support. We should make sure every function fucking with strings is multibyte-safe.

sylae commented 8 years ago

String dereferencing ($string[0]) can't be used: it indexes bytes, not characters, so it will happily hand back half of a multibyte character. That sucks because that's, like, the only new thing PHP has added in the last decade that isn't something about OO.
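A minimal sketch of the problem, assuming the mbstring extension is loaded. The sample string is made up for illustration and is written with byte escapes so it does not depend on the encoding of the source file:

```php
<?php
// "élan" in UTF-8: the "é" takes two bytes (0xC3 0xA9).
$string = "\xC3\xA9lan";

// Byte indexing returns only the first byte of "é".
$firstByte = $string[0];

// mb_substr() counts characters, so it returns the whole "é".
$firstChar = mb_substr($string, 0, 1, 'UTF-8');

var_dump(strlen($firstByte));                      // int(1)
var_dump(mb_check_encoding($firstByte, 'UTF-8'));  // bool(false)
var_dump($firstChar === "\xC3\xA9");               // bool(true)
```

So anywhere the code does $string[n], mb_substr($string, n, 1) is the character-safe replacement.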

sylae commented 8 years ago

It appears that the default_charset ini setting works to our advantage. From 5.6 onwards it defaults to UTF-8; before that it defaults to empty, which isn't UTF-8. So we can either drop support for anything older than 5.6 (dropping 5.4 is already planned, kind of), or just check that the INI value is properly set, and if not, either shit self or set it.
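The "check it, and if not, set it or bail" option could look roughly like this. It is a sketch only: ensureUtf8Charset() is a hypothetical helper name, not something that exists in ligrev, and it assumes the bot runs under the CLI (where STDERR is defined):

```php
<?php
// Hypothetical helper: returns true if default_charset is (or could be made) UTF-8.
function ensureUtf8Charset()
{
    $current = (string) ini_get('default_charset');
    if (strcasecmp($current, 'UTF-8') === 0) {
        return true; // already fine, nothing to do
    }
    // Try to set it ourselves. ini_set() returns the old value on success
    // (possibly an empty string) and false on failure.
    return ini_set('default_charset', 'UTF-8') !== false;
}

if (!ensureUtf8Charset()) {
    // The "shit self" branch: refuse to start with a non-UTF-8 charset.
    fwrite(STDERR, "default_charset is not UTF-8 and could not be changed\n");
    exit(1);
}
```

Setting the value is friendlier than dying, but failing loudly at least makes a misconfigured host obvious.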

It looks like anything touching a string will just need to use the mb_* equivalent instead (mbstring is installed by default on Debian, so that's nice), so this shouldn't be too difficult.
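To show what swapping in the mb_* functions changes in practice, here is a small comparison. The nickname is an invented example, and the explicit mb_internal_encoding() call is an assumption, added so the result does not depend on the host's ini settings:

```php
<?php
// Make the mb_* functions default to UTF-8 regardless of php.ini.
mb_internal_encoding('UTF-8');

// "Jürgen" in UTF-8; the "ü" is two bytes (0xC3 0xBC).
$nick = "J\xC3\xBCrgen";

var_dump(strlen($nick));        // int(7): counts bytes
var_dump(mb_strlen($nick));     // int(6): counts characters

var_dump(strpos($nick, 'r'));   // int(3): byte offset
var_dump(mb_strpos($nick, 'r')); // int(2): character offset

// mb_strtoupper() knows how to uppercase "ü" to "Ü" (0xC3 0x9C).
$upper = mb_strtoupper($nick);
var_dump($upper === "J\xC3\x9CRGEN"); // bool(true)
```

The byte-based functions are still correct when you actually want bytes (e.g. sizing a network payload), so the audit is about picking the right one per call site, not a blind search-and-replace.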

sylae commented 7 years ago

so actually in my experience unicode has been great, it's fucking html entities that kill us. huh
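The thread doesn't say where the entities come from or how they ended up being handled, so the following is only a guess at the kind of fix involved: decoding entities to real UTF-8 characters with an explicit charset. The input string is invented for illustration:

```php
<?php
// Invented sample input containing named and numeric HTML entities.
$raw = 'caf&eacute; &amp; &#8364;5';

// Decode to UTF-8 explicitly rather than relying on default_charset.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// "café & €5" as UTF-8 bytes: é = 0xC3 0xA9, € = 0xE2 0x82 0xAC.
var_dump($decoded === "caf\xC3\xA9 & \xE2\x82\xAC5"); // bool(true)
```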

clooooossssiiinnnggggg