mybb / merge-system

The MyBB Merge System allows for easy merging of an existing forum (be it MyBB or another forum software) into a MyBB 1.8.x forum.

UTF-8 issues regarding UTF-8 conversion and others #226

Open yuliu opened 5 years ago

yuliu commented 5 years ago

Preface: I run a Discuz! forum with a Chinese user base and have written a converter for it. While working on the users module, I found that the Merge System does not convert usernames correctly. After looking at the encode_to_utf8() function in ./merge/resources/functions.php and check_for_duplicates() in ./merge/resources/modules/users, I found a few things that may be causing the problem.

BTW, I've written a small script that shows the problem visually. It simulates how some functions in the Merge System and MyBB are used, and assumes that you want conversion to UTF-8 and that the mb_* and iconv functions exist. For it to work, save the file in ANSI encoding in a text editor, not UTF-8. Oh, and I'm running PHP 5.5 right now.

Let's go on:

However, I still can't understand why two different lowercase functions are applied to a username within the same comparison: my_strtolower($duplicate_user['username']) == strtolower($encoded_username)
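To make the concern about that comparison concrete, here is a minimal sketch of how mixing a multibyte-aware lowercase with the byte-wise strtolower() can miss a duplicate. It assumes that my_strtolower() prefers mb_strtolower() when mbstring is loaded, and that both usernames are already UTF-8. sketch_mb_lower() is a hypothetical stand-in written for this example, not a Merge System or MyBB function.

```php
<?php
// Hypothetical stand-in for MyBB's my_strtolower(); ASSUMED to prefer mbstring.
function sketch_mb_lower($string)
{
    return function_exists('mb_strtolower')
        ? mb_strtolower($string, 'UTF-8')
        : strtolower($string);
}

// Usernames written as UTF-8 byte escapes, so the encoding this file
// is saved in does not matter.
$stored   = "\xC3\xA9cole"; // existing user "école"
$incoming = "\xC3\x89COLE"; // imported user "ÉCOLE"

// Mixed comparison, shaped like the quoted line: multibyte-aware on the
// left, byte-wise on the right. strtolower() does not turn "É" into "é".
$mixed = sketch_mb_lower($stored) == strtolower($incoming);

// Consistent comparison: the same function on both sides.
$consistent = sketch_mb_lower($stored) == sketch_mb_lower($incoming);

var_dump($mixed, $consistent);
```

With mbstring loaded, the mixed comparison should come out false while the consistent one comes out true, so the two accounts would not be recognised as duplicates by the mixed form.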

Maybe this issue is caused by that, too. I've written my own versions of the affected functions in my converter's and module's classes, without modifying the base Merge System. I may open a pull request if we conclude that this is incorrect usage of mb_*.

yuliu commented 5 years ago

I've put together a list of encodings across MySQL charsets, iconv encodings and mbstring encodings here; the list is incomplete.

Taking Chinese character encodings as an example, their names and the level of support differ between these pieces of software. Maybe we can move some of the logic for determining the encoding into the board converter's class; otherwise users have to write their own encode_to_utf8/strlen/strtolower functions when the encoding goes wrong.
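As a rough sketch of what moving that logic into a board converter could look like, the snippet below keeps a small per-converter table from a MySQL charset name to the name each library expects. Everything here is hypothetical: $sketch_charset_map and sketch_to_utf8() are names invented for this example, and the table entries are illustrative assumptions that should be checked against mb_list_encodings() and the local iconv build, not the Merge System's actual mapping.

```php
<?php
// Hypothetical table: MySQL charset name => name to use with each library.
// Entries are ASSUMPTIONS for illustration; verify them on your own system.
$sketch_charset_map = array(
    'gbk'    => array('iconv' => 'GBK',    'mbstring' => 'CP936'),
    'gb2312' => array('iconv' => 'GB2312', 'mbstring' => 'EUC-CN'),
    'big5'   => array('iconv' => 'BIG5',   'mbstring' => 'BIG-5'),
    'utf8'   => array('iconv' => 'UTF-8',  'mbstring' => 'UTF-8'),
);

// Hypothetical helper: convert $text from the board's MySQL charset to UTF-8.
function sketch_to_utf8($text, $mysql_charset, array $map)
{
    $key = strtolower($mysql_charset);
    if (!isset($map[$key]) || $map[$key]['mbstring'] === 'UTF-8') {
        return $text; // unknown charset or already UTF-8: leave untouched
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($text, 'UTF-8', $map[$key]['mbstring']);
    }
    if (function_exists('iconv')) {
        $converted = iconv($map[$key]['iconv'], 'UTF-8//IGNORE', $text);
        return $converted === false ? $text : $converted;
    }
    return $text; // neither extension is available
}

// Usage: the GBK bytes for a two-character Chinese word, printed as UTF-8 hex.
echo bin2hex(sketch_to_utf8("\xD6\xD0\xCE\xC4", 'gbk', $sketch_charset_map)), "\n";
```

Keeping the table inside the converter means a board with an unusual charset only needs a new table entry, not its own copy of the conversion functions.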

Well, it's understandable that the Merge System cannot handle languages other than English well. But it's still a great converter system, and that's why I chose MyBB rather than phpBB/smf/... .