mybb / merge-system

The MyBB Merge System allows for easy merging of an existing forum (be it MyBB or another forum software) into a MyBB 1.8.x forum.

Merge system - issue importing from phpbb3 utf8-bin - duplicated entry #209

Open edipoferreira opened 6 years ago

edipoferreira commented 6 years ago

Hi, I'm trying to use the Merge System, but when importing users I run into an issue with names that contain special characters. For example, on phpBB I have two users, jonatas and Jônatas, and the collation is utf8_bin. When MyBB tries to import them, it considers jonatas and Jônatas to be the same user and therefore reports a duplicate entry.

Has anyone faced this problem? I tried a couple of encoding configurations in the Merge System, but nothing worked. To add more information: I changed the collation of the username field in mybb_users to utf8_bin. Does anyone know whether leaving the field like this would cause any problems?

ALTER TABLE mybb_users CHANGE username username VARCHAR(120) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';

It appears that I will be unable to migrate from phpBB3 due to this encoding issue.

euantorano commented 6 years ago

That’s an interesting one. The duplicate check is done in the code, I believe, and perhaps it doesn’t do that in a way that understands what’s going on. We’ll have to take a look, though the Merge System is mostly unmaintained at the minute.

On 19 Jun 2018, at 23:59, lordgittux notifications@github.com wrote:


edipoferreira commented 6 years ago

I think this is an issue with the collation in use. I installed a fresh copy of MyBB and tried to create both users as a test, without success: under that collation, Jônatas and jonatas compare as equal. After I converted the field to utf8_bin, the second user was created normally. I opened an issue on the MyBB repository to ask whether I can keep the username field as utf8_bin: https://github.com/mybb/mybb/issues/3267
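To illustrate the difference between the two comparisons, here is a minimal Python sketch. It only approximates an accent- and case-insensitive collation by stripping combining marks and case-folding; it is not MySQL's actual collation algorithm, and the function names are hypothetical.

```python
import unicodedata


def fold_like_ci_collation(name: str) -> str:
    """Build an approximate accent- and case-insensitive comparison key.

    Assumption: this mimics the *effect* of an accent-insensitive,
    case-insensitive collation; it is not MySQL's real implementation.
    """
    # Decompose characters so that "ô" becomes "o" + a combining circumflex.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Fold case so that "J" and "j" compare as equal.
    return without_marks.casefold()


def equal_binary(a: str, b: str) -> bool:
    """Byte-for-byte comparison, similar in spirit to a *_bin collation."""
    return a.encode("utf-8") == b.encode("utf-8")


def equal_folded(a: str, b: str) -> bool:
    """Comparison on the folded key, similar in spirit to a *_ci collation."""
    return fold_like_ci_collation(a) == fold_like_ci_collation(b)


print(equal_binary("jonatas", "Jônatas"))  # False: the bytes differ
print(equal_folded("jonatas", "Jônatas"))  # True: both fold to "jonatas"
```

Under the binary comparison the two names are distinct, which matches what happens after switching the column to utf8_bin; under the folded comparison they collide, which matches the duplicate entry error.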

yuliu commented 4 years ago

I think this issue relates not to the SQL collation but to the character set. The duplicate-user check is implemented in the users base module with UTF-8 in mind. However, there is more to it from the database collation perspective.

For @lordgittux's problem, Jônatas is indeed a duplicate of jonatas by the logic of the code in the base module, in case-insensitive collations:

@euantorano, yep, here's the interesting point: it looks like the duplicate check in the base users module does not cover the circumflex diacritical mark (ˆ) or other letter variations. I'll dig more later.
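As a rough sketch of what a diacritic-aware duplicate check could look like, the Python snippet below groups usernames by an accent- and case-folded key and reports the groups that would collide. This is not the Merge System's code (which is PHP); the function names are hypothetical, and the folding only approximates a case-insensitive collation.

```python
import unicodedata
from collections import defaultdict


def fold(name: str) -> str:
    """Hypothetical key: strip combining marks, then fold case."""
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def find_colliding_usernames(usernames):
    """Return groups of usernames that share the same folded key."""
    groups = defaultdict(list)
    for name in usernames:
        groups[fold(name)].append(name)
    # Only groups with more than one member would clash on import.
    return [names for names in groups.values() if len(names) > 1]


print(find_colliding_usernames(["jonatas", "Jônatas", "maria"]))
# [['jonatas', 'Jônatas']]
```

Running a check like this against the source usernames before an import would show which accounts need renaming or a binary collation on the target column.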

I opened #226, which has some discussion of how the UTF-8 problem could relate to the user duplicate check.

Edited: typo. Edited: more investigation.