Open edipoferreira opened 6 years ago
That’s an interesting one. I believe the duplicate check is done in the code, and perhaps it doesn’t do that in a way that understands what’s going on. We’ll have to take a look, though the merge system is mostly unmaintained at the minute.
On 19 Jun 2018, at 23:59, lordgittux notifications@github.com wrote:
Hi, I'm trying to use the merge system, but when importing users I have an issue with names containing special characters. Let me show: on phpBB I have two users, jonatas and Jônatas, and the collation is utf8_bin, but when MyBB tries to import them, it considers jonatas and Jônatas the same user and because of that issues a duplicate entry message.
Has anyone faced this problem? I tried a couple of encoding configurations in the merge system, but nothing worked. To add more information, I changed the collation of the username field on mybb_users to utf8_bin. Does anyone know if there is any problem with leaving the field like this?
ALTER TABLE mybb_users CHANGE username username VARCHAR(120) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '';
It appears that I will be unable to migrate from phpBB3 due to this encoding issue.
I think that is an issue with the collation used. I installed a fresh version of MyBB and tried to create both users as a test, with no success, because the collation compares Jônatas with jonatas and treats them as equal. After my conversion to utf8_bin, the second user was created normally. I opened an issue on the MyBB code to see if I can keep the username field as utf8_bin. https://github.com/mybb/mybb/issues/3267
I think this issue does not relate to SQL's collation but to the character set. The mechanism of user duplicate checking is coded in the users base module with consideration of UTF-8. However, there's more to it from the database collation perspective.
For @lordgittux's problem, Jônatas is indeed a duplicate of jonatas by the logic of the code in the base module, in case-insensitive collations:
j is regarded the same as J
ô is regarded the same as o
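To make the two comparisons concrete, here is a minimal Python sketch. It only approximates what a case- and accent-insensitive collation does (strip combining marks, then fold case); it is not MySQL's actual collation algorithm and not MyBB's duplicate check. The function names `ci_ai_key` and `binary_equal` are made up for this illustration.

```python
import unicodedata


def ci_ai_key(name: str) -> str:
    """Hypothetical helper: rough comparison key for a case- and
    accent-insensitive collation. An approximation only."""
    # Decompose characters so that "ô" becomes "o" + a combining circumflex.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks, keeping only the base letters.
    base_letters = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Fold case so that "J" and "j" compare equal.
    return base_letters.casefold()


def binary_equal(a: str, b: str) -> bool:
    """Hypothetical helper: byte-wise comparison, similar in spirit
    to what a binary collation such as utf8_bin does."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    # Insensitive comparison: the two names collapse to the same key.
    print(ci_ai_key("Jônatas") == ci_ai_key("jonatas"))
    # Binary comparison: the two names stay distinct.
    print(binary_equal("Jônatas", "jonatas"))
```

Under these assumptions the two usernames collide with the insensitive key but remain distinct byte-wise, which matches the behaviour reported above before and after switching the column to utf8_bin.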
@euantorano, yep, here's the interesting point: it looks like the duplicate check in the base users module will not cover scenarios involving the circumflex diacritical mark (ˆ) or letter variations. I'll dig more later.
I opened #226, in which there's some discussion of how the UTF-8 problem could potentially relate to the user duplicate check.
Edited: typo. Edited: more investigation.