nwnxee / unified

Binaries available under the Releases tab on Github
https://nwnxee.github.io/unified
GNU General Public License v3.0
131 stars 92 forks source link

Regarding Cyrillic support in the plugin SQL #300

Closed kelewind87 closed 5 years ago

kelewind87 commented 5 years ago

good evening The question of such a plan - at the moment in the SQL plugin is implemented work with encodings?

The following situation 1) there is a database in cp1251 encoding, for example 2) collation settins connection and other database settings are set up in such a way that even when called from nwscript command like

SELECT @@collation_connection;

I see the normal encoding However, when requesting text data from the database

???

repeats, the base settings have been tried different, and different encodings - the result is the same. When playing through scripts:

???

However, if you directly type in the game in the script "Мама мыла раму" - then this type of text will be displayed

ELadner commented 5 years ago

NWN was created before UTF-8 was popular (prior to 2002). The sql plugin and the database may store whatever you throw at it, but I'm not sure how nwscript actually inteprets the text it gets out of the db.

Do you have a code snippet you can share?

On Wed, Jan 23, 2019 at 3:58 PM kelewind87 notifications@github.com wrote:

good evening The question of such a plan - at the moment in the SQL plugin is implemented work with encodings?

The following situation

  1. there is a database in cp1251 encoding, for example
  2. collation settins connection and other database settings are set up in such a way that even when called from nwscript command like

SELECT @@collation_connection;

I see the normal encoding However, when requesting text data from the database

???

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/nwnxee/unified/issues/300, or mute the thread https://github.com/notifications/unsubscribe-auth/ADgtqkaul4rgmhs1AwNUpAg9IKhbdT_8ks5vGNr-gaJpZM4aPti3 .

-- Eric Ladner

mtijanic commented 5 years ago

NWN localization support is weird. Internally, NWN only knows one encoding - windows cp-1252 (a slight extension to ISO8859-1). It looks like this:

So, NWN does not support Cyrillic. Instead, they override the Latin font to replace the glyphs for e.g. Š with e.g. л. So as far as any computer logic is concerned, the text is still Latin, but when it is displayed to the user in the game, the custom font makes it appear as Cyrillic. Hope I'm making sense so far.

Now, all input boxes in NWN are actually UTF-8, meaning you can enter any character in them (but since the font is missing the ligatures for it, it might show up weird). But when you do, these are converted to cp-1252, just like nwnx does e.g. here. Except, for Cyrillic/Russian in particular, NWN has a small quality of life improvement during that conversion. It will automatically detect if you typed Cyrillic characters in unicode, and perform the remapping to the cp-1252 according to the custom font. That mapping looks like this:

if (codepoint > 1024 && codepoint < 1104)
{
    if (codepoint == 1025)
    {
        codepoint = 1016;
    }
    else if (codepoint == 1105)
    {
        codepoint = 1032;
    }
    codepoint -= 848;
}

This way, you can set your PC's language to be Cyrillic, type as you'd normally, and the game will internally transfer that to latin gibberish for storage/manipulation.

The issue with SQL, Webhook or anything else is that it gets the latin gibberish from NWN because it doesn't know about NWN's special mapping. So, for SQL in particular, you can store data in any encoding (cp-1252, utf-8, etc). When you type "Мама мыла раму" in the game and store that in the DB, you'll get something like "öćšćčššžĆČŠPžš" or whatever, because that is what it is. But then, when you send it back (e.g. with SendMessageToPC()) to a player that has the Russian font installed, they'll see the proper message. But if you send it to a player with the Latin font, they'll see the gibberish. If, on the other hand, you write Cyrillic to database in another way (e.g. load data from a file), that data will not be understood by the game, because it does not follow NWN mapping.

I hope this makes sense. I'll end this comment here, so I have where to elaborate on how it works if there are questions.

kelewind87 commented 5 years ago
to a player that has the Russian font installed, they'll see the proper message. But if you send it to a player with the Latin font, they'll see the gibberish.

This is a bit wrong. I personally checked on my client with installed Cyrillic fonts There were still unreadable characters.

mtijanic commented 5 years ago

This is a bit wrong. I personally checked on my client with installed Cyrillic fonts There were still unreadable characters.

Unless your database encoding does something to the messages, these two things should return the exact same message:

// on player chat
#include "nwnx_sql"
void main()
{
    NWNX_SQL_ExecuteQuery("CREATE TABLE IF NOT EXISTS test(key int, msg varchar(128))");
    object oPC = GetPCChatSpeaker();
    string sMsg = GetPCChatMessage();
    SendMessageToPC(oPC, "Before database: "+sMsg);
    NWNX_SQL_ExecuteQuery("INSERT INTO test(key, msg) VALUES(123, '" + sMsg + "')");
    NWNX_SQL_ExecuteQuery("SELECT msg FROM test WHERE key=123");
    if (NWNX_SQL_ReadyToReadNextRow()) {
        NWNX_SQL_ReadNextRow();
        string sMsg2 = NWNX_SQL_ReadDataInActiveRow();
        SendMessageToPC(oPC, "Through database: " + sMsg2);
    }
}

Are you saying the two messages will be different? Can you send me the russian font, I want to try it.

mtijanic commented 5 years ago

Now, for the solution.

Any time you want to send data to/from a database, add a conversion layer to translate from nwn!Cyrillic into everyoneelse!Cyrillic. This can even be done in nwscript, but is better suited for nwnx.

The unfortunate thing here is that this translation will look different depending on which encoding you use for the database. If you store your data as UTF-8, it'll look like the reverse of what I posted above. But if you store as iso8859-5, it'll be different.

For NWNX, we could have a new 'translation' service that will do the mapping any time data needs to travel to/from database, and we can add various translation tables as needed.

MhlChesnokov commented 5 years ago

here is the code we use in nwscript.

// Script from test plate level (onUsed event) // sTag = GetTag(OBJECT_SELF); oPC = GetLastUsedBy(); // add text value to DB else if (sTag == "TS_TEST_LEVEL1") { string sEnter = GetLocalString(oPC, "sEnter"); // entered text from PC_Chat sSQL = "UPDATE mytest SET users='" + sEnter + "' WHERE id='1'"; NWNX_SQL_ExecuteQuery(sSQL); // debug FloatingTextStringOnCreature("\n [DEBUG] " + sSQL, oPC); //show query text FloatingTextStringOnCreature("[DEBUG] Введенный текст: " + sEnter, oPC); //show debug * "Entered text: " + sEnter } // Show text value from DB else if (sTag == "TS_TEST_LEVEL2") { string sResult; sSQL = "SELECT users FROM mytest WHERE id='1'"; NWNX_SQL_ExecuteQuery(sSQL); if (NWNX_SQL_ReadyToReadNextRow()) { NWNX_SQL_ReadNextRow(); sResult = NWNX_SQL_ReadDataInActiveRow(); } // debug FloatingTextStringOnCreature("\n [DEBUG] " + sSQL, oPC); //show query text FloatingTextStringOnCreature("[DEBUG] " + sResult, oPC); //show the resulting value }

and an example of what happens, the link to the picture. http://prntscr.com/mbftv0

and also settings of the table in SQL on the picture on the link http://prntscr.com/mbfxl6

mtijanic commented 5 years ago

My test:

Code used:

#include "nwnx_sql"
void main()
{
    object oPC = GetPCChatSpeaker();
    string sMsg = GetPCChatMessage();

    NWNX_SQL_ExecuteQuery("CREATE TABLE IF NOT EXISTS test_cyrillic (msgkey int not null primary key, msg varchar(128) not null) CHARACTER SET utf8 COLLATE utf8_unicode_ci;;");
    SendMessageToPC(oPC, "Before database: "+sMsg);
    string key = IntToString(Random(10000));
    NWNX_SQL_ExecuteQuery("INSERT INTO test_cyrillic(msgkey, msg) VALUES(" + key + ", '" + sMsg + "');");
    NWNX_SQL_ExecuteQuery("SELECT msg FROM test_cyrillic WHERE msgkey=" + key);
    if (NWNX_SQL_ReadyToReadNextRow()) {
        NWNX_SQL_ReadNextRow();
        string sMsg2 = NWNX_SQL_ReadDataInActiveRow();
        SendMessageToPC(oPC, "Through database: " + sMsg2);
    }

}

Your database collation is performing some conversion that's not reverted cleanly. Try using utf8.

Edit: but also:

mysql> select * from test_cyrillic;
+--------+-------------------------+
| msgkey | msg                     |
+--------+-------------------------+
|   2316 | ðóññêèé ÿçûê            |
|   9175 | test latin              |
+--------+-------------------------+
kelewind87 commented 5 years ago

This database data after experiments on it (like ALTER collation on whole DB) in the evening we will send the output with the necessary collation

kelewind87 commented 5 years ago

with the encoding of the normal base - what you indicated turned out, but there was another question - in the above way I can add the text to the base from game

But of course, if I do it in any other way - they are displayed

???

Is there a way at the moment of an external addition Cyrillic in base, so that it would appear normally in the client?

mtijanic commented 5 years ago

I'm not sure I understand. But if your question is "Is there a way to have cyrillic characters both in the database and in the game at the same time", the answer is not currently. That is what this issue is requesting, and that is something that should be implemented in NWNX.

MhlChesnokov commented 5 years ago

in the DB it looks like this: screenshot_2

and in game so: screenshot_3

in SQL, the text entered is not displayed correctly.

And if I send the text that is written in DB in Cyrillic - in the game I get ??? ?? ?? ???? ? .

mtijanic commented 5 years ago

Yeah, that's what I thought. And yes, this is something that NWNX should fix.

kelewind87 commented 5 years ago

Is there a way to have cyrillic characters both in the database and in the game at the same time

Yes, that is what I mean - we cannot fill out the database from any external editor with Cyrillic?

And yes, this is something that NWNX should fix.

IE to add transcoding methods to the plugin? Now this conversion is only on the client EE?

mtijanic commented 5 years ago

Yeah, we should add optional transcoding step whenever the client communicates with the database (or webhook or any other plugin).

kelewind87 commented 5 years ago

And you are able to orientate on the dates when such an addition is planned?

We just need to understand - will write own way of working with Cyrillic, or just wait for a while?

ELadner commented 5 years ago

Just a side note, but you're always better off using the bind variable functionality rather than building strings to submit to teh database.

With binds, you don't have to worry about somebody typing in text that will blow up your database or error out your SQL. And you don't have to have elaborate "escape SQL value" and "unescape SQL value" code.

There's an example in the documentation of the SQL plug. https://github.com/nwnxee/unified/blob/master/Plugins/SQL/Documentation/Migration_Guide.md (third example). The only thing that's missing here is you should call NWNX_SQL_DestroyPreparedQuery() after you're finished with a PreparedQuery (otherwise you could have side effects with un-cleared parameter lists).

On Thu, Jan 24, 2019 at 7:16 AM kelewind87 notifications@github.com wrote:

And you are able to orientate on the dates when such an addition is planned?

We just need to understand - will write own way of working with Cyrillic, or just wait for a while?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nwnxee/unified/issues/300#issuecomment-457192466, or mute the thread https://github.com/notifications/unsubscribe-auth/ADgtqmSv9IgRFCZazF7-89Mdxeq7oz9fks5vGbItgaJpZM4aPti3 .

-- Eric Ladner

mtijanic commented 5 years ago

And you are able to orientate on the dates when such an addition is planned?

I wouldn't be able to say. This is a volunteer community, so we don't assign issues to anyone. Someone might pick it up, but it might be this week or in 6 months, or who knows. I do think this is a good request, and should be implemented, but if it is waiting for me personally to do it it's likely sitting at a few months of backlog.

Of course, we'll gladly take the help if you guys want to share your code.

virusman commented 5 years ago

NWN historically used the system 8-bit encoding, so while for most users it was cp1252, for Russian input it was cp1251. The utf-8 conversion snippet is actually converting from UTF-8 to cp1251. Unless there is some conversion in NWNX, as long as the DB can properly handle cp1251 input and produce cp1251 output, it should work fine.

stenly87 commented 5 years ago
void MySQL::Connect(NWNXLib::ViewPtr<NWNXLib::Services::ConfigProxy> config)
{
    const std::string host = config->Get<std::string>("HOST", "localhost");
    const std::string username = config->Require<std::string>("USERNAME");
    const std::string password = config->Require<std::string>("PASSWORD");
    const NWNXLib::Maybe<std::string> database = config->Get<std::string>("DATABASE");
    if (database)
    {
        LOG_DEBUG("DB set to %s", (*database).c_str());
    }
    LOG_INFO("Connection info:  host=%s username=%s", host.c_str(), username.c_str());
    LOG_DEBUG("               :  password=%s", password.c_str());

    if (!mysql_real_connect(&m_mysql, host.c_str(), username.c_str(), password.c_str(), database ? (*database).c_str() : nullptr, 0, nullptr, 0))
    {
        throw std::runtime_error(std::string(mysql_error(&m_mysql)));
    }

    mysql_set_character_set(&m_mysql, "cp1251"); //***
}

*** This line enable russian text for me. I'm using db with cp1251 collation. Please add option to set character set.

mtijanic commented 5 years ago

I believe both issues were addressed in #371 - You can set charset to cp1251, or you can keep utf8 and translate russian to/from it. Please reopen if there's still issues with Cyrillic support.