tklab-tud / uscxml

SCXML interpreter and transformer/compiler written in C/C++ with bindings to Java, C#, Python and Lua

[EcmaScript JSC] UTF-8 encoding test fails with AccessViolation on Windows code page 1251 #201

Open alexzhornyak opened 3 years ago

alexzhornyak commented 3 years ago

This issue largely duplicates #144, but since I have no rights to reopen that one, I decided to open a new issue to warn other users who run the JavaScriptCore datamodel on non-Latin code pages.

Steps to reproduce:

Conditions:

Any Windows version, code page 1251

Actions:

  1. Execute test-enc-UTF8.scxml

Results:

  1. Var14 is read from the data element:
    char* tmp = XERCESC_NS::XMLString::transcode(toTranscode);
    _localForm = std::string(tmp);

    After the conversion we have std::string Var14 = "''В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!''". This is an ANSI (Windows-1251) string, because XMLString::transcode converts the string to the native code page.

Later we run:

JSStringRef scriptJS = JSStringCreateWithUTF8CString(expr.c_str());
JSValueRef exception = NULL;
JSValueRef result = JSEvaluateScript(_ctx, scriptJS, NULL, NULL, 0, &exception);

Here we get a hard exception because the sizes differ: JSStringCreateWithUTF8CString expects a UTF-8 string, so Var14 would have to be the UTF-8 encoded std::string Var14 = "'В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!'", not the Windows-1251 bytes produced above.
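To illustrate the mismatch outside of uscxml, here is a minimal sketch of a UTF-8 well-formedness check. The function name `isWellFormedUtf8` is hypothetical and not part of uscxml or JavaScriptCore. It only shows that the Windows-1251 bytes for "В чащах" (7 bytes) are not valid UTF-8, while the UTF-8 encoding of the same text (13 bytes) is. It does not claim to reproduce the exact failure inside JavaScriptCore.

```cpp
#include <cassert>  // for the sanity checks mentioned in the note below
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogate code points, and values above U+10FFFF.
bool isWellFormedUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        if (lead < 0x80) {
            extra = 0; cp = lead;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1; cp = lead & 0x1F;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2; cp = lead & 0x0F;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3; cp = lead & 0x07;
        } else {
            return false;        // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence
        }
        if (extra > n - i - 1) {
            return false;        // sequence is cut off at the end of the input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;    // not a continuation byte (10xxxxxx)
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (extra == 2 && cp < 0x800) return false;                       // overlong
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return false;  // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                   // surrogate
        i += extra + 1;
    }
    return true;
}
```

With this helper, the Windows-1251 bytes `"\xC2\x20\xF7\xE0\xF9\xE0\xF5"` are rejected (0xC2 announces a two-byte sequence, but the following 0x20 is not a continuation byte), whereas the UTF-8 bytes `"\xD0\x92\x20\xD1\x87\xD0\xB0\xD1\x89\xD0\xB0\xD1\x85"` are accepted; both can be verified with a plain `assert`.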

Possible Solutions

  1. Change everything to Unicode std::wstring
  2. Modify X(const XMLCh* const toTranscode) to force conversion to a UTF-8 string
  3. Convert from the native code page in the datamodels
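As a rough sketch of what option 2 would need, the function below encodes a null-terminated UTF-16 buffer as UTF-8 without going through the native code page. It assumes XMLCh is a 16-bit UTF-16 code unit and uses `char16_t` as a stand-in; the name `utf16ToUtf8` is hypothetical and not taken from uscxml. Xerces-C++ 3.x also provides a `TranscodeToStr` helper in `xercesc/util/TransService.hpp` that accepts a target encoding name, which may be a more natural fit inside `X(...)`, but I have not tested it against this code base.

```cpp
#include <cassert>  // for the sanity checks mentioned in the note below
#include <string>

// Appends the UTF-8 encoding of one code point to `out`.
static void appendUtf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical replacement for the transcode step: converts a
// null-terminated UTF-16 string to UTF-8, independent of the system
// code page. Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const char16_t* src) {
    std::string out;
    if (src == nullptr) {
        return out;
    }
    for (; *src != 0; ++src) {
        unsigned long cp = *src;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: src[1] is safe to read because *src is not
            // the terminator, so at worst src[1] is the terminator itself.
            const unsigned long low = src[1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++src;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high one
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

For the input code units `{0x0412, 0x0020, 0x0447, 0}` ("В ч") this yields the bytes D0 92 20 D1 87, which is what JSStringCreateWithUTF8CString expects regardless of the active Windows code page; a plain `assert` on the result is enough to check it. Compared with option 1, this keeps std::string throughout and only changes the encoding it carries.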