`sys.h` - `BytesToCharacter` doesn't work

objeck / objeck-lang

Objeck is a modern object-oriented programming language with functional features tailored for machine learning. It emphasizes expression, simplicity, portability, and scalability. The programming environment consists of a compiler, virtual machine, REPL shell, and command line debugger with IDE plugins.

https://objeck.org


`sys.h` - `BytesToCharacter` doesn't work #334

Closed ghost closed 1 year ago

ghost commented 1 year ago

Discussed in https://github.com/objeck/objeck-lang/discussions/333

Originally posted by **iahung2** September 4, 2023

```
#include "../../shared/sys.h"
#include <string>

int main(int argc, char const *argv[])
{
    std::string test = "foo";
    wchar_t out;
    BytesToCharacter(test, out);
    wprintf(L"%ls", out);
    return 0;
}
```

Put this `test.cpp` in `core/lib/experimental`. Compile with:

`$ g++ test.cpp ../../vm/sys.a -o test`

Result:

```
$ ./test
(null)
```

Update: This piece of code gives a segmentation fault:

```
#include "../../shared/sys.h"
#include <string>

int main(int argc, char const *argv[])
{
    std::string test = "f";
    wchar_t out;
    BytesToCharacter(test, out);
    wprintf(L"%ls", out);
    return 0;
}
```

Result:

```
$ ./test
Segmentation fault
```

objeck commented 1 year ago

> wprintf(L"%ls", out);

Did you try `std::wcout << out;` instead?

```
#include "../../shared/sys.h"
#include <string>

int main(int argc, char const *argv[])
{
    std::string test = "f";
    wchar_t out;
    BytesToCharacter(test, out);
//  wprintf(L"%ls", out);
    std::wcout << out;
    return 0;
}
```
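
A note on the two format specifiers may help here: in the standard `wprintf` family, `%ls` expects a pointer to a null-terminated `wchar_t` string, while `%lc` takes a single wide character passed as `wint_t`. Passing a plain `wchar_t` to `%ls` is undefined behavior, which would be consistent with the `(null)` output and the segmentation fault reported above. The sketch below shows the single-character form; `FormatWideChar` is a hypothetical helper written for this illustration and is not part of `sys.h`.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical helper (not part of Objeck): formats one wide character
// into a std::wstring using %lc. The %lc specifier takes a wint_t value,
// whereas %ls would expect a pointer to a null-terminated wchar_t string.
std::wstring FormatWideChar(wchar_t c) {
  wchar_t buffer[8] = {};
  const int written =
      std::swprintf(buffer, 8, L"%lc", static_cast<wint_t>(c));
  // Exactly one character is expected; anything else is treated as failure.
  if (written != 1) {
    return std::wstring();
  }
  return std::wstring(1, buffer[0]);
}
```

In the test program, the equivalent direct call would be `wprintf(L"%lc", static_cast<wint_t>(out));`.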

ghost commented 1 year ago

This piece of code no longer causes a segmentation fault when using `wcout` instead of `wprintf`:

```
#include "../../shared/sys.h"
#include <string>

int main(int argc, char const *argv[])
{
    std::string test = "f";
    wchar_t out;
    BytesToCharacter(test, out);
//  wprintf(L"%ls", out);
    std::wcout << out;
    return 0;
}
```

```
$ ./test
f
```

This prints nothing, so the result is the same as when using `wprintf`: `BytesToCharacter` doesn't work.

```
#include "../../shared/sys.h"
#include <string>

int main(int argc, char const *argv[])
{
    std::string test = "foo";
    wchar_t out;
    BytesToCharacter(test, out);
//  wprintf(L"%ls", out);
    std::wcout << out;
    return 0;
}
```

```
$ ./test
```

objeck commented 1 year ago

I am unaware of a Unicode character that maps to the bytes "foo." The input would typically be a multibyte array that maps to a Unicode character.

ghost commented 1 year ago

> I am unaware of a Unicode character that maps to the bytes "foo." The input would typically be a multibyte array that maps to a Unicode character.

OK. If you want a multibyte array, here is the UTF-8 encoding of the Euro sign (€): `\xe2\x82\xac`. Try replacing `foo` with it. The result is still the same.
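
For reference, those three bytes decode to the code point U+20AC. The sketch below is a standalone UTF-8 decoder that shows the mapping from a multibyte sequence to a single character. It is an illustrative stand-in, not Objeck's `BytesToCharacter`, and `DecodeFirstCodePoint` is a hypothetical name. It does not reject overlong encodings or surrogate values, so treat it as a demonstration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Objeck's BytesToCharacter): decodes the first
// UTF-8 sequence in `in` into a code point. Returns false on empty,
// truncated, or malformed input. Overlong forms are not rejected.
bool DecodeFirstCodePoint(const std::string& in, char32_t& out) {
  if (in.empty()) {
    return false;
  }
  const unsigned char lead = static_cast<unsigned char>(in[0]);
  std::size_t length = 0;
  char32_t value = 0;
  if (lead < 0x80) {
    length = 1; value = lead;              // 0xxxxxxx: ASCII
  } else if ((lead & 0xE0) == 0xC0) {
    length = 2; value = lead & 0x1F;       // 110xxxxx
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3; value = lead & 0x0F;       // 1110xxxx
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4; value = lead & 0x07;       // 11110xxx
  } else {
    return false;                          // stray continuation or invalid lead
  }
  if (in.size() < length) {
    return false;                          // truncated sequence
  }
  for (std::size_t i = 1; i < length; ++i) {
    const unsigned char next = static_cast<unsigned char>(in[i]);
    if ((next & 0xC0) != 0x80) {
      return false;                        // not a 10xxxxxx continuation byte
    }
    value = (value << 6) | (next & 0x3F);
  }
  out = value;
  return true;
}
```

Note that this helper decodes only the first sequence, so an input such as `"foo"` yields `f`. How `BytesToCharacter` treats input that holds more than one character is not stated in this thread.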

objeck commented 1 year ago

Thank you, I will take a look. However, this is testing Win64 and POSIX APIs.

objeck commented 1 year ago

Here's a PoC that shows the working code.

The code was checked under VS 2022, MSYS2 UCRT, and MSYS2 Clang (see Makefiles).

For MSYS2 UCRT and VS 2022, the Unicode character will be printed to the console. For MSYS2 Clang, the output needs to be piped to a file, for example `unicode.exe > out.txt`.

I did not run into issues with the conversions; rather, the time was spent configuring the console and writing the PoC.

ghost commented 1 year ago

In my own test, even on MSYS2 UCRT64, the Unicode character failed to print correctly on the console (it printed gibberish). Piping to a file worked. But that doesn't matter: your PoC proved that `BytesToCharacter` will not work properly without manually setting `_setmode`.

ghost commented 1 year ago

Update: changing `_O_U16TEXT` to `_O_U8TEXT` for both stdin and stdout in `unicode.cpp` makes the character print correctly on the console on MSYS2 UCRT64.
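
The PoC's `unicode.cpp` is not reproduced in this thread, so the following is only a sketch of what such a mode change might look like, assuming the Microsoft C runtime's `_setmode`, `_fileno`, and `_O_U8TEXT`. The helper name `EnableUtf8ConsoleMode` is hypothetical, and on non-Windows platforms the function does nothing.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>  // _O_U8TEXT
#include <io.h>     // _setmode
#endif

// Hypothetical helper: switches stdin and stdout to UTF-8 text mode on
// Windows C runtimes. Returns true only when both mode changes succeed.
// On other platforms it makes no change and returns false.
bool EnableUtf8ConsoleMode() {
#ifdef _WIN32
  // _setmode returns the previous mode, or -1 on failure.
  const int in_mode = _setmode(_fileno(stdin), _O_U8TEXT);
  const int out_mode = _setmode(_fileno(stdout), _O_U8TEXT);
  return in_mode != -1 && out_mode != -1;
#else
  return false;
#endif
}
```

After a call like this, output should go through wide functions such as `std::wcout` or `wprintf`, since the Microsoft runtime does not support narrow print functions on a stream in Unicode mode.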

ghost commented 1 year ago

`BytesToCharacter` is an internal helper function of Objeck, so `_setmode` is always set anyway. This is not really a problem at all, just something to note.