wprintf(L"%ls", out);
Did you try std::wcout << out; instead?
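A possible reason for the crash, assuming out is a single wchar_t as in the test program in this thread: the %ls conversion expects a pointer to a null-terminated wide string, so passing a wchar_t value is undefined behavior, while %lc formats one wide character. A minimal sketch of the difference (FormatWideChar is a hypothetical helper written for this illustration, not part of Objeck):

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper for illustration only (not part of Objeck).
// Formats one wide character with %lc, which takes a wint_t value.
// %ls would instead expect a const wchar_t* to a null-terminated string.
std::wstring FormatWideChar(wchar_t ch) {
    wchar_t buffer[8];
    const int written = std::swprintf(buffer,
                                      sizeof(buffer) / sizeof(buffer[0]),
                                      L"%lc",
                                      static_cast<wint_t>(ch));
    // swprintf returns a negative value on failure.
    return written < 0 ? std::wstring()
                       : std::wstring(buffer, static_cast<std::size_t>(written));
}
```

With this in mind, the original call could be written as wprintf(L"%lc", out); to print the single character.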
This code no longer segfaults when using std::wcout instead of wprintf:
#include "../../shared/sys.h"
#include <iostream> // std::wcout
#include <string>

int main(int argc, char const *argv[])
{
  std::string test = "f";
  wchar_t out;
  BytesToCharacter(test, out);
  // wprintf(L"%ls", out);
  std::wcout << out;
  return 0;
}
$ ./test
f
This one prints nothing, so the result is the same as with wprintf; BytesToCharacter doesn't work:
#include "../../shared/sys.h"
#include <iostream> // std::wcout
#include <string>

int main(int argc, char const *argv[])
{
  std::string test = "foo";
  wchar_t out;
  BytesToCharacter(test, out);
  // wprintf(L"%ls", out);
  std::wcout << out;
  return 0;
}
$ ./test
I am unaware of a Unicode character that maps to the bytes "foo." The input would typically be a multibyte array that maps to a Unicode character.
OK. If you want a multibyte array, here are the UTF-8 bytes of the Euro sign (€): \xe2\x82\xac. Try replacing foo with them. Still the same result.
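To make the point about multibyte input concrete, here is a minimal sketch of decoding one UTF-8 sequence into a code point. DecodeSingleUtf8 is a hypothetical stand-in written for this thread, not Objeck's BytesToCharacter, and it assumes the input holds exactly one encoded character, so "foo" is rejected while "f" and \xe2\x82\xac are accepted. It skips overlong-encoding and surrogate checks for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in, NOT Objeck's BytesToCharacter.
// Decodes a byte string that must contain exactly one UTF-8 encoded
// code point. Overlong forms and surrogates are not rejected here.
bool DecodeSingleUtf8(const std::string& bytes, char32_t& out) {
    if (bytes.empty()) {
        return false;
    }
    const unsigned char lead = static_cast<unsigned char>(bytes[0]);
    std::size_t length = 0;
    char32_t value = 0;
    // The lead byte encodes the sequence length and the top payload bits.
    if (lead < 0x80) {
        length = 1; value = lead;
    } else if ((lead & 0xE0) == 0xC0) {
        length = 2; value = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        length = 3; value = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
        length = 4; value = lead & 0x07;
    } else {
        return false; // stray continuation byte or invalid lead byte
    }
    if (bytes.size() != length) {
        return false; // more or fewer bytes than one character needs
    }
    // Each continuation byte is 10xxxxxx and adds six payload bits.
    for (std::size_t i = 1; i < length; ++i) {
        const unsigned char next = static_cast<unsigned char>(bytes[i]);
        if ((next & 0xC0) != 0x80) {
            return false;
        }
        value = (value << 6) | (next & 0x3F);
    }
    out = value;
    return true;
}
```

Under these assumptions, the three Euro sign bytes decode to U+20AC, and a three-byte ASCII string such as "foo" fails because it is three separate characters.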
Thank you, I will take a look. However, this is testing Win64 and POSIX APIs.
Here's a PoC that shows the working code.
The code was checked under VS 2022, MSYS2 UCRT, and MSYS2 Clang (see Makefiles).
For MSYS2 UCRT and VS 2022, the Unicode character will be printed to the console. For MSYS2 Clang, the output needs to be redirected to a file, for example unicode.exe > out.txt.
I did not run into issues with the conversions; rather, the time was spent configuring the console and writing the PoC.
In my own test, even on MSYS2 UCRT64, the Unicode character failed to print correctly on the console (it printed gibberish). Redirecting to a file worked. But that doesn't matter: your PoC proved that BytesToCharacter will not work properly without manually calling _setmode.
Update: after changing _O_U16TEXT to _O_U8TEXT for both stdin and stdout in unicode.cpp, the character prints correctly on the console on MSYS2 UCRT64.
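For readers who have not seen the PoC, the console setup being discussed looks roughly like the sketch below. This is an assumption about its shape, not a copy of unicode.cpp: EnableUnicodeConsole is a hypothetical name, and the non-Windows branch is a deliberate no-op because _setmode is specific to the Windows C runtime.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <fcntl.h> // _O_U8TEXT
#include <io.h>    // _setmode
#endif

// Hypothetical helper; a sketch of the setup, not the PoC's actual code.
// On Windows it switches stdin and stdout to UTF-8 translated mode.
// _setmode returns -1 on failure, so both calls are checked.
bool EnableUnicodeConsole() {
#ifdef _WIN32
    const int in_mode = _setmode(_fileno(stdin), _O_U8TEXT);
    const int out_mode = _setmode(_fileno(stdout), _O_U8TEXT);
    return in_mode != -1 && out_mode != -1;
#else
    // Assumption: nothing to configure here outside the Windows CRT.
    return true;
#endif
}
```

Once a stream is in one of these Unicode modes on Windows, it should be written with the wide functions (std::wcout, wprintf) rather than the narrow ones.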
BytesToCharacter is an internal helper function of Objeck, so _setmode is always called anyway. So this is not really a problem, just something worth noting.
Discussed in https://github.com/objeck/objeck-lang/discussions/333