msys2 / MINGW-packages

Package scripts for MinGW-w64 targets to build under MSYS2.
https://packages.msys2.org
BSD 3-Clause "New" or "Revised" License

Incorrect UTF-8 handling in localized dates formatted by glib-2.0 #19621

Open LRN opened 9 months ago

LRN commented 9 months ago

Description / Steps to reproduce the issue

Compile and run this program (constructs GDateTime, formats it as a string using %c, writes it to a file, all the while trying to use ru_RU locale):

#include <glib.h>
#include <locale.h>
#include <stdint.h>
#include <string.h>
#include <io.h>
#include <fcntl.h>

int main (int argc, char **argv)
{
  gsize i;
  int fd;
  g_setenv ("LANG", "ru_RU", TRUE);
  setlocale (LC_ALL, "");
  GTimeZone *z = g_time_zone_new_identifier ("Z");
  GDateTime *now = g_date_time_new (z, 2024, 1, 4, 1, 2, 3);
  /* %c: preferred date and time representation for the current locale */
  gchar *s = g_date_time_format (now, "%c");
  /* O_CREAT is not passed, so ./datetime.out must already exist */
  fd = open ("./datetime.out", O_BINARY | O_RDWR);
  if (fd < 0)
    return 1;
  write (fd, s, strlen (s));
  write (fd, "\n", 1);
  /* dump the same string byte by byte as hex */
  for (i = 0; i < strlen (s); i++)
  {
    char *c = g_strdup_printf ("0x%02x ", ((uint8_t *)s)[i]);
    write (fd, c, strlen (c));
    g_free (c);
  }
  write (fd, "\n", 1);
  close (fd);
  g_free (s);
  g_date_time_unref (now);
  g_time_zone_unref (z);
  return 0;
}

glib-2.0 package has to be installed, with localization (specifically, /usr/share/locale/ru/LC_MESSAGES/glib20.mo should be present).

Expected behavior

Чт, 4 Янв 2024, 01∶02∶03
0xd0 0xa7 0xd1 0x82 0x2c 0x20 0x34 0x20 0xd0 0xaf 0xd0 0xbd 0xd0 0xb2 0x20 0x32 0x30 0x32 0x34 0x2c 0x20 0x30 0x31 0xe2 0x88 0xb6 0x30 0x32 0xe2 0x88 0xb6 0x30 0x33

Actual behavior

Чт, 4 Янв 2024, 01в€¶02в€¶03
0xd0 0xa7 0xd1 0x82 0x2c 0x20 0x34 0x20 0xd0 0xaf 0xd0 0xbd 0xd0 0xb2 0x20 0x32 0x30 0x32 0x34 0x2c 0x20 0x30 0x31 0xd0 0xb2 0xe2 0x82 0xac 0xc2 0xb6 0x30 0x32 0xd0 0xb2 0xe2 0x82 0xac 0xc2 0xb6 0x30 0x33

This is not what I expect: the separators between hours, minutes and seconds look completely wrong.

The glib localization files use the Unicode character "∶" (U+2236 RATIO) instead of a plain ":", and it appears to be well-formed UTF-8 in the .po file. But something goes wrong at runtime.

Windows Version

MINGW32_NT-10.0-22621

Are you willing to submit a PR?

No response

LRN commented 9 months ago

This could easily be a bug in libintl, but i have no way to check.

sskras commented 9 months ago

Getting the sample program to work took quite a bit of investigation. To get results directly, I now use this GNUmakefile:

all:
    gcc -g -o main.exe main.c $$(pkg-config --cflags --libs glib-2.0) -Wno-implicit-function-declaration
    @echo
    @> datetime.out
    @./main.exe
    @cat datetime.out

And IMO it runs just fine:

saukrs@DESKTOP-O7JE7JE MSYS ~/debug/POSIX-localized-dates
$ make
gcc -g -o main.exe main.c $(pkg-config --cflags --libs glib-2.0) -Wno-implicit-function-declaration

 4 Янв 2024 г.  1:02:03
0xe2 0x80 0x87 0x34 0x20 0xd0 0xaf 0xd0 0xbd 0xd0 0xb2 0x20 0x32 0x30 0x32 0x34 0x20 0xd0 0xb3 0x2e 0x20 0xe2 0x80 0x87 0x31 0x3a 0x30 0x32 0x3a 0x30 0x33

Maybe that's due to different native (NT-related) settings / configuration?

BTW I am running a bit older build of MSYS2:

saukrs@DESKTOP-O7JE7JE MSYS ~/debug/POSIX-localized-dates
$ uname -a
MSYS_NT-10.0-19044 DESKTOP-O7JE7JE 3.4.9.x86_64 2023-09-15 12:15 UTC x86_64 Msys

jeremyd2019 commented 9 months ago

The original report showed MINGW32 rather than MSYS, so to clarify: are we talking about the msys2 glib (which seemed to work in @sskras's attempt) or one of the MINGW ones? If the latter, a) this issue should probably be moved to the MINGW-packages repository, and b) I think there are known issues with UTF-8 in the old msvcrt implementation used by MINGW*; you may have better luck with the newer UCRT (UCRT64 or CLANG*).

LRN commented 9 months ago

Oh, sorry. I forgot about the subsystem division. Yes, the bug should be filed against the mingw32 subsystem. I've also tested it on the ucrt64 subsystem; it is also affected.

sskras commented 9 months ago

I was misled by the name of the repo this issue was reported in: [msys2/MSYS2-packages]

Yes, it's present in MINGW64 too:

$ make
gcc -g -o main.exe main.c $(pkg-config --cflags --libs glib-2.0) -Wno-implicit-function-declaration

Чт, 4 Янв 2024, 01âˆ¶02âˆ¶03
0xd0 0xa7 0xd1 0x82 0x2c 0x20 0x34 0x20 0xd0 0xaf 0xd0 0xbd 0xd0 0xb2 0x20 0x32 0x30 0x32 0x34 0x2c 0x20 0x30 0x31 0xc3 0xa2 0xcb 0x86 0xc2 0xb6 0x30 0x32 0xc3 0xa2 0xcb 0x86 0xc2 0xb6 0x30 0x33

Strangely enough, I am unable to build it on Cygwin:

$ make
gcc -g -o main.exe main.c $(pkg-config --cflags --libs glib-2.0) -Wno-implicit-function-declaration
main.c: In function ‘main’:
main.c:15:18: warning: initialization of ‘GTimeZone *’ {aka ‘struct _GTimeZone *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   15 |   GTimeZone *z = g_time_zone_new_identifier ("Z");
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: /tmp/cc9Ke1La.o: in function `main':
/cygdrive/c/msys64/home/saukrs/debug/POSIX-localized-dates/main.c:15:(.text+0x55): undefined reference to `g_time_zone_new_identifier'
collect2: error: ld returned 1 exit status
make: *** [GNUmakefile:2: all] Error 1

$ uname -smr
CYGWIN_NT-10.0-19044 3.4.9-1.x86_64 x86_64

Maybe that's because the library versions are too different. Cygwin:

$ uname -a
CYGWIN_NT-10.0-19044 DESKTOP-O7JE7JE 3.4.9-1.x86_64 2023-09-06 11:19 UTC x86_64 Cygwin

$ pkg-config --modversion glib-2.0
2.64.6

MINGW64:

$ uname -a
MINGW64_NT-10.0-19044 DESKTOP-O7JE7JE 3.4.9.x86_64 2023-09-15 12:15 UTC x86_64 Msys

$ pkg-config --modversion glib-2.0
2.78.0

Not sure. If it could be built, it would be interesting to compare the outputs.

Biswa96 commented 9 months ago

> glib-2.0 package has to be installed, with localization (specifically, /usr/share/locale/ru/LC_MESSAGES/glib20.mo should be present).

Are you really using mingw glib2? Then, why is the requirement?

LRN commented 9 months ago

> Are you really using mingw glib2? Then, why is the requirement?

I am really using mingw glib2. The requirement is because the string comes from the translation files, not from the code.

LRN commented 9 months ago

TBH, I would have figured it out myself, but packages in the MSYS repos usually come completely devoid of any debug info, which makes them impossible to debug conveniently. The only alternative is to build gettext and glib myself, which I currently don't have time for.