niXman / mingw-builds

Scripts for building the 32 and 64-bit MinGW-W64 compilers for Windows

`ld` does not work with non-ASCII file path #649

Open CyanoHao opened 1 year ago

CyanoHao commented 1 year ago

These are upstream bugs (two, as far as I know), and it seems they can be fixed with a patch for binutils.

1st bug: slash conversion in ld breaks wide filename.

[Screenshot: bug1-slash]

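This is caused by a wrong variable name in FILE *_bfd_real_fopen(const char *filename, const char *modes) (binutils-2.39/bfd/bfdio.c). The loop tests the narrow string filename but writes to the wide string partPath at the same index, and the two indices no longer line up once the path contains multibyte characters.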

   /* Convert any UNIX style path separators into the DOS i.e. backslash separator.  */
   for (ix = 0; ix < partPathLen; ix++)
     if (IS_UNIX_DIR_SEPARATOR(filename[ix]))
       partPath[ix] = '\\';

It should be

   for (ix = 0; ix < partPathLen; ix++)
     if (IS_UNIX_DIR_SEPARATOR(partPath[ix]))
       partPath[ix] = L'\\';  // prefix `L` is optional

2nd bug: ld gets the wrong active code page from ___lc_codepage_func() (MSVCRT only).

[Screenshot: bug2-codepage]

This is caused by a subtle problem in the same function, FILE *_bfd_real_fopen(const char *filename, const char *modes) (binutils-2.39/bfd/bfdio.c). In short, the MSVCRT version of UINT ___lc_codepage_func(void), which ld calls to determine the current code page, returns neither the system code page nor the active code page. Instead, it returns the default code page for the Windows display language. (The UCRT version is okay.)

Changing ___lc_codepage_func() to CP_ACP seems to fix this bug, but I’m not sure whether it breaks any corner cases.

Here is a simple program to test ___lc_codepage_func().

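#include <iostream>
#include <locale.h>  // setlocale; mingw-w64's header is assumed to declare ___lc_codepage_func too

int main() {
  setlocale(LC_CTYPE, "");  // switch to the user's default locale
  std::cout << ___lc_codepage_func() << std::endl;  // code page of the current C locale
}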

Results with release 13.1.0-rt_v11-rev1, x86-64 POSIX SEH:

| Case | Windows display language | System code page | Active code page | MSVCRT | UCRT | Expected |
|------|--------------------------|------------------|------------------|--------|------|----------|
| 1 | English (UK) | 936 (Simplified Chinese) | 936 (Simplified Chinese) | 1252 | 936 | 936 |
| 2 | English (UK) | 936 | 65001 (with application manifest) | 1252 | 65001 | 65001 |
| 3 | English (UK) | 65001 (check “Beta: Use Unicode UTF-8 for worldwide language support”) | 65001 | 1252 | 65001 | 65001 |
| 4 | Simplified Chinese | 65001 | 65001 | 936 | 65001 | 65001 |

MSVCRT result for cases 1 and 2: [Screenshot: codepage-msvcrt]

UCRT result for cases 1 and 2: [Screenshot: codepage-ucrt]

Patch

diff --unified --recursive --text binutils-2.39.orig/bfd/bfdio.c binutils-2.39/bfd/bfdio.c
--- binutils-2.39.orig/bfd/bfdio.c      2022-07-08 17:46:47.000000000 +0800
+++ binutils-2.39/bfd/bfdio.c   2023-06-24 19:56:02.752090800 +0800
@@ -122,7 +122,7 @@
    const wchar_t  prefix[] = L"\\\\?\\";
    const size_t   partPathLen = strlen (filename) + 1;
 #ifdef __MINGW32__
-   const unsigned int cp = ___lc_codepage_func();
+   const unsigned int cp = CP_ACP;
 #else
    const unsigned int cp = CP_UTF8;
 #endif
@@ -138,8 +138,8 @@

    /* Convert any UNIX style path separators into the DOS i.e. backslash separator.  */
    for (ix = 0; ix < partPathLen; ix++)
-     if (IS_UNIX_DIR_SEPARATOR(filename[ix]))
-       partPath[ix] = '\\';
+     if (IS_UNIX_DIR_SEPARATOR(partPath[ix]))
+       partPath[ix] = L'\\';

    /* Getting the full path from the provided partial path.
       1) Get the length.

By the way, if someone would like to fix this upstream, a minor problem in the same function can also be fixed:

   wchar_t *  fullPath = calloc (fullPathWSize + sizeof(prefix) + 1, sizeof(wchar_t));

Here sizeof(prefix) is a byte count that includes the terminating null, so the buffer is over-allocated. An element count of fullPathWSize + (sizeof(prefix) / sizeof(wchar_t) - 1) + 1 is sufficient.

niXman commented 12 months ago

@CyanoHao could you please provide the PR for the develop branch?

niXman commented 10 months ago

@CyanoHao do you want me to release a new build with this patch before closing this issue?

xuchengpeng commented 9 months ago

> @CyanoHao do you want me to release a new build with this patch before closing this issue?

Same problem here. Please release a new build, thanks.

anbangli commented 9 months ago

@CyanoHao Please help solve the following problem.

If the installation directory of MinGW-w64 contains Chinese characters, compilation fails. For example, with MinGW-w64 installed in "C:\编译器MinGW64", I tried to compile "C:\myprogs\hello.cpp" with the following command:

C:\编译器MinGW64\bin\g++.exe "C:\myprogs\hello.cpp" -o "C:\编译器hello.exe" -Wall -Wextra -pipe -I"C:\编译器MinGW64\include" -I"C:\编译器MinGW64\x86_64-w64-mingw32\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\13.2.0\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\13.2.0\include\c++" -L"C:\编译器MinGW64\lib" -L"C:\编译器MinGW64\x86_64-w64-mingw32\lib" -static-libstdc++ -static-libgcc

The compilation fails with the following output:

C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/crtbegin.o: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lstdc++: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc_eh: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc_eh: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/crtend.o: No space left on device
collect2.exe: error: ld returned 1 exit status

It seems that the Chinese characters "编译器" are dropped at some internal stage.

anbangli commented 9 months ago

I also tested v12.2 and v11.2.

Compile command with v12.2:

C:\编译器MinGW64\bin\g++.exe "C:\myprogs\hello.cpp" -o "C:\编译器hello.exe" -Wextra -g3 -pipe -I"C:\编译器MinGW64\include" -I"C:\编译器MinGW64\x86_64-w64-mingw32\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\12.2.0\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\12.2.0\include\c++" -L"C:\编译器MinGW64\lib" -L"C:\编译器MinGW64\x86_64-w64-mingw32\lib" -g3

Output:

C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/crtbegin.o: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lstdc++: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/crtend.o: No such file or directory
collect2.exe: error: ld returned 1 exit status

Here the Chinese characters "编译器" do appear in the messages, but they still seem to be dropped at some internal stage.

Compile command with v11.2:

c:\编译器MinGW64>C:\编译器MinGW64\bin\g++.exe "C:\我的程序\测试hello.cpp" -o "C:\我的程序\测试hello.exe" -Wextra -g3 -pipe -I"C:\编译器MinGW64\include" -I"C:\ 编译器MinGW64\x86_64-w64-mingw32\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\11.2.0\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\11.2.0\include\c++" -L"C:\编译器MinGW64\lib" -L"C:\编译器MinGW64\x86_64-w64-mingw32\lib" -g3

Regardless of whether my source code has errors or not, v11.2 works OK.

CyanoHao commented 8 months ago

@anbangli The root cause is the same in both situations (non-ASCII characters in either the path of the user code or the path of gcc itself): the object paths passed to ld (user objects, crt2.o, etc.) get broken.

It seems okay now (tested with the x86_64-posix-seh-ucrt build).