msys2 / MINGW-packages

Package scripts for MinGW-w64 targets to build under MSYS2.
https://packages.msys2.org

i686 lld crashes linking, mostly on Github runners #9048

Closed. jeremyd2019 closed this issue 3 years ago

jeremyd2019 commented 3 years ago

I have been building packages for clang32/i686 for several months now, and had not seen this until I started building them in a GitHub Actions workflow on a hosted runner. It happens intermittently on different packages, but consistently when linking python, which uses LTO and PGO (this is the profile-instr-generate phase).

libc++abi: terminating with uncaught exception of type std::__1::system_error: thread constructor failed: Exec format error
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.  Program arguments: D:/a/_temp/msys/msys64/clang32/bin/ld.lld -m i386pe --shared -Bdynamic -e _DllMainCRTStartup@12 --enable-auto-image-base -o libpython3.8.dll D:/a/_temp/msys/msys64/clang32/i686-w64-mingw32/lib/dllcrt2.o D:/a/_temp/msys/msys64/clang32/i686-w64-mingw32/lib/crtbegin.o -LD:/a/_temp/msys/msys64/clang32/i686-w64-mingw32/lib -LD:/a/_temp/msys/msys64/clang32/lib -LD:/a/_temp/msys/msys64/clang32/i686-w64-mingw32/sys-root/mingw/lib -LD:/a/_temp/msys/msys64/clang32/lib/clang/11.0.0/lib/windows --enable-auto-image-base --dynamicbase --no-seh --dynamicbase --no-seh --out-implib=libpython3.8.dll.a Modules/getbuildinfo.o Parser/acceler.o Parser/grammar1.o Parser/listnode.o Parser/node.o Parser/parser.o Parser/token.o Parser/myreadline.o Parser/parsetok.o Parser/tokenizer.o Objects/abstract.o Objects/accu.o Objects/boolobject.o Objects/bytes_methods.o Objects/bytearrayobject.o Objects/bytesobject.o Objects/call.o Objects/capsule.o Objects/cellobject.o Objects/classobject.o Objects/codeobject.o Objects/c...
 #0 0x6df45608 HandleAbort (D:\a\_temp\msys\msys64\clang32\bin\libLLVM.dll+0x125608)
 #1 0x754a5b22 (C:\Windows\System32\ucrtbase.dll+0xa5b22)
 #2 0x6dd97a98 abort_message (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x77a98)
 #3 0x6dd7a616 __cxa_throw_bad_array_new_length (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x5a616)
 #4 0x6ddabf20 (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x8bf20)
 #5 0x6ddabfa8 (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x8bfa8)
 #6 0x0250e1a4
 #7 0x0003fda4
 #8 0x6dd022ac _ZN9libunwind12UnwindCursorINS_17LocalAddressSpaceENS_13Registers_x86EE4stepEv (D:\a\_temp\msys\msys64\clang32\bin\libunwind.dll+0x22ac)
 #9 0x6dd05db6 _Unwind_RaiseException (D:\a\_temp\msys\msys64\clang32\bin\libunwind.dll+0x5db6)
#10 0x75443dfc (C:\Windows\System32\ucrtbase.dll+0x43dfc)
#11 0x011cdcfc _ZN3lld4coff12LinkerDriver11enqueuePathEN4llvm9StringRefEbb (D:\a\_temp\msys\msys64\clang32\bin\ld.lld.exe+0xdcfc)
#12 0x6dd782cb _ZNSt3__120__throw_system_errorEiPKc (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x582cb)
#13 0x75642b66 (C:\Windows\System32\KERNELBASE.dll+0x112b66)
#14 0x778ee697 (C:\Windows\SYSTEM32\ntdll.dll+0x3e697)
#15 0x7564b627 (C:\Windows\System32\KERNELBASE.dll+0x11b627)
#16 0x6dd7a315 _ZNSt3__121__libcpp_execute_onceEPPvPFvvE (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x5a315)
#17 0x6dd7a49a _ZNSt3__116__libcpp_tls_getEl (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x5a49a)
#18 0x6dd96b42 __cxa_get_globals (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x76b42)
#19 0x6dd96e85 _ZSt11__terminatePFvvE (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x76e85)
#20 0x6dd997d7 __cxa_throw (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x797d7)
#21 0x6dd99762 __cxa_throw (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x79762)
#22 0x6dd782ba _ZNSt3__120__throw_system_errorEiPKc (D:\a\_temp\msys\msys64\clang32\bin\libc++.dll+0x582ba)
#23 0x011cdcfc _ZN3lld4coff12LinkerDriver11enqueuePathEN4llvm9StringRefEbb (D:\a\_temp\msys\msys64\clang32\bin\ld.lld.exe+0xdcfc)
#24 0x011cd994 _ZN3lld4coff12LinkerDriver11enqueuePathEN4llvm9StringRefEbb (D:\a\_temp\msys\msys64\clang32\bin\ld.lld.exe+0xd994)
#25 0x6df393d0 _ZN4llvm3sys2fs11getUniqueIDENS_5TwineERNS1_8UniqueIDE (D:\a\_temp\msys\msys64\clang32\bin\libLLVM.dll+0x1193d0)
#26 0x011c8025 _ZN3lld4coff12LinkerDriver4linkEN4llvm8ArrayRefIPKcEE (D:\a\_temp\msys\msys64\clang32\bin\ld.lld.exe+0x8025)
#27 0x77908450 (C:\Windows\SYSTEM32\ntdll.dll+0x58450)
#28 0x77908450 (C:\Windows\SYSTEM32\ntdll.dll+0x58450)
#29 0x778ed925 (C:\Windows\SYSTEM32\ntdll.dll+0x3d925)
#30 0x778ed49e (C:\Windows\SYSTEM32\ntdll.dll+0x3d49e)
#31 0x778ed1e2 (C:\Windows\SYSTEM32\ntdll.dll+0x3d1e2)
#32 0x7564081e (C:\Windows\System32\KERNELBASE.dll+0x11081e)
#33 0x0002ed3a

jeremyd2019 commented 3 years ago

I think the GHA windows-latest runner is cursed with respect to 32-bit address space. I build msys2/i686 packages mostly using them too, and python build always has fork issues, even though it works fine on my local Windows VM.

mati865 commented 3 years ago

We'd probably need libc++ and LLVM built with RelWithDebInfo and run it on the CI... Alternatively, we can check whether the just-released LLVM 12 fixes it.

jeremyd2019 commented 3 years ago

I will try building with RelWithDebInfo... GHA time is free 😁

mati865 commented 3 years ago

Note that you will have to build it twice to get libc++ with debug info first and link LLVM to that libc++ in second build. Also do not forget to use options=('!strip') (I think this is the right syntax).

jeremyd2019 commented 3 years ago

It looks from that backtrace that libc++ (and libunwind) are dynamically linked. Hopefully I can get by with just one build of 'clang' packages

jeremyd2019 commented 3 years ago

Umm, yeah...

[MSYS2 CI] mingw-w64-clang: Installing

  loading packages...
  resolving dependencies...
  looking for conflicting packages...

  Packages (6) mingw-w64-clang-i686-clang-11.0.0-1  mingw-w64-clang-i686-compiler-rt-11.0.0-1  mingw-w64-clang-i686-libc++-11.0.0-1  mingw-w64-clang-i686-libunwind-11.0.0-1  mingw-w64-clang-i686-lld-11.0.0-1  mingw-w64-clang-i686-llvm-11.0.0-1
  :: Proceed with installation? [Y/n] y

  Total Installed Size:  15707.14 MiB

  checking keyring...
  checking package integrity...
  loading package files...
  checking for file conflicts...
  :: Processing package changes...
  installing mingw-w64-clang-i686-compiler-rt...
  installing mingw-w64-clang-i686-libunwind...
  installing mingw-w64-clang-i686-libc++...
  installing mingw-w64-clang-i686-llvm...
  installing mingw-w64-clang-i686-lld...
  installing mingw-w64-clang-i686-clang...
  error: could not extract /clang32/bin/libclang-cpp.dll (Write failed)
  error: could not extract /clang32/bin/libclang.dll (Write failed)
  error: could not extract /clang32/bin/scan-build (Write failed)
  error: could not extract /clang32/include/clang-c/CXCompilationDatabase.h (Write failed)

and a lot more...

For comparison, from msys2-autobuild:

Name           Used (GB)     Free (GB) Provider      Root                                               CurrentLocation
----           ---------     --------- --------      ----                                               ---------------
A                                      FileSystem    A:\                                                               
C                 165.65         89.86 FileSystem    C:\                                                               
D                   3.01         10.99 FileSystem    D:\                                 …ys2-autobuild\msys2-autobuild
Temp              165.65         89.86 FileSystem    C:\Users\runneradmin\AppData\Local…        

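the setup-msys2 action installs the msys2 install to the D:

That drive has 10.99 GB free, which is less than the 15707.14 MiB these packages need, hence the write failures.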

mati865 commented 3 years ago

It looks from that backtrace that libc++ (and libunwind) are dynamically linked. Hopefully I can get by with just one build of 'clang' packages

Oh, right.

the setup-msys2 action installs the msys2 install to the D:

cc @eine @lazka

lazka commented 3 years ago

hm, right.

jeremyd2019 commented 3 years ago

My first thought was to use the option to use the pre-installed msys2, which is installed on C:.

lazka commented 3 years ago

might be worth a try: you can tell the action to use the system install: https://github.com/msys2/setup-msys2#release

jeremyd2019 commented 3 years ago

That was able to install the packages, but the resultant toolchain seems to silently fail (even the x86_64 version): https://github.com/jeremyd2019/CLANG-packages/actions/runs/756442336

I could modify the job to upload what did build even if it later fails, but I don't know what I'd do with 15GB worth of packages if I had them 😱

jeremyd2019 commented 3 years ago

Artifacts are available on https://github.com/jeremyd2019/CLANG-packages/actions/runs/757686158 for all jobs

jeremyd2019 commented 3 years ago

It seems that this becomes less likely if options=('!makeflags') is set. I did not expect this to do anything for python, because the build is serialized on linking libpython3.8.dll anyway, but it does seem to have allowed python to link.

mati865 commented 3 years ago

I think it's related to LTO (at least in Python's case), since ThinLTO runs in multiple concurrent threads. Assuming you have a rather low core/thread count, that would explain why 32-bit LLD works on your PC but crashes on CI and on my PC.

That would be avoidable by adding -Wl,-Xlink=-threads:1 to LDFLAGS. That way we can still benefit from building the code with multiple cores while avoiding the memory issue when linking.

jeremyd2019 commented 3 years ago

That option doesn't seem to have prevented this error on CI

mati865 commented 3 years ago

Bummer. Without multiple compilation threads (this is what !makeflags disables) we won't be able to build many of the packages within the CI time limit.

mstorsjo commented 3 years ago

FYI, lld on 32 bit architectures is kinda known to be a bit limited. lld works by memory mapping all input files instead of reading them with regular file APIs (which I guess is one aspect of its design, making it fast), so when the linked project grows a bit bigger, you do hit a wall with it.

... and now I see that this was already pointed out in another thread I was reading up on - sorry for the extra noise.

lazka commented 3 years ago

Would something like editbin /largeaddressaware lld.exe help here? Afair that bumps the limit on virtual address space from 2GB to 3GB or so

mstorsjo commented 3 years ago

Would something like editbin /largeaddressaware lld.exe help here? Afair that bumps the limit on virtual address space from 2GB to 3GB or so

Hm, I guess it would postpone the issue a bit at least.

mati865 commented 3 years ago

Would something like editbin /largeaddressaware lld.exe help here? Afair that bumps the limit on virtual address space from 2GB to 3GB or so

I thought we already did that, but it turns out it was done only for the GCC package: https://github.com/msys2/MINGW-packages/blob/745da77dae2f4413924ca06ee6588ee41c25c78a/mingw-w64-gcc/PKGBUILD#L198

Rust does that for every i686 build: https://github.com/rust-lang/rust/blob/e11a9fa52a3f372dadd6db3d3f2ed7dc2621dcc4/compiler/rustc_target/src/spec/i686_pc_windows_gnu.rs#L17

Maybe we should do the same?

jeremyd2019 commented 3 years ago

I thought largeaddressaware was on by default for 32-bit also. Maybe it should be

jeremyd2019 commented 3 years ago

I guess that was only on cygwin/msys, not mingw.

jeremyd2019 commented 3 years ago

I'm doing a test build with a PKGBUILD modification like the GCC one, but I'm also kind of thinking either adding large address aware to makepkg_mingw.conf, or even patching clang/lld to set it by default (though I have no idea how/where to do so), and seeing if it breaks anything. Thoughts?

mstorsjo commented 3 years ago

I'm doing a test build with a PKGBUILD modification like the GCC one, but I'm also kind of thinking either adding large address aware to makepkg_mingw.conf, or even patching clang/lld to set it by default (though I have no idea how/where to do so)

https://github.com/llvm/llvm-project/blob/llvmorg-12.0.0/lld/COFF/Driver.cpp#L1834, change the third parameter (the default if nothing was specified) from config->is64() to true.

lazka commented 3 years ago

I'd prefer the mingw conf path. It makes our life easier while limiting the risk for users building their own stuff.

jeremyd2019 commented 3 years ago

vs just doing it in the clang PKGBUILD? Or vs changing the default in LLD?

jeremyd2019 commented 3 years ago

So it looks like large-address-aware is the default in cygwin: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/config/i386/cygwin.h#L134 but for mingw-w64, whether it is the default depends on a gcc configure argument: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/config/i386/mingw-w64.h#L85-L87

lazka commented 3 years ago

vs just doing it in the clang PKGBUILD? Or vs changing the default in LLD?

vs changing the default. my 2c, from what I see this has the potential to break things, and unlike with ASLR we don't get the benefit of MSVC having this enabled by default, so bugs might surface. And rare/hard to reproduce bugs on top of that.

jeremyd2019 commented 3 years ago

I'm just going to do the clang PKGBUILD for now. My plan is to run through bootstrapping on my fork with my hacks around this issue removed and see if this solves it. Then I can see about a PR to MINGW-packages

mati865 commented 3 years ago

I'd be comfortable with enabling it globally for 32-bit as I consider it less risky than ASLR.

jeremyd2019 commented 3 years ago

Unfortunately, got the same exception building openssl in "stage 2". I downloaded the "stage 1" binary and confirmed that it has large address aware flag on it. ☹️
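
The flag check mentioned above comes down to reading one bit of the PE header. A minimal sketch of such a check, assuming the binary has already been read into a byte buffer. `LaaStatus`, `readLE` and `largeAddressAwareStatus` are hypothetical names made up for illustration; the offsets follow the PE/COFF layout (e_lfanew at 0x3C, then the "PE\0\0" signature, then the 20-byte COFF header whose last field is Characteristics).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// IMAGE_FILE_LARGE_ADDRESS_AWARE bit in the COFF Characteristics field.
constexpr std::uint16_t kLargeAddressAware = 0x0020;

// Hypothetical result type for this sketch.
enum class LaaStatus { NotPe, Clear, Set };

// Hypothetical helper: read an n-byte little-endian value at `off`.
// The caller must already have checked that off + n fits in the buffer.
inline std::uint32_t readLE(const std::vector<std::uint8_t> &b,
                            std::size_t off, std::size_t n) {
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<std::uint32_t>(b[off + i]) << (8 * i);
  return v;
}

// Reports whether the image carries the large-address-aware bit, or NotPe
// if the buffer does not look like a PE file.
inline LaaStatus largeAddressAwareStatus(const std::vector<std::uint8_t> &image) {
  constexpr std::size_t kDosHeaderSize = 0x40;    // e_lfanew lives at 0x3C
  constexpr std::size_t kSigAndCoffSize = 4 + 20; // "PE\0\0" + COFF header
  if (image.size() < kDosHeaderSize || image[0] != 'M' || image[1] != 'Z')
    return LaaStatus::NotPe;

  const std::size_t pe = readLE(image, 0x3C, 4);
  if (pe > image.size() || image.size() - pe < kSigAndCoffSize)
    return LaaStatus::NotPe;
  if (image[pe] != 'P' || image[pe + 1] != 'E' || image[pe + 2] != 0 ||
      image[pe + 3] != 0)
    return LaaStatus::NotPe;

  // Characteristics is the last 2-byte field of the 20-byte COFF header.
  const std::uint32_t characteristics = readLE(image, pe + 4 + 18, 2);
  return (characteristics & kLargeAddressAware) ? LaaStatus::Set
                                                : LaaStatus::Clear;
}
```

Reading ld.lld.exe into a `std::vector<std::uint8_t>` and passing it to `largeAddressAwareStatus` would distinguish a flagged build from an unflagged one. It says nothing about how much address space the process actually gets at run time.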

jeremyd2019 commented 3 years ago

Building clang with large-address-aware is still a good thing to do, but doesn't seem to have solved this particular issue.

jeremyd2019 commented 3 years ago

I did a run through on my local VM rebuilding all the "bootstrap" packages with a large-address-aware clang, which succeeded without issue and without any makeflags hacks

jeremyd2019 commented 3 years ago

https://github.com/msys2/msys2-autobuild/runs/2442885999?check_suite_focus=true#step:11:46795

  LLVM ERROR: out of memory
  LLVM ERROR: out of memory
  clang++: error: linker command failed due to signal (use -v to see invocation)

Well that's a first for me... but expected with 32-bit I guess... Hopefully large-address-aware helps that

mati865 commented 3 years ago

I think this time it had enough memory to catch exception/unwind? The reason for the crash remains the same. FTR https://github.com/mstorsjo/llvm-mingw/issues/151

jeremyd2019 commented 3 years ago

Well, the clang32 party has apparently hit its limit: the following packages all failed on autobuild due to out-of-memory issues:

  • wxWidgets
  • wxWidgets3.1
  • xalan-c
  • assimp

This was expected at some point, but interesting

jeremyd2019 commented 3 years ago

I was able to build all of these but xalan-c locally. Case of the cursed github runner again?

jeremyd2019 commented 3 years ago

I was able to build all of these but xalan-c locally.

I was able to build xalan-c locally passing -j2 to ninja

jeremyd2019 commented 3 years ago

Looks like the last build of mingw-w64-aom failed due to this error too, even though it succeeded on the prior revision. https://github.com/msys2/msys2-autobuild/runs/2896182279?check_suite_focus=true

jeremyd2019 commented 3 years ago

I have a theory on the "thread constructor failed: Exec format error" unhandled exception.

@mati865 I remember there was some oddity in COFF regarding threads and -threads=x applied only to LTO but not other things. I'm not entirely sure how it works on Windows but build systems generally setup jobserver that limits the threads. Maybe COFF backend uses only that for other things than LTO?

@jeremyd2019 https://github.com/llvm/llvm-project/blob/llvmorg-12.0.0/lld/COFF/Driver.cpp#L155 couldn't hurt to hack that and see what happens i guess

@jeremyd2019 it seems like it should be using something from llvm Parallel instead of std::async there

@jeremyd2019 looks like std::async in libc++ could launch an unbounded number of threads https://github.com/llvm/llvm-project/blob/llvmorg-12.0.0/libcxx/include/future#L2156 no evidence of a thread pool there
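
The behaviour of the std::async call discussed above depends on its launch policy, which can be seen in isolation. A small sketch using only the standard library (`ranOnCallerThread` is a hypothetical helper, not lld code): with std::launch::deferred the task runs on whichever thread calls get(), while std::launch::async runs it as if on a new thread.

```cpp
#include <future>
#include <thread>

// Hypothetical helper: returns true when a task started with `policy`
// executed on the same thread that launched it and called get().
inline bool ranOnCallerThread(std::launch policy) {
  const std::thread::id caller = std::this_thread::get_id();
  std::future<std::thread::id> task =
      std::async(policy, [] { return std::this_thread::get_id(); });
  // For a deferred task, get() is the point where the work actually runs.
  return task.get() == caller;
}
```

On a conforming implementation `ranOnCallerThread(std::launch::deferred)` is true and `ranOnCallerThread(std::launch::async)` is false, so every outstanding async future costs one thread.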

This patch seems to fix linking fairly large projects (qt6-base, llvm) with 32-bit lld:

--- lld/COFF/Driver.cpp.orig    2021-06-30 14:07:21.236743400 -0700
+++ lld/COFF/Driver.cpp 2021-06-30 14:07:57.752513500 -0700
@@ -149,9 +149,10 @@
 // Create a std::future that opens and maps a file using the best strategy for
 // the host platform.
 static std::future<MBErrPair> createFutureForFile(std::string path) {
-#if _WIN32
+#if _WIN64
   // On Windows, file I/O is relatively slow so it is best to do this
-  // asynchronously.
+  // asynchronously.  But 32-bit has issues with potentially launching tons
+  // of threads
   auto strategy = std::launch::async;
 #else
   auto strategy = std::launch::deferred;

So far it seems to be working well. I think a real solution would probably be to replace that with a proper thread pool though, that respects the -Wl,-Xlink=-threads: option.
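
A minimal sketch of the thread-pool idea, assuming a fixed upper bound on workers is acceptable. This is neither lld code nor the upstream patch. `runBounded` is a hypothetical helper that runs `count` jobs on at most `maxThreads` threads, where `maxThreads` stands in for whatever value the thread option would resolve to.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: call job(i) once for every i in [0, count) using at
// most maxThreads worker threads. Returns the number of threads started so
// that callers can verify the bound.
inline std::size_t runBounded(std::size_t count, std::size_t maxThreads,
                              const std::function<void(std::size_t)> &job) {
  assert(maxThreads > 0 && "need at least one worker");
  const std::size_t workers = std::min(count, maxThreads);
  std::atomic<std::size_t> next{0};

  // Each worker repeatedly claims the next unprocessed index until none remain.
  auto work = [&] {
    for (std::size_t i = next.fetch_add(1); i < count; i = next.fetch_add(1))
      job(i);
  };

  std::vector<std::thread> pool;
  pool.reserve(workers);
  for (std::size_t t = 0; t < workers; ++t)
    pool.emplace_back(work);
  for (std::thread &t : pool)
    t.join();
  return workers;
}
```

Here the thread count no longer grows with the number of input files, which is the property the unbounded std::async approach lacks. A production version would also need to handle a failure to start a worker partway through, which this sketch does not.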

jeremyd2019 commented 3 years ago

My patch for this is now committed upstream: https://github.com/llvm/llvm-project/commit/7a7da69fbe288de088bfee47d2f7c21da2132085