Open rvogg opened 3 years ago
is this attempting to throw across a module boundary? That can be an issue, but doesn't appear to be the case here as far as I can see from the backtrace.
hmm guess this is what is also hitting a gcc clang build if trying to link to static libgcc it will fail to build.
is this attempting to throw across a module boundary? That can be an issue, but doesn't appear to be the case here as far as I can see from the backtrace.
As I said, I tried to create a minimal example where I also threw exceptions over library boundaries. But I could not reproduce the error there. I suspect that ccache makes a system call that breaks the exception handling.
Unfortunately I don't understand enough about exception handling models like dwarf to be able to isolate the issue better.
Seems like this also affects #9088
I purged -static-libgcc from ldflags and the build works without additional changes/patches
Without this I was consistently getting ICE on CI
guess we should report this upstream, thats a major breakage :O
this also affected the tdm build i was maintaining, and it seems to go back further than i thought. the first time this caused problems with the tdm builds was gcc-8, and it slowly got worse with the newer gcc versions. at first it was only the 32 bit compiler which occasionally would bail on code that worked before, but later it would also fail with static exceptions on the 64 bit code. The funny thing about the tdm builds is that they make use of code to allow throwing exceptions across dll boundaries even when linked to the static exception runtimes, so this kinda sucked: before all these problems i could actually build a gcc version of clang that did not rely on the libgcc and libstdc++ dll's. this is now impossible, unfortunately.
Not sure if there is any correlation, but there was a similar problem in ccache with the MIPS toolchain when using the gold linker instead of the bfd linker: https://github.com/ccache/ccache/issues/907
wow thats quite a problem :S
I ran into the same or a similar bug while debugging ccache built with 64-bit gcc.
A system test failed with an exception, so I tried to debug it with gdb.
But it did not hit the expected exception. With the debugger attached, the process died at the same point as the 32-bit version, but this time with an error message: terminate called after throwing an instance of 'core::Error'.
After that, I took some time to look at the bug more closely, and the behavior got weirder the longer I looked at it.
The process died on the following assert, without any error message:
File: unwind-dw2.c
static void __attribute__((noinline))
uw_init_context_1 (struct _Unwind_Context *context,
                   void *outer_cfa, void *outer_ra)
{
  void *ra = __builtin_extract_return_addr (__builtin_return_address (0));
  _Unwind_FrameState fs;
  _Unwind_SpTmp sp_slot;
  _Unwind_Reason_Code code;

  memset (context, 0, sizeof (struct _Unwind_Context));
  context->ra = ra;
  if (!ASSUME_EXTENDED_UNWIND_CONTEXT)
    context->flags = EXTENDED_CONTEXT_BIT;

  code = uw_frame_state_for (context, &fs);
  gcc_assert (code == _URC_NO_REASON);  /* <-- the process dies here */

The cause lies in the _Unwind_Find_FDE function: both pointers, seen_objects and unseen_objects, are null:
File: unwind-dw2-fde.c
const fde *
_Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
{
  ...
  /* Linear search through the classified objects, to find the one
     containing the pc.  Note that pc_begin is sorted descending, and
     we expect objects to be non-overlapping.  */
  for (ob = seen_objects; ob; ob = ob->next)  /* <-- seen_objects is null */
  ...
    }

  /* Classify and search the objects we've not yet processed.  */
  while ((ob = unseen_objects))  /* <-- unseen_objects is null */
    {
      ...
      if (f)
        goto fini;
    }
I tried using a memory breakpoint to see if these pointers are ever set, but I never saw that happen.
I was searching for the reason for a fmt::v7::format_error.
After I attached the debugger to the 64-bit version, the error message was now a core::Error instead of the fmt::v7::format_error.
The position of the exit seems to be the normal place when no catch block was found:
File: eh_throw.cc
extern "C" void
__cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo,
                         void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))
{
  PROBE2 (throw, obj, tinfo);

  __cxa_eh_globals *globals = __cxa_get_globals ();
  globals->uncaughtExceptions += 1;
  // Definitely a primary.
  __cxa_refcounted_exception *header =
    __cxa_init_primary_exception(obj, tinfo, dest);
  header->referenceCount = 1;

#ifdef __USING_SJLJ_EXCEPTIONS__
  _Unwind_SjLj_RaiseException (&header->exc.unwindHeader);
#else
  _Unwind_RaiseException (&header->exc.unwindHeader);
#endif

  // Some sort of unwinding error. Note that terminate is a handler.
  __cxa_begin_catch (&header->exc.unwindHeader);
  std::terminate ();  // <-- the process exits here
}
Then I thought I would debug the dynamic 32-bit version, but the gcc-libs package doesn't have any symbols. So I built the gcc packages locally (first without making any changes to the PKGBUILD) and installed the gcc-libs package.
Now everything was broken!
Every process crashed at startup and it was impossible to debug any process (even with the 64-bit multiarch gdb).
I thought that the reason for the strange behavior was in the local build of the gcc libs, so I triggered a github action to build a new gcc package. Then I installed the new packages together with cmake and ninja in an empty environment:
pacman --root new_root -Sy
pacman --root new_root -U mingw-w64-i686-gcc* mingw-w64-i686-libgccjit*
pacman --root new_root -S mingw-w64-i686-cmake mingw-w64-i686-ninja
After that I started a cmd, added the new environment as the only entry in the PATH variable and built the ccache project.
With static linking the behavior was the same as before, but with dynamic linking it terminated with the following error message: terminate called after throwing an instance of 'core::Error'
After several attempts to debug the behavior, it seems to be impossible to reproduce it with an attached debugger. Therefore I enabled the JIT debugger to debug it. As far as I understood the code there, it was the normal place reached when no catch block is found.
But I was also unable to determine the reason for this behavior. Maybe I can post a call stack when I set up the JIT debugger again.
I'm not sure what is in the current gcc package, but it is not reproducible locally or on github.
It's now broken in the dynamic case too -> #9771
As you correctly predicted I guess
I tried rebuilding ccache without -DSTATIC_LINK=OFF with the rebuilt gcc, and it still seems to fail as originally described.
Ooh, it's pulling in libgcc_s_dw2-1.dll via libhiredis.dll, I figure that's probably messing things up. (it's also coming via libzstd.dll)
Minimal reproducer:
#include <stdio.h>
#include <zstd.h>

int main()
{
    try
    {
        printf("About to throw\n");
#ifdef BREAK_EXCEPTIONS
        printf("Calling zstd: %u\n", ZSTD_versionNumber());
#endif
        throw 42;
        printf("After throw (unreachable)\n");
    }
    catch (...)
    {
        printf("Caught\n");
        return 1;
    }
    return 0;
}
$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd
$ ./testexc
About to throw
Caught
$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd -DBREAK_EXCEPTIONS
$ ./testexc
About to throw
Calling zstd: 10500
I chose zstd pretty arbitrarily, it could be any DLL that's dynamically linked to libgcc.
Interestingly, it works with either -static-libgcc, -static-libstdc++, or neither, but breaks with both.
We have fixed some of the unwinding issues, does this still reproduce?
$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd
$ ./testexc
About to throw
Caught
$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd -DBREAK_EXCEPTIONS
$ ./testexc
About to throw
Calling zstd: 10500
I still get the same result with gcc 11.3.0
Here is an upstream comment on this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105507#c5
these built with my TDM based toolset, exceptions work again guess it all came down to the grep problem we had.
@rvogg @revelator any solution on that? This issue still seems to persist only in i686-w64-mingw32. With x86_64 mingw it works fine.
You can check this issue: #5763
it is back?!? ok thats weird :S, not sure it will be looked at by those here you are probably better of reporting the bug at the gcc main site. also dont use __declspec(dllimport) with mingw (mingw uses autoimport so is a bit different than msvc).
@revelator Thanks for your fast response. I just saw this page: https://www.msys2.org/docs/cpp/#known-issues After removing -static-libgcc it seems to work. However, why does this not affect the 64 bit compiler?
-static alone should do unless g++ ignores it, in which case the gcc folks definitely need to know. focus has been almost entirely on the 64 bit compiler in recent years so some stuff that needed to be looked at in the 32 bit windows version may have been omitted (not sure how many of the gcc devs even work with 32 bit anymore :S).
-static-libgcc -static-libstdc++ is just for when you have a project already dependent on a gazillion dll's and don't also want them to rely on the gcc/g++ runtime dll's. for simple projects which don't rely on other libraries besides the windows import libraries, or where you have static versions of every library it depends on, -static alone will do :).
there is a catch though if for example you use gcc to call g++ then -static-libstdc++ is ignored as gcc does not understand this flag and does not pass it on to g++.
and as for why it still works for me, i use a special build of gcc based on the TDM mingw-w64 compiler with some fixes i been working on over the years. what makes the TDM based gcc different is that it allows throwing exceptions without the libgcc dll.
it's not perfect in that it croaks on the aforementioned problem when some package decides to use gcc to call g++ creating a package which still depends on both the libstdc++ dll and libgcc dll because libstdc++ itself is built with a dependency on the libgcc dll. but for the rest it works just fine.
it is back?!? ok thats weird :S, not sure it will be looked at by those here you are probably better of reporting the bug at the gcc main site. also dont use __declspec(dllimport) with mingw (mingw uses autoimport so is a bit different than msvc).
It never went away, it's apparently a known limitation of gcc's exception handling. I think because it's not using native SEH mechanisms, which it is on other architectures like x86_64.
Why not use dllimport? It saves a thunk (call to an indirect jmp instruction) in some cases. And in cases of variables, it may be necessary. In any event, it is the correct thing to do if the symbol is imported from a dll.
just an observation that it seems to break at times (also noted from several sources that the import section is defined as empty in case of mingw) i also had success building sources that would normally croak with mingw if i changed the abi imports to empty). it seems to be ancient history from all the way back when the original mingw was something but i cant remember the exact reason.
@revelator I am not sure, if libstd.dll and libgcc.dll are covered by the Runtime Library exception. If you distribute them too, I think the Library exception does not apply.
So that's why I link to shared libraries with -static-libgcc and -static-libstdc++
ah ok :) . yeah i dont remember if it is covered either tbh ?.
https://mingw-w64-public.narkive.com/Zbonasdx/problems-with-dllimport-dllexport
this was probably the reason back then but it might be fixed now.
EDIT: it had problems when autoimport was defined i found the old bug report again which explained it. if you disabled autoimport it worked just fine but i think this is still the default in which case newer sources might have opted to disable it if using __declspec(dllimport). though if the linker croaks it might still be a valid place to look.
it is back?!? ok thats weird :S, not sure it will be looked at by those here you are probably better of reporting the bug at the gcc main site. also dont use __declspec(dllimport) with mingw (mingw uses autoimport so is a bit different than msvc).
It never went away, it's apparently a known limitation of gcc's exception handling. I think because it's not using native SEH mechanisms, which it is on other architectures like x86_64.
I don't think it's caused by SEH vs dwarf-2. Clang (with libunwind) also uses dwarf-2 for 32-bit but is not affected by the same issues AFAIK. So that would be entirely the fault of the libgcc implementation.
Why not use dllimport? It saves a thunk (call to an indirect jmp instruction) in some cases. And in cases of variables, it may be necessary. In any event, it is the correct thing to do if the symbol is imported from a dll.
Yeah, dllimport is the way to go if build system handles it correctly.
However, why does this not affect the 64 bit compiler?
The implementation of exception handling in 32-bit and 64-bit compilers is vastly different for all compilers on Windows. Even between 32-bit GCC and 32-bit Clang with both using dwarf-2 it's entirely different code.
indeed :), one can think of the libgcc dll as the same as microsofts msvcrt dll (if windows actually allowed it you could copy it to c:/windows/syswow64 and it would work just like the msvcrt dll just with dwarf exceptions instead of the native seh based crt).
but here comes a problem... the native windows system dll's are SEH based so throwing dwarf exceptions across system dll's will most likely lead to crashes (bit like a tourist using google translate to explain something to a native some of the meaning gets lost) so gcc and the mingw abi uses a lot of magic to allow it to work regardless (well in most cases).
sadly we never got SEH for the 32 bit compiler and sometimes it shows (magic failure), sjlj is slower but has better compatibility with SEH though it still fails at times too.
the 64 bit gcc uses SEH so does not suffer from all of this in fact exceptions work just fine even if you opt to not use the gcc dll's at all sigh...
indeed :), one can think of the libgcc dll as the same as microsofts msvcrt dll (if windows actually allowed it you could copy it to c:/windows/syswow64 and it would work just like the msvcrt dll just with dwarf exceptions instead of the native seh based crt).
No. libgcc dll is libgcc (unwinding and certain low-level helper functions called by gcc-generated code)... msvcrt is libc (much much bigger). These are different beasts entirely, even on Linux.
The way I see it, the problem with statically linked libgcc is similar to the main problem with statically-linked msvcrt with msvc: each module then has its own (potentially different) copy of the library, with its own globals. In that case, there is no single source of truth for unwinding information. With SEH, the OS (probably ntdll RtlUnwind functions) is that single source of truth. With libgcc dll, the single dll instance is.
Sure, it could probably use GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, ... to find the module an address on the call stack comes from, and consult it for DWARF tables, but that doesn't handle dynamic unwinding information (which would necessarily have to be passed to the unwinder via functions and held in globals somewhere). I imagine if libunwind can handle unwinding across module boundaries in win32/DWARF mode when statically linked, this is probably what it does, and it probably does not/cannot handle dynamic unwinding information properly in that case, either...
bits of the crt seem to be part of libgcc (low level stuff indeed) msvcrt is still needed for actuall abi stuff while the libgcc dll handles unwinding. not sure how clang handles those parts but since exceptions only ever really apply to c++ it might have a copy of the static unwinders in libc++ maybe ?.
huh seems it does...
4.libc++.dll __imp___Unwind_Backtrace
4.libc++.dll __Unwind_Backtrace
5.libc++.dll __imp___Unwind_DeleteException
5.libc++.dll __Unwind_DeleteException
6.libc++.dll __imp___Unwind_FindEnclosingFunction
6.libc++.dll __Unwind_FindEnclosingFunction
7.libc++.dll __imp___Unwind_Find_FDE
7.libc++.dll __Unwind_Find_FDE
8.libc++.dll __imp___Unwind_ForcedUnwind
8.libc++.dll __Unwind_ForcedUnwind
9.libc++.dll __imp___Unwind_GetCFA
9.libc++.dll __Unwind_GetCFA
10.libc++.dll __imp___Unwind_GetDataRelBase
10.libc++.dll __Unwind_GetDataRelBase
11.libc++.dll __imp___Unwind_GetGR
11.libc++.dll __Unwind_GetGR
12.libc++.dll __imp___Unwind_GetIP
12.libc++.dll __Unwind_GetIP
13.libc++.dll __imp___Unwind_GetIPInfo
13.libc++.dll __Unwind_GetIPInfo
14.libc++.dll __imp___Unwind_GetLanguageSpecificData
14.libc++.dll __Unwind_GetLanguageSpecificData
15.libc++.dll __imp___Unwind_GetRegionStart
15.libc++.dll __Unwind_GetRegionStart
16.libc++.dll __imp___Unwind_GetTextRelBase
16.libc++.dll __Unwind_GetTextRelBase
17.libc++.dll __imp___Unwind_RaiseException
17.libc++.dll __Unwind_RaiseException
18.libc++.dll __imp___Unwind_Resume
18.libc++.dll __Unwind_Resume
19.libc++.dll __imp___Unwind_Resume_or_Rethrow
19.libc++.dll __Unwind_Resume_or_Rethrow
20.libc++.dll __imp___Unwind_SetGR
20.libc++.dll __Unwind_SetGR
21.libc++.dll __imp___Unwind_SetIP
21.libc++.dll __Unwind_SetIP
looks like all the non __imp functions are from the static unwinders ???.
just checked the libstdc++ import library and they are indeed missing there but located in libgcc_s.a which is a dependency of the libstdc++ dll anyway. oh boy :S
heh this is weird clangs libunwind dll holds the same exports but the libc++ dll does not depend at all on the libunwind dll while libstdc++ depends on the libgcc dll, i wonder if those symbols are explicitly set in clang so as to not depend directly on the libunwind dll ?.
if so kinda a nice workaround :) and one that might benefit gcc as well.
https://mingw-w64-public.narkive.com/Zbonasdx/problems-with-dllimport-dllexport
this was probably the reason back then but it might be fixed now.
EDIT: it had problems when autoimport was defined i found the old bug report again which explained it. if you disabled autoimport it worked just fine but i think this is still the default in which case newer sources might have opted to disable it if using __declspec(dllimport). though if the linker croaks it might still be a valid place to look.
How is this dllimport related to this issue?
not sure if it is just something to note as sometimes it broke with __declspec(dllimport) if auto import was enabled (it is by default). but as said it might be fixed now i havent seen any projects break because of it lately.
heh this is weird clangs libunwind dll holds the same exports but the libc++ dll does not depend at all on the libunwind dll while libstdc++ depends on the libgcc dll, i wonder if those symbols are explicitly set in clang so as to not depend directly on the libunwind dll ?.
It's enabled here https://github.com/llvm/llvm-project/blob/df168427b314f057c739eaccb21f361d3628f03b/libcxxabi/CMakeLists.txt#L116-L121 with LIBCXXABI_ENABLE_STATIC_UNWINDER that we set to 1.
There is no magic here, static libunwind is linked into libc++ as a whole archive or similar way.
if so kinda a nice workaround :) and one that might benefit gcc as well.
No, it would break unwinding across DLLs in 32-bit builds with GCC (if it ever gets fixed).
oh so we could actually use the libunwind dll if we disabled that flag or does it break anything ?. strangely everything seems to work just fine with the clang compiler and the only bugger i had was with graphene which seems to miscompile in anything but debug mode on 32 bit (discussed in another thread here).
oh so we could actually use the libunwind dll if we disabled that flag or does it break anything ?.
The last time I tested everything was ok, but with the static library we can spare the trouble of distributing another DLL with negligible binary size increase.
If you build ccache >= 4.2 for mingw32, it crashes at runtime. This bug started to occur when static linking was enabled for the gcc runtime libraries. (https://github.com/ccache/ccache/pull/732)
The error almost always occurs in the same place: in Util.cpp, read_file raises an exception:
But the catch block in the calling function is never reached.
Here is the callstack of "ccache -s" :
I can't generate a minimal example for this bug. But, since the prebuilt 32 bit versions of ccache do not have this bug I assume that it is caused by the mingw32 environment and not by ccache.