Open boris-kolpackov opened 4 years ago
mentioned in issue llvm/llvm-bugzilla-archive#45064
This bug is still present in Clang 13.0.0 with the same state as for 12.0.1 (see the previous comment):
The original reproducer nor longer triggers it.
The reduced reproducer still triggers it with -O2
and -O3
provided -fexternc-nounwind
is not passed.
We've also hit another similar issue that is likely a manifestation of the same underlying bug in Clang (i.e., it is triggered when an exception is thrown but only in -O2
/-O3
optimized code). We've tried to reduce it to something that we can attach but so far unsuccessfully (things tend to change/disappear with slight code changes). One thing that is different in this case is that if running under the Visual Studio debugger, the process gets interrupted with "Unhandled exception" that points to the throw statement that is actually handled. Also the stack trace at this point looks wrong (no main(), etc).
This still appears to be an issue in Clang 12.0.1 however, things have shifted around:
The original reproducer nor longer triggers it.
The reduced reproducer still triggers it with -O2
and -O3
provided -fexternc-nounwind
is not passed.
I haven't tried the test mentioned in comment 9 as "program B" but could do so if you think it would be helpful.
Hans said he'd take a look at this.
No, an exception is not thrown from an extern "C"
function. My guess is -fexternc-nounwind
makes a difference because it affects the shape of the generated code (because the compiler makes different assumptions/provisions when calling extern "C"
functions with/without this option).
Sorry, I was on leave over the summer, so I didn't see the 2020 June/July updates. I haven't had time to read over this again, but if the problem is related to fexternc-nounwind, doesn't that mean the program is invoking UB by throwing an exception from an extern "C"
function? In any case, I'll add this to our next bug triage round and do more thorough investigation later.
Still an issue in Clang 11.0.1 and all the observations regarding -fexternc-nounwind
and -O2
/-O3
still hold.
We've run into this issue in another, unrelated, program. I've done some experiments with results described below in hopes it will help to fix this.
Let's call the original code that triggers this A and this second program B. Let's also call the reduced version of A that is attached to this report (the latest version by Reid Kleckner) A'.
One thing that I was wondering is why nobody else seems to be running into this and one potential answer is that most (all) other projects used clang-cl rather than the vanilla driver like we do. So I tried clang-cl on A' and the result does not trigger the issue with /O2
. In the end, I've tracked the difference down to -fexternc-nounwind
that clang-cl passes. So if we pass this option to vanilla clang, then A' does not trigger the issue, neither with -O2
nor with -O3
.
This, however, does not help A: for the original code the only thing that seems to help is lowering the optimization level to -O1
.
B is in between: with -fexternc-nounwind
the issue is gone with -O2
but not with -O3
.
Any progress on this? I would have liked to help but unfortunately this is out of my depth. I would be happy to do any legwork (testing, etc) if needed.
little more reduced Reduced the program a bit.
Thanks for the reproducer, I did some investigation. I think it's not that a destructor is running twice when it should not, it's that somehow a stale ofd field is being read in the ofd destructor.
The main thing that seems to cause this is SROA. We end up with a PHI on the ofd cleanuppad:
ehcleanup17: ; preds = %ehcleanup, %"?fdopen_pipe@@YA?AUfdpipe@@XZ.exit"
%ofd.sroa.11.0 = phi i32 [ -1, %ehcleanup ], [ %2, %"?fdopen_pipe@@YA?AUfdpipe@@XZ.exit" ], !dbg !1282
%ofd.sroa.0.1 = phi i32 [ %ofd.sroa.0.0, %ehcleanup ], [ %1, %"?fdopen_pipe@@YA?AUfdpipe@@XZ.exit" ], !dbg !1282
%10 = cleanuppad within none [], !dbg !1332
Anything that makes the PHI go away, such as removing potentially throwing operations from run_pipe or the extra callbacks destructor, hides the problem.
I'm not sure we should hold up 10.0 for this bug, but I will keep looking into it.
It would be really helpful if this bug is at least looked at before the final 10.0.0 release. Because it is only triggered by throwing an exception, this could be a latent issue present in a lot more codebases than just ours. So I took the liberty to raise the importance to release blocker.
We have managed to reproduce the issue with a single (though not exactly trivial) source file.
Normally, the attached program creates a pipe, closes its ends and exits:
c:\tmp>clang++ -g test.cxx
c:\tmp>a.exe
pipe: 3 4
fdclose: 4
fdclose: 3
exit
However, when it is built optimized it tries to close the pipe input twice and crashes inside the _close()
function:
c:\tmp>clang++ -O3 -g test.cxx
c:\tmp>a.exe
pipe: 3 4
fdclose: 4
fdclose: 3
fdclose: 3
The last fdclose()
call is not expected to happen.
The stack trace is attached.
assigned to @zmodem
Extended Description
We have a program that crashes in a destructor called for a stack object that appears to have never been constructed. The destructor is called during stack unwinding caused by throwing (and catching) an exception.
This only happens with Clang targeting MSVC and only when the program is compiled with optimization (
-O3
) and linked statically (but still using the DLL msvcrt). This affects Clang 9.0.1 and 10.0.0-rc2.So far we were unable to create a small reproducer and I am not very hopeful that we will since the issue appears to be specific to the code setup/path that we have. However, I would be happy to provide instructions on how to reproduce the issue under a debugger in the program directly (it is open source). Below I am also including extra details in case they may help.
You can see the two complete build logs that only differ in the static linking bit here (this is using Clang 9.0.1):
https://stage.build2.org/?builds=build2&cf=windows_10-clang_9.0_llvm_msvc_16.4-*O3
In particular, the C++ compiler invocations are the same sans a few macros that setup DLL importing in the shared case.
Below is a stack trace. The debugger attaches the the following message to the
b.exe!butl::fdclose(int fd) Line 1482 C++
line:Unhandled exception at 0x00007FFCCC24D5B8 (ucrtbase.dll) in b.exe: An invalid parameter was passed to a function that considers invalid parameters fatal.
The crash happens when an exception is thrown at line 1250[1] trying to destroy the
fdpipe
object. The onlyfdpipe
object that could possibly be being destroyed here is constructed below at line 1520[2]. So it feels like something is wrong with the exception handing tables.[1] https://github.com/build2/build2/blob/master/libbuild2/test/script/runner.cxx#L1250 [2] https://github.com/build2/build2/blob/master/libbuild2/test/script/runner.cxx#L1520