Open mattjgalloway opened 2 years ago
I've done a bit more digging on this and seem to have come up with an even more minimal test case, this time not using std::future
/std::promise
at all. And this time it crashes when compiled with plain clang.exe
and also clang-cl.exe
. It still crashes when using a PCH and doesn't crash when not using a PCH. Hopefully this is a little easier to digest therefore.
thing.hpp:
#pragma once
#include <cstdlib>
struct Thing {
Thing() { mem_ = malloc(1); }
~Thing() { free(mem_); }
Thing& operator=(const Thing& other) {
if (this == &other) return *this;
mem_ = malloc(1);
return *this;
}
Thing(const Thing& other) {
mem_ = malloc(1);
}
void* mem_;
};
pch.hpp:
#include "thing.hpp"
void foo() {
try {
Thing t;
throw t;
} catch (const Thing& t) {}
}
crash.cpp:
#include <cstdio>
#include "thing.hpp"
int main(int argc, char* argv[]) {
printf("started\n");
try {
Thing t;
throw t;
} catch (Thing t) {}
printf("didn't crash\n");
return 0;
}
Then compiling with a PCH:
PS1> & 'C:\Program Files\LLVM13.0.0\bin\clang++.exe' -Xclang -emit-pch -o pch.gch pch.hpp
PS1> & 'C:\Program Files\LLVM13.0.0\bin\clang++.exe' -include-pch pch.gch -O0 -o crash.exe crash.cpp
PS1> .\crash.exe
started
Without a PCH:
PS1> & 'C:\Program Files\LLVM13.0.0\bin\clang++.exe' -O0 -o crash.exe crash.cpp
PS1> .\crash.exe
started
didn't crash
Again it’s crashing because the copy ctor of Thing
was not called when it needed to be. That results in a double free of the mem_
member of Thing
.
Note that it will also not crash, even with the PCH, if in crash.cpp
you catch the exception by reference. But that's because there's no exception copying going on. The catch by value is there to force the copy ctor to need to be called.
I took a look at the LLVM in the case of using a PCH and not using a PCH and I can see a difference in how the CatchableType
for Thing
is represented:
Not crashy, copy ctor called:
@"_CT??_R0?AUThing@@@8??0Thing@@QEAA@AEBU0@@Z8" = linkonce_odr unnamed_addr constant %eh.CatchableType { i32 0, i32 trunc (i64 sub nuw nsw (i64 ptrtoint (%rtti.TypeDescriptor11* @"??_R0?AUThing@@@8" to i64), i64 ptrtoint (i8* @__ImageBase to i64)) to i32), i32 0, i32 -1, i32 0, i32 8, i32 trunc (i64 sub nuw nsw (i64 ptrtoint (%struct.Thing* (%struct.Thing*, %struct.Thing*)* @"??0Thing@@QEAA@AEBU0@@Z" to i64), i64 ptrtoint (i8* @__ImageBase to i64)) to i32) }, section ".xdata", comdat
Crash, copy ctor not called:
@"_CT??_R0?AUThing@@@88" = linkonce_odr unnamed_addr constant %eh.CatchableType { i32 0, i32 trunc (i64 sub nuw nsw (i64 ptrtoint (%rtti.TypeDescriptor11* @"??_R0?AUThing@@@8" to i64), i64 ptrtoint (i8* @__ImageBase to i64)) to i32), i32 0, i32 -1, i32 0, i32 8, i32 0 }, section ".xdata", comdat
I cannot claim to understand LLVM IR enough at this point to understand how to debug further, but I think I can glean enough here to see that this looks wrong. In the crashy case (i.e. using a PCH) we see that the CatchableType
doesn't have the final field set. Whereas in the non-crashy case it's set to, well, the copy ctor of Thing
(demangled, it's public: __cdeclThing::Thing(struct Thing const & __ptr64) __ptr64
).
I hope this helps someone with more understanding to get to the bottom of the problem.
CC @rnk, in case this rings a bell.
cc @tentzen who's familiar with EH for windows in LLVM.
Looks like it's related to COMDAT Folding (/OPT:ICF). It seems a Clang's linker bug.
The xdata objects, one from precompiled header and another one from crash.obj, should not be merged: "CTA1?AUThing@@": // from pch.gch .long 1 # 0x1 .long "CT??_R0?AUThing@@@84"
"CTA1?AUThing@@": // // from crash.obj .long 1 # 0x1 .long "CT??_R0?AUThing@@@8??0Thing@@QAE@ABU0@@Z4"
Try use the option in clang++ (if there is one) corresponding to CL's -Gy- to get around the problem.
@tentzen Many thanks for the reply! I've tried with /Gy-
on the clang-cl
driver path, and unfortunately, that doesn't help mitigate the problem.
I can't claim to know anywhere near as much as you here, but I'm unsure it's a linker problem if I see the LLVM that I mention in a comment above? Seems to indicate to me that it's missing at that point. Could it be related to pulling over the copy ctor information from the PCH somehow?
I put this on our bug scrub list to get someone to take a look at this. (@zmodem)
We're seeing the same behavior with LLVM 16.0.6 (although for some reason only in Release builds). I also tried mattjgalloway's test and it crashes with both 16.0.6 and 17.0.2. Has there been any progress on this issue?
I've done some debugging, and I think I've figured out what's going on. It appears to be an issue with serializing and deserializing the AST, not with COMDAT folding. If this is the case then it might also affect C++ Module support as that uses some of the same machinery.
It all comes down to CodeGen
's MicrosoftCXXABI::getCatchableType()
not seeing T
's copy constructor when processing the PCH.
This call to ASTContext::getCopyConstructorForExceptionObject()
returns NULL
https://github.com/llvm/llvm-project/blob/970e7456e0346f078962fa88bdc1b86fd9828d6f/clang/lib/CodeGen/MicrosoftCXXABI.cpp#L4171-L4172
That function ends up just looking up the constructor in AST
's MicrosoftCXXABI::RecordToCopyCtor
map https://github.com/llvm/llvm-project/blob/970e7456e0346f078962fa88bdc1b86fd9828d6f/clang/lib/AST/MicrosoftCXXABI.cpp#L150-L152
RecordToCopyCtor
seems to be poulated correctly while initially parsing the C++ code in the -emit-pch
invocation, but not when loading the PCH in the -include-pch
invocation.
Interestingly using -fpch-codegen
doesn't actually help because internally it gets separated into two compiler invocations - one with -emit-pch
and another with -include-pch
.
The linker presumably just chooses the first CatchableTypeArray
symbol it sees, which hapens to be the one from the PCH (referencing the incorrectly generated CatchableType
symbol _CT??_R0?AUThing@@@88
).
I don't know who to tag or how best to proceed, so sorry for the spam: @smeenai @WenleiHe @tentzen @rnk @zmodem
Nice debugging!
What we're doing here looks like a sparse field pattern: The copy ctor is effectively a field of the record decl, but we store it separately in a side table instead of making it a real field of the record itself. That suggests we can fix the issue by storing this "copy constructor for exception object" in the serialized CXXRecordDecl, even if during deserialization we put the data back into this side table. That seems like a reasonably natural serialization scheme, rather than storing the entire side table as a new record. Thoughts?
Hi, is there any more info about this? I recently tested (with LLVM 18.1.7), and this issue does indeed affect C++ modules, effectively making them unusable on Windows.
[UPDATE: I found a more minimal example and I've put that as a separate comment below. Preserving the original here as it shows the direction we went in and may help diagnosis.]
We've come across a strange issue when using
clang-cl.exe
on Windows. It seems to have started somewhere between 10.0.0 and 11.0.0. And I can confirm I am seeing it with the latest, LLVM13.0.0, as well.Important to note is that the issue only happens when using precompiled headers too.
(I am using
clang
with the standard library from Visual Studio 2019 with Windows SDK version 10.0.18362.0 and MSVC version 14.28.29333.)First let me show the minimal test case I've come up with.
crash.cpp:
pch.hpp:
Then we compile, link and run with
clang++.exe
like this:However if we compile, link and run with
clang-cl.exe
then it crashes:If you omit the use of PCH but still use
clang-cl.exe
then it works again:When tracing through why the crash is happening, it seems to be coming from the fact that at runtime, the
CatchableType
for thefuture_error
exception has nocopyFunction
. That means that inside_CopyExceptionObject
in ucrt, we justmemcpy
the exception data, which is wrong because thefuture_error
needs the proper copy constructor to run. It seems that the compiler emittingnullptr
forcopyFunction
in theclang-cl.exe
+PCH case is wrong. That's my understanding at least thus far!