Open philippuk opened 2 months ago
Tagging author for visibility @minglotus-6
thanks for the report! I'll try to construct a small test case with strong and weak vtables.
Off the top of my head, here are two (complimentary) options to fix the issue. I need to implement them and test to figure out any unknown problems.
InstrProfiling.cpp
. Instead of making weak VTableProfileInfo
linker retained, add it in the same comdat group of the weak vtable so that both will be discarded or retained as a whole. The caveat is that not all platforms support comdat.llvm-profdata.cpp
, tolerate overlapping address ranges such that llvm-profdata
can generate indexed profiles without hitting assertion. Our investigation concludes that when if there is a strong vtable and weak vtable at the same time, the respective VTableProfileInfo still exist in the __llvm_prf_vtab section since it is marked linker retained and cannot be garbage collected. Therefore, the duplicate info is read by the profiler and lead to the assert hit with the overlapping vtable pointer.
Thanks for the investigation! Does the assertion come from this Symtab.mapVTableAddress call when converting raw profile to indexed profile? This call maps (possibly ASLR'ed) runtime address range of a vtable to its name hash. One thing that I don't have a immediate answer is how does the strong vtable and the weak vtable have overlapping address at runtime. Do you have some thoughts on this?
Small example I have used (Not sure if it is valid or not but it did hit the same assert on my x86 linux machine. Compiling them individually into object file with the flag enabled, link and run the program. Converting the generated raw profdata should lead to the problem)
weak_vtable.cpp
class A
{
virtual int func2() {return 2;}
};
int main()
{
A obj;
obj.func2();
}
strong_vtable.cpp
class A
{
virtual int func1 ();
};
int A::func1()
{
return 1;
}
The problem class in chromium is blink::WebView, maybe you can discover something from the code. https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/public/web/web_view.h;l=87?q=blink::WebView&sq=&ss=chromium https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/extensions/webview/web_view.h;l=20?q=blink::WebView&sq=&ss=chromium
Options 2 sounds good to me as there is another case that the some vtable only contains null pointers and the linker seems to have optimized them so that their VTableStartAddress in VTableProfileInfo are the same, pointing to the same vtable. (weirdly the problem goes away with my hack)
Does the assertion come from this Symtab.mapVTableAddress call when converting raw profile to indexed profile? This call maps (possibly ASLR'ed) runtime address range of a vtable to its name hash.
Yes, it comes from Symtab.mapVTableAddress. We have tried to turn off ASLR on android but the problem still exists.
One thing that I don't have a immediate answer is how does the strong vtable and the weak vtable have overlapping address at runtime. Do you have some thoughts on this?
I did not fully understand how this feature works as I am quite new to LLVM. Our observation is that the same VTableStartAddress is somehow linked to same VPtr recorded. A random guess would be duplicate counter recording the same address when the vtable is called.
Hope this helps!
Small example I have used (Not sure if it is valid or not but it did hit the same assert on my x86 linux machine. Compiling them individually into object file with the flag enabled, link and run the program. Converting the generated raw profdata should lead to the problem)
What are the clang compile commands to compile them individually and link them? I couldn't easily get them compiled (with friend int main()
to make private func2
callable to main function). Also do you use lld linker (e.g., --fuse-ld=lld
) or other linkers?
The following program [1] has ODR violation.
The compile/link/run commands are
./bin/clang++ -O2 program.cc bar.cc foo2.cc -fuse-ld=lld -o program
./program
which gives
Foo1::Foo1 # Foo1 ctor called
Foo2::Baz
~Foo2 #Foo2 dtor called
Foo1::Foo1 #Foo1 ctor called
Foo1::Bar
~Foo2 #Foo2 dtor called
[1]
// program.cc
#include "foo1.h"
#include "bar.h"
int main() {
Bar bar;
Foo foo;
foo.Bar();
}
// foo1.h
#include <iostream>
struct Foo {
Foo() { std::cout << "Foo1::Foo1" << std::endl; }
~Foo() { std::cout << "~Foo1" << std::endl; }
void Bar() { std::cout << "Foo1::Bar" << std::endl; }
};
// bar.h
struct Bar {
Bar();
};
// bar.cc
#include "bar.h"
#include "foo2.h"
Bar::Bar() {
Foo foo;
foo.Baz();
}
// foo2.h
#include <iostream>
struct Foo {
Foo() { std::cout << "Foo2::Foo2" << std::endl; }
~Foo();
void Baz() { std::cout << "Foo2::Baz" << std::endl; }
};
// foo2.cc
#include "foo2.h"
Foo::~Foo() { std::cout << "~Foo2" << std::endl; }
weak_vtable.cpp
class A { virtual int func2() {return 2;} }; int main() { A obj; obj.func2(); }
strong_vtable.cpp
class A { virtual int func1 (); }; int A::func1() { return 1; }
The problem class in chromium is blink::WebView, maybe you can discover something from the code. https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/public/web/web_view.h;l=87?q=blink::WebView&sq=&ss=chromium https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/extensions/webview/web_view.h;l=20?q=blink::WebView&sq=&ss=chromium
Options 2 sounds good to me as there is another case that the some vtable only contains null pointers and the linker seems to have optimized them so that their VTableStartAddress in VTableProfileInfo are the same, pointing to the same vtable. (weirdly the problem goes away with my hack)
Does the assertion come from this Symtab.mapVTableAddress call when converting raw profile to indexed profile? This call maps (possibly ASLR'ed) runtime address range of a vtable to its name hash.
Yes, it comes from Symtab.mapVTableAddress. We have tried to turn off ASLR on android but the problem still exists.
Ack. Just to clarify, enabling ASLR on the instrumented program is a well supported use case. Say one instrumented program is invoked multiple times and each time ASLR is enabled, people can convert individual raw profiles into its indexed format first, and then merge the indexed profiles together.
One thing that I don't have a immediate answer is how does the strong vtable and the weak vtable have overlapping address at runtime. Do you have some thoughts on this?
I did not fully understand how this feature works as I am quite new to LLVM. Our observation is that the same VTableStartAddress is somehow linked to same VPtr recorded. A random guess would be duplicate counter recording the same address when the vtable is called.
Hope this helps!
Thanks for following up! I collected raw profiles from a statically compiled real program and convert raw profiles into indexed format using an assertion-enabled llvm-profdata
with the following diff. The conversion doesn't hit any assertion. Wonder if the overlapping address ranges of vtables are due to dynamic linking (not to say llvm-profdata
shouldn't be updated to tolerate the overlapping address ranges).
diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h
index 824dcf2372c8..c82f511efa6e 100644
--- a/llvm/include/llvm/ProfileData/InstrProf.h
+++ b/llvm/include/llvm/ProfileData/InstrProf.h
@@ -615,6 +615,12 @@ public:
/// to its names' MD5 hash. This interface is only used by the raw profile
/// reader.
void mapVTableAddress(uint64_t StartAddr, uint64_t EndAddr, uint64_t MD5Val) {
+ auto LookupVal = VTableAddrMap.lookup(StartAddr, 0);
+ if (LookupVal != 0 && LookupVal != MD5Val) {
+ errs() << "Map " << StartAddr << ", " << EndAddr << "to both "
+ << LookupVal << " and " << MD5Val << "\n";
+ }
+ assert((LookupVal == 0 || LookupVal == MD5Val) && "my assert");
VTableAddrMap.insert(StartAddr, EndAddr, MD5Val);
}
Sorry for the late reply, I was trying to make sure the example "works" and I was compiling without the -O2
flag before. (Looking at the IR pipeline in compiler explorer, the Global DCE Pass seems to remove the vtable for simpler cases)
clang version (I am using clang from chromium)
$ third_party/llvm-build/Release+Asserts/bin/clang++ --version
clang version 19.0.0git (https://chromium.googlesource.com/external/github.com/llvm/llvm-project 1518b260ce2cbd9286365709642dc749e542d683)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Build config: +assertions
Just some small changes to the code I provided before could lead to the problem even with -O2
flag
strong_symbol.cpp
#include <iostream>
class Foo {
virtual void foo();
};
void Foo::foo()
{
std::cout << "~Foo2" << std::endl;
}
weak_symbol.cpp
#include <iostream>
class Foo {
public:
virtual void Bar() {std::cout << "~Foo1" << std::endl; };
};
int main() {
Foo foo;
foo.Bar();
}
Commands
$ third_party/llvm-build/Release+Asserts/bin/clang++ -fprofile-generate weak_symbol.cpp strong_symbol.cpp -o symbol -fuse-ld=lld -mllvm --enable-vtable-value-profiling -O2
$ ./symbol
~Foo1
Objdump the executable would lead to repeated vtable name hashes seen in the section
Disassembly of section __llvm_prf_vtab:
000000000000c7c0 <__start___llvm_prf_vtab>:
c7c0: b4 aa movb $-0x56, %ah
c7c2: ba 82 19 c6 c0 movl $0xc0c61982, %edx # imm = 0xC0C61982
c7c7: e5 00 inl $0x0, %eax
c7c9: 00 00 addb %al, (%rax)
c7cb: 00 00 addb %al, (%rax)
c7cd: 00 00 addb %al, (%rax)
c7cf: 00 18 addb %bl, (%rax)
c7d1: 00 00 addb %al, (%rax)
c7d3: 00 00 addb %al, (%rax)
c7d5: 00 00 addb %al, (%rax)
c7d7: 00 b4 aa ba 82 19 c6 addb %dh, -0x39e67d46(%rdx,%rbp,4)
000000000000c7d8 <__profvt__ZTV3Foo>:
c7d8: b4 aa movb $-0x56, %ah
c7da: ba 82 19 c6 c0 movl $0xc0c61982, %edx # imm = 0xC0C61982
c7df: e5 00 inl $0x0, %eax
c7e1: 00 00 addb %al, (%rax)
c7e3: 00 00 addb %al, (%rax)
c7e5: 00 00 addb %al, (%rax)
c7e7: 00 18 addb %bl, (%rax)
c7e9: 00 00 addb %al, (%rax)
c7eb: 00 00 addb %al, (%rax)
c7ed: 00 00 addb %al, (%rax)
c7ef: 00 <unknown>
And running the llvm-profdata with the generated raw file would lead to the error
$ third_party/llvm-build/Release+Asserts/bin/llvm-profdata merge default_13551080493226322571_0.profraw -o test.profdata --enable-vtable-value-profiling
llvm-profdata: /third_party/llvm/llvm/include/llvm/ADT/IntervalMap.h:638: unsigned int llvm::IntervalMapImpl::LeafNode<unsigned long, unsigned long, 4, llvm::IntervalMapHalfOpenInfo<unsigned long>>::insertFrom(unsigned int &, unsigned int, KeyT, KeyT, ValT) [KeyT = unsigned long, ValT = unsigned long, N = 4, Traits = llvm::IntervalMapHalfOpenInfo<unsigned long>]: Assertion `(i == Size || Traits::stopLess(b, start(i))) && "Overlapping insert"' failed.
PLEASE submit a bug report to https://crbug.com in the Tools>LLVM component, run tools/clang/scripts/process_crashreports.py (only if inside Google) to upload crash related files, and include the crash backtrace.
Stack dump:
0. Program arguments: third_party/llvm-build/Release+Asserts/bin/llvm-profdata merge default_13551080493226322571_0.profraw -o test.profdata --enable-vtable-value-profiling
#0 0x000057329829d268 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (third_party/llvm-build/Release+Asserts/bin/llvm-profdata+0x51a268)
Aborted (core dumped)
To be honest, I think it may be a linker bug but I am not sure. Hope this helps!
One more thing to add
If not using the lld as the linker. the objdump of the section becomes
Disassembly of section __llvm_prf_vtab:
000000000000c3a8 <__start___llvm_prf_vtab>:
c3a8: b4 aa movb $-0x56, %ah
c3aa: ba 82 19 c6 c0 movl $0xc0c61982, %edx # imm = 0xC0C61982
c3af: e5 60 inl $0x60, %eax
c3b1: bd 00 00 00 00 movl $0x0, %ebp
c3b6: 00 00 addb %al, (%rax)
c3b8: 18 00 sbbb %al, (%rax)
c3ba: 00 00 addb %al, (%rax)
c3bc: 00 00 addb %al, (%rax)
c3be: 00 00 addb %al, (%rax)
000000000000c3c0 <__profvt__ZTV3Foo>:
c3c0: b4 aa movb $-0x56, %ah
c3c2: ba 82 19 c6 c0 movl $0xc0c61982, %edx # imm = 0xC0C61982
c3c7: e5 60 inl $0x60, %eax
c3c9: bd 00 00 00 00 movl $0x0, %ebp
c3ce: 00 00 addb %al, (%rax)
c3d0: 18 00 sbbb %al, (%rax)
c3d2: 00 00 addb %al, (%rax)
c3d4: 00 00 addb %al, (%rax)
c3d6: 00 00 addb %al, (%rax)
where we can see the same vtable pointer in the section
Thanks for the example in https://github.com/llvm/llvm-project/issues/104813#issuecomment-2312352357 and commands in https://github.com/llvm/llvm-project/issues/104813#issuecomment-2314930897
class Foo
is defined differently in two cpp files. As I understand it, this is an ODR violation. And if a program has ODR violation, the compiler and linker are not required to diagnose this violation, but the behavior of the program that violates it is undefined.
Thanks for the response, it seems like ODR violation in this case then.
It may deserve another thread but there is another problem separated from the ODR problem.
Options 2 sounds good to me as there is another case that the some vtable only contains null pointers and the linker seems to have optimized them so that their VTableStartAddress in VTableProfileInfo are the same, pointing to the same vtable. (weirdly the problem goes away with my hack)
To give more detail, the overlapping assert hit when enabling -fexperimental-relative-c++-abi-vtables
and passing O2
to the lld
linker (Enable string merging). Not enabling either of the optimization would resolve the problem.
I do not think I could provide an example for this. But do you have any idea why this may happen? Or is this feature not meant to be used together with -fexperimental-relative-c++-abi-vtables
?
The bug is found when trying to collect the PGO profile for chromium with the flag -enable-vtable-value-profiling.
llvm-profdata: chromium/src/third_party/llvm/llvm/include/llvm/ADT/IntervalMap.h:638: unsigned int llvm::IntervalMapImpl::LeafNode<unsigned long, unsigned long, 8, llvm::IntervalMapHalfOpenInfo<unsigned long>>::insertFrom(unsigned int &, unsigned int, KeyT, KeyT, ValT) [KeyT = unsigned long, ValT = unsigned long, N = 8, Traits = llvm::IntervalMapHalfOpenInfo<unsigned long>]: Assertion (i == Size || Traits::stopLess(b, start(i))) && "Overlapping insert"' failed.
Our investigation concludes that when if there is a strong vtable and weak vtable at the same time, the respective VTableProfileInfo still exist in the ___llvm_prfvtab section since it is marked linker retained and cannot be garbage collected. Therefore, the duplicate info is read by the profiler and lead to the assert hit with the overlapping vtable pointer.
Our temporary hack is to add an if statement to not marked it as linker retained if the global variable is _linkonceodr in https://github.com/llvm/llvm-project/blob/0ee0857363aadf9ce0f403e7e0da10f0a9d94887/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp#L1594-L1596