Closed tkuowork closed 1 year ago
If you see the heap dump the diamond must be there. Are you really sure that your browser window doesn't extend beyond your viewport, making the right hand side hidden?
From the screenshot looks like it's what's happening here. I have no other explanation
Ah no actually I can repro myself.
@stevegolton can you take a look? This might be some silly UI bug when computing the trace bounds or similar. If the snapshot loads in the bottom panel, the diamond (the profile) must be there (in the tables).
thanks for the quick response. glad you were able to repro.
looking forward to the fix!
Hey @tkuowork, thanks for raising the issue and for providing so much detail!
I noticed that the timestamps in your attached trace are quite large ~185 days, and all heap profiles have the same timestamp, resulting in an effective trace "duration" of 1ns.
The UI has a limitation where timestamps start to lose precision after ~100 days, which causes issues when the trace is so short as the start and end times cannot be resolved, which breaks timeline rendering.
We are actually currently working on a solution to this problem, but it's complex and might take some time. As a temporary fix, you could try restarting your machine you recorded the traces on and re-recording them.
@stevegolton thanks! the ~185 day timestamp makes sense because thats how long my machine has been up.
I restarted my machine and now i do see a diamond!
here is the heap-trace for that view zipped:
thread-heapprofd-trace.zip
however, I am curious as to why I only see one diamond. I would have expected to see multiple diamonds since I am using continuous dumps.
here is the source code for the heap trace above and where I see one diamond:
#include <iostream>
#include <chrono>
#include <thread>
using namespace std;
class MyClass1 {
public:
MyClass1() {
}
void allocate() {
cout << "Allocating data in MyClass1" << endl;
data = new char[8096];
}
private:
char* data;
};
int main() {
MyClass1* obj1 = new MyClass1();
thread* t1 = new thread([&obj1](){
for (int i = 0; i < 10; i++) {
this_thread::sleep_for(chrono::seconds(1));
obj1->allocate();
}
});
t1->join();
return 0;
}
here are the commands I ran:
// start perfetto
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ out/linux/traced
[154.346] service.cc:240 Started traced, listening on /tmp/perfetto-producer /tmp/perfetto-consumer
// start heap trace with continuous dumps
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ PERFETTO_SYMBOLIZER_MODE=index tools/heap_profile -n test -c 5 --print-config | out/linux_clang_release/perfetto -c - --txt -o ~/heapprofd-trace
[609.717] perfetto_cmd.cc:1026 Connected to the Perfetto traced service, starting tracing
// build code
tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ g++ -g nonthreadLeaky.cpp -o test
// run code
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ LD_PRELOAD=out/linux_clang_release/libheapprofd_glibc_preload.so ../test/test
when I run the same code OUTSIDE a pthread
or cpp::thread
I get this really nice graph with a timeline, which is what i am hoping to see when my application is threaded (or multi-threaded)
I am able to see memory being allocated every second just like what the code is doing !
here is the corresponding trace zipped : heapprofd-trace.zip
here is the source code
#include <iostream>
#include <chrono>
#include <thread>
using namespace std;
class MyClass1 {
public:
MyClass1() {
}
void allocate() {
cout << "Allocating data in MyClass1" << endl;
data = new char[8096];
}
private:
char* data;
};
int main() {
MyClass1* obj1 = new MyClass1();
for (int i = 0; i < 10; i++) {
this_thread::sleep_for(chrono::seconds(1));
obj1->allocate();
}
return 0;
}
I am using the same heap_profile configuration in both cases
// start heap trace with continuous dumps
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ PERFETTO_SYMBOLIZER_MODE=index tools/heap_profile -n test -c 5 --print-config | out/linux_clang_release/perfetto -c - --txt -o ~/heapprofd-trace
[609.717] perfetto_cmd.cc:1026 Connected to the Perfetto traced service, starting tracing
Any reason why continuous dumps (timeline of diamonds) don't show up for a multi-thread application?
Hi, I cannot see a good reason on why this should work only on the main thread.
One thing I suspect is that there might be some compiler optimization going on. Compilers are quite smart at completely removing code that they can prove is unused.
@tkuowork can I ask you to:
1) After you do data = new char[8096];
just do a write(42, data, 8096)
. The syscall will fail but the compiler has no way to tell what the syscall will do, so should reduce the chance of optimizing away the allocation.
2) Attach logs you get from perfetto. If you build with PERFETTO_BUILDFLAG(PERFETTO_FORCE_DLOG_ON)
you should see more logging coming through. That might help.
sure!
im still getting the same result after following those steps.
trace: heapprofd-trace.zip
i've built all the targets:
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ tools/setup_all_configs.py
Building linux_gcc_debug_arm64[is_clang=false,is_debug=true,target_cpu="arm64",target_os="linux"]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_debug_arm64 '--args=is_clang=false is_debug=true target_cpu="arm64" target_os="linux"' --check
Done. Made 1193 targets from 198 files in 137ms
Building linux_gcc_release_arm64[is_clang=false,is_debug=false,target_cpu="arm64",target_os="linux"]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_release_arm64 '--args=is_clang=false is_debug=false target_cpu="arm64" target_os="linux"' --check
Done. Made 1189 targets from 198 files in 128ms
Building linux_gcc_debug [is_clang=false,is_debug=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_debug '--args=is_clang=false is_debug=true' --check
Done. Made 931 targets from 198 files in 112ms
Building linux_gcc_release [is_clang=false,is_debug=false]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_release '--args=is_clang=false is_debug=false' --check
Done. Made 930 targets from 198 files in 121ms
Building linux_clang_debug [is_clang=true,is_debug=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_clang_debug '--args=is_clang=true is_debug=true' --check
Done. Made 978 targets from 201 files in 123ms
Building linux_clang_release [is_clang=true,is_debug=false]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_clang_release '--args=is_clang=true is_debug=false' --check
Done. Made 977 targets from 201 files in 125ms
Building linux_asan [is_clang=true,is_debug=false,is_asan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_asan '--args=is_clang=true is_debug=false is_asan=true' --check
Done. Made 977 targets from 201 files in 114ms
Building linux_lsan [is_clang=true,is_debug=false,is_lsan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_lsan '--args=is_clang=true is_debug=false is_lsan=true' --check
Done. Made 977 targets from 201 files in 111ms
Building linux_msan [is_clang=true,is_debug=false,is_msan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_msan '--args=is_clang=true is_debug=false is_msan=true' --check
Done. Made 977 targets from 201 files in 118ms
Building linux_tsan [is_clang=true,is_debug=false,is_tsan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_tsan '--args=is_clang=true is_debug=false is_tsan=true' --check
Done. Made 977 targets from 201 files in 118ms
Building linux_ubsan [is_clang=true,is_debug=false,is_ubsan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_ubsan '--args=is_clang=true is_debug=false is_ubsan=true' --check
Done. Made 977 targets from 201 files in 112ms
Building linux_fuzzer [is_clang=true,is_debug=false,is_fuzzer=true,is_asan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_fuzzer '--args=is_clang=true is_debug=false is_fuzzer=true is_asan=true' --check
Done. Made 980 targets from 202 files in 113ms
then I built the debug version so i have additional loging
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ tools/ninja -C out/linux_clang_debug
ninja: Entering directory `out/linux_clang_debug'
[2822/2822] stamp obj/default.stamp
I build the new thread version with the write
and also with optimization off, with debug symbols
tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ g++ -Og threadLeaky.cpp -o test
threadLeaky.cpp: In member function ‘void MyClass1::allocate()’:
threadLeaky.cpp:16:14: warning: ignoring return value of ‘ssize_t write(int, const void*, size_t)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
16 | write(42, data, 8096);
| ~~~~~^~~~~~~~~~~~~~~~
src:
#include <iostream>
#include <chrono>
#include <thread>
#include <unistd.h>
using namespace std;
class MyClass1 {
public:
MyClass1() {
}
void allocate() {
cout << "Allocating data in MyClass1" << endl;
data = new char[8096];
write(42, data, 8096);
}
private:
char* data;
};
int main() {
MyClass1* obj1 = new MyClass1();
thread* t1 = new thread([&obj1](){
for (int i = 0; i < 10; i++) {
this_thread::sleep_for(chrono::seconds(1));
obj1->allocate();
}
});
t1->join();
return 0;
}
then I ran traced
and perfetto
// traced
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ out/linux_clang_debug/traced
[941.544] service.cc:240 Started traced, listening on /tmp/perfetto-producer /tmp/perfetto-consumer
[944.863] ing_service_impl.cc:457 Consumer 0x559aefd45d90 connected from UID 1000
[944.863] ing_service_impl.cc:557 Enabling tracing for consumer 0x559aefd44670, UUID: c7848737-3411-f1b4-d040-b0462ac483b0
[944.864] ing_service_impl.cc:988 Configured tracing session 1, #sources:2, duration:0 ms, #buffers:1, total buffer size:63488 KB, total sessions:1, uid:1000 session name: ""
[944.864] ng_service_impl.cc:1423 All data sources started
[945.000] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[950.006] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[955.005] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[956.163] ing_service_impl.cc:361 Producer 1 connected, uid=1000
[956.163] ng_service_impl.cc:2510 Producer 1 registered data source "android.heapprofd"
[956.163] ng_service_impl.cc:2745 Setting up data source android.heapprofd with target buffer 1
[956.163] ng_service_impl.cc:2771 Invalid configured SMB sizes: shm_size 0 page_size 0. Falling back to shm_size 262144 page_size 4096.
[956.163] ng_service_impl.cc:2779 Creating SMB of 256 KB for producer "android.heapprofd"
[960.010] ng_service_impl.cc:2428 Draining into file, written: 180 KB, stop: 0
[965.015] ng_service_impl.cc:2428 Draining into file, written: 331 KB, stop: 0
[970.008] ng_service_impl.cc:2428 Draining into file, written: 76 KB, stop: 0
[975.005] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[980.006] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[981.099] ng_service_impl.cc:2428 Draining into file, written: 4 KB, stop: 0
[981.118] ucer_ipc_service.cc:194 Client 1 disconnected
[981.118] ing_service_impl.cc:423 Producer 1 disconnected
[981.118] ng_service_impl.cc:2595 Producer 1 unregistered data source "android.heapprofd"
[981.118] ing_service_impl.cc:474 Consumer 0x559aefd44670 disconnected
[981.118] ng_service_impl.cc:2434 Freeing buffers for session 1
[981.118] ng_service_impl.cc:2466 Tracing session 1 ended, total sessions:0
// perfetto
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ tools/heap_profile -n test -c 5 --print-config | out/linux_clang_debug/perfetto -c - --txt -o ~/heapprofd-trace
[944.858] perfetto_cmd.cc:594 Parsing TraceConfig, 494 bytes
[944.863] perfetto_cmd.cc:1026 Connected to the Perfetto traced service, starting tracing
// test app
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ LD_PRELOAD=out/linux_clang_debug/libheapprofd_glibc_preload.so ../test/test
[956.163] approfd_producer.cc:245 Connected to the service, mode [child].
[956.164] approfd_producer.cc:367 Setting up datasource: non-statsd initiator.
[956.164] approfd_producer.cc:451 Set up data source.
[956.164] approfd_producer.cc:515 Starting data source 1
[956.164] approfd_producer.cc:558 Started DataSource
[956.164] approfd_producer.cc:851 84899: Received FDs.
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
[966.163] ctory_standalone.cc:161 Child disconnected.
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ [966.165] pprofd_producer.cc:1192 84899 disconnected from heapprofd (ds shutting down: 0).
[966.263] unwinding.cc:329 Unexpected data.
[981.098] approfd_producer.cc:575 Stopping data source 1
[981.098] approfd_producer.cc:341 Shutting down child heapprofd (status 0).
Do you have any other suggestions of things I should test? It does seem somewhat specific to using threads
Do you have any other suggestions of things I should test? It does seem somewhat specific to using threads
I managed to reproduce this. I think this is a bug in our glibc malloc wrapper, proposed fix here: https://r.android.com/2519482
Thanks!
@ddiproietto thank you! hoping to see this land soon.
i think i also hit the deadlock case
#0 0x00007f462503a4d6 in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f462503e959 in pthread_getattr_np () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#2 0x00007f462a54aa6b in perfetto::profiling::GetThreadStackRange () at ../../src/profiling/memory/client.cc:86
#3 0x00007f462a54b61c in perfetto::profiling::Client::GetStackEnd (this=this@entry=0x7f45d8020188, stackptr=stackptr@entry=0x7f458e7f3010 "\260\060\177\216E\177") at ../../src/profiling/memory/client.cc:286
#4 0x00007f462a54b86e in perfetto::profiling::Client::RecordMalloc (this=0x7f45d8020188, heap_id=1, sample_size=4096, alloc_size=152, alloc_address=139937752257680) at ../../src/profiling/memory/client.cc:372
#5 0x00007f462a5b06a0 in AHeapProfile_reportAllocation (heap_id=1, id=<optimized out>, size=152) at ../../src/profiling/memory/client_api.cc:422
#6 0x00007f462a543db4 in perfetto::profiling::wrap_calloc (heap_id=1, fn=<optimized out>, nmemb=152, size=<optimized out>) at ../../src/profiling/memory/wrap_allocators.cc:47
--Type <RET> for more, q to quit, c to continue without paging--
#7 0x00007f462503b313 in ?? () from /lib64/libc.so.6
#8 0x00007f462503b5f8 in pthread_attr_setaffinity_np () from /lib64/libc.so.6
#9 0x00007f462503e8be in pthread_getattr_np () from /lib64/libc.so.6
#10 0x00007f462a54aa6b in perfetto::profiling::GetThreadStackRange () at ../../src/profiling/memory/client.cc:86
#11 0x00007f462a54b61c in perfetto::profiling::Client::GetStackEnd (this=this@entry=0x7f45d8020188, stackptr=stackptr@entry=0x7f458e7f3460 "") at ../../src/profiling/memory/client.cc:286
--Type <RET> for more, q to quit, c to continue without paging--
#12 0x00007f462a54b86e in perfetto::profiling::Client::RecordMalloc (this=0x7f45d8020188, heap_id=1, sample_size=4096, alloc_size=34, alloc_address=139937752155888) at ../../src/profiling/memory/client.cc:372
#13 0x00007f462a5b06a0 in AHeapProfile_reportAllocation (heap_id=1, id=<optimized out>, size=34) at ../../src/profiling/memory/client_api.cc:422
@ddiproietto i cherry-picked your change and am not longer getting a deadlock. thanks!! excited for this change to get in
@ddiproietto i also tested with the multi-threaded test app and am seeing the multiple diamonds now too!
out of curiosity, there any plans for the UI to be able to show memory allocated per thread. i.e currently it shows all memory for a single process, but i assume there is enough meta data to know that there were two threads. there could be drop down menu for each thread
closing since its on mainline. thanks for the fix!
Hi.
I'm enjoying using this tool to debug memory leaks.
Issue
I am currently running into an issue with the Profetto UI not showing any diamond icons for a multi-threaded c++ app, even after adding "adding -c 5000 to the invocation of tools/heap_profile" as referenced in https://perfetto.dev/docs/data-sources/native-heap-profiler#continuous-dumps
Everything works great when i'm not using
std::thread
orpthread_t
.This is what I see on the Profetto UI when using threads and enabling continuous dumps:
It seems like I'm getting an accurate single snap shot with no diamond icon.
Here is the trace as a zip: heapprofd-trace.zip
Reproducing the issue
source code
Here is the source code (which intentionally leaks memory):
Following the instructions in: https://perfetto.dev/docs/data-sources/native-heap-profiler#-non-android-linux-support
Here is my setup to produce the
heapprofd-trace
:environment
Starting traced
Starting profetto
Building application
Running application
Any ideas if this a bug or by design?