google / perfetto

Performance instrumentation and tracing for Android, Linux and Chrome (read-only mirror of https://android.googlesource.com/platform/external/perfetto/)
https://www.perfetto.dev
Apache License 2.0
2.83k stars 354 forks source link

[Profetto UI] Heap profiler with continuous dumps for a multi-threaded c++ app doesnt show diamonds icon #473

Closed tkuowork closed 1 year ago

tkuowork commented 1 year ago

Hi.

I'm enjoying using this tool to debug memory leaks.

Issue

I am currently running into an issue with the Profetto UI not showing any diamond icons for a multi-threaded c++ app, even after adding "adding -c 5000 to the invocation of tools/heap_profile" as referenced in https://perfetto.dev/docs/data-sources/native-heap-profiler#continuous-dumps

Everything works great when i'm not using std::thread or pthread_t.

This is what I see on the Profetto UI when using threads and enabling continuous dumps:

image

It seems like I'm getting an accurate single snap shot with no diamond icon.

Here is the trace as a zip: heapprofd-trace.zip

Reproducing the issue

source code

Here is the source code (which intentionally leaks memory):

#include <iostream>
#include <chrono>
#include <pthread.h>
#include <unistd.h>

using namespace std;

class MyClass1 {
public:
    MyClass1() {
    }

    void allocate() {
        cout << "Allocating 1kb of data in MyClass1" << endl;
        data = new char[8096];
    }

private:
    char* data;
};

class MyClass2 {
public:
    MyClass2() {
    }

    void allocate() {
        cout << "Allocating 1kb of data in MyClass2" << endl;
        data = new char[8096];
    }

private:
    char* data;
};

void* thread1_function(void* arg) {
    MyClass1* obj1 = static_cast<MyClass1*>(arg);

    for (int i = 0; i < 10; i++) {
        sleep(1);
        obj1->allocate();
    }

    pthread_exit(nullptr);
}

void* thread2_function(void* arg) {
    MyClass2* obj2 = static_cast<MyClass2*>(arg);

    for (int i = 0; i < 10; i++) {
        sleep(2);
        obj2->allocate();
    }

    pthread_exit(nullptr);
}

int main() {
    pthread_t thread1;
    pthread_t thread2;
    MyClass1* obj1 = new MyClass1();
    MyClass2* obj2 = new MyClass2();

    pthread_create(&thread1, nullptr, thread1_function, obj1);
    pthread_create(&thread2, nullptr, thread2_function, obj2);

    pthread_join(thread1, nullptr);
    pthread_join(thread2, nullptr);

    return 0;
}

Following the instructions in: https://perfetto.dev/docs/data-sources/native-heap-profiler#-non-android-linux-support

Here is my setup to produce the heapprofd-trace:

environment

tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ uname -ar
Linux tkuo-HP-Z840-Workstation 5.15.0-48-generic #54-Ubuntu SMP Fri Aug 26 13:26:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

Starting traced

tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ out/linux/traced
[125.236]          service.cc:240 Started traced, listening on /tmp/perfetto-producer /tmp/perfetto-consumer

Starting profetto

tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ PERFETTO_SYMBOLIZER_MODE=index tools/heap_profile -n test -c 500 --print-config  | out/linux_clang_release/perfetto   -c - --txt   -o ~/heapprofd-trace

Building application

tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ g++ -g posixThread.cpp -o test

Running application

tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ LD_PRELOAD=out/linux_clang_release/libheapprofd_glibc_preload.so ../test/test

Any ideas if this a bug or by design?

primiano commented 1 year ago

If you see the heap dump the diamond must be there. Are you really sure that your browser window doesn't extend beyond your viewport, making the right hand side hidden?

From the screenshot looks like it's what's happening here. I have no other explanation

primiano commented 1 year ago

Ah no actually I can repro myself.

@stevegolton can you take a look? This might be some silly UI bug when computing the trace bounds or similar. If the snapshot loads in the bottom panel, the diamond (the profile) must be there (in the tables).

tkuowork commented 1 year ago

thanks for the quick response. glad you were able to repro.

looking forward to the fix!

stevegolton commented 1 year ago

Hey @tkuowork, thanks for raising the issue and for providing so much detail!

I noticed that the timestamps in your attached trace are quite large ~185 days, and all heap profiles have the same timestamp, resulting in an effective trace "duration" of 1ns.

The UI has a limitation where timestamps start to lose precision after ~100 days, which causes issues when the trace is so short as the start and end times cannot be resolved, which breaks timeline rendering.

We are actually currently working on a solution to this problem, but it's complex and might take some time. As a temporary fix, you could try restarting your machine you recorded the traces on and re-recording them.

tkuowork commented 1 year ago

@stevegolton thanks! the ~185 day timestamp makes sense because thats how long my machine has been up.

I restarted my machine and now i do see a diamond!

image

here is the heap-trace for that view zipped:
thread-heapprofd-trace.zip

however, I am curious as to why I only see one diamond. I would have expected to see multiple diamonds since I am using continuous dumps.

here is the source code for the heap trace above and where I see one diamond:

#include <iostream>
#include <chrono>
#include <thread>

using namespace std;

class MyClass1 {
public:
    MyClass1() {
    }

    void allocate() {
        cout << "Allocating data in MyClass1" << endl;
        data = new char[8096];
    }

private:
    char* data;
};

int main() {
    MyClass1* obj1 = new MyClass1();

    thread* t1 = new thread([&obj1](){
        for (int i = 0; i < 10; i++) {
            this_thread::sleep_for(chrono::seconds(1));
            obj1->allocate();
        }
    });

    t1->join();

    return 0;
}

here are the commands I ran:

// start perfetto
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ out/linux/traced
[154.346]          service.cc:240 Started traced, listening on /tmp/perfetto-producer /tmp/perfetto-consumer

//  start heap trace with continuous dumps
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ PERFETTO_SYMBOLIZER_MODE=index tools/heap_profile -n test -c 5 --print-config  | out/linux_clang_release/perfetto   -c - --txt   -o ~/heapprofd-trace
[609.717]    perfetto_cmd.cc:1026 Connected to the Perfetto traced service, starting tracing

// build code
tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ g++ -g nonthreadLeaky.cpp -o test

// run code
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ LD_PRELOAD=out/linux_clang_release/libheapprofd_glibc_preload.so ../test/test

when I run the same code OUTSIDE a pthread or cpp::thread I get this really nice graph with a timeline, which is what i am hoping to see when my application is threaded (or multi-threaded)

image

I am able to see memory being allocated every second just like what the code is doing !

here is the corresponding trace zipped : heapprofd-trace.zip

here is the source code

#include <iostream>
#include <chrono>
#include <thread>

using namespace std;

class MyClass1 {
public:
    MyClass1() {
    }

    void allocate() {
        cout << "Allocating data in MyClass1" << endl;
        data = new char[8096];
    }

private:
    char* data;
};

int main() {
    MyClass1* obj1 = new MyClass1();

    for (int i = 0; i < 10; i++) {
        this_thread::sleep_for(chrono::seconds(1));
        obj1->allocate();
    }

    return 0;
}

I am using the same heap_profile configuration in both cases

//  start heap trace with continuous dumps
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ PERFETTO_SYMBOLIZER_MODE=index tools/heap_profile -n test -c 5 --print-config  | out/linux_clang_release/perfetto   -c - --txt   -o ~/heapprofd-trace
[609.717]    perfetto_cmd.cc:1026 Connected to the Perfetto traced service, starting tracing

Any reason why continuous dumps (timeline of diamonds) don't show up for a multi-thread application?

primiano commented 1 year ago

Hi, I cannot see a good reason on why this should work only on the main thread.

One thing I suspect is that there might be some compiler optimization going on. Compilers are quite smart at completely removing code that they can prove is unused.

@tkuowork can I ask you to:

1) After you do data = new char[8096]; just do a write(42, data, 8096). The syscall will fail but the compiler has no way to tell what the syscall will do, so should reduce the chance of optimizing away the allocation.

2) Attach logs you get from perfetto. If you build with PERFETTO_BUILDFLAG(PERFETTO_FORCE_DLOG_ON) you should see more logging coming through. That might help.

tkuowork commented 1 year ago

sure!

im still getting the same result after following those steps.

image

trace: heapprofd-trace.zip

i've built all the targets:

tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ tools/setup_all_configs.py

Building linux_gcc_debug_arm64[is_clang=false,is_debug=true,target_cpu="arm64",target_os="linux"]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_debug_arm64 '--args=is_clang=false is_debug=true target_cpu="arm64" target_os="linux"' --check
Done. Made 1193 targets from 198 files in 137ms

Building linux_gcc_release_arm64[is_clang=false,is_debug=false,target_cpu="arm64",target_os="linux"]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_release_arm64 '--args=is_clang=false is_debug=false target_cpu="arm64" target_os="linux"' --check
Done. Made 1189 targets from 198 files in 128ms

Building linux_gcc_debug     [is_clang=false,is_debug=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_debug '--args=is_clang=false is_debug=true' --check
Done. Made 931 targets from 198 files in 112ms

Building linux_gcc_release   [is_clang=false,is_debug=false]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_gcc_release '--args=is_clang=false is_debug=false' --check
Done. Made 930 targets from 198 files in 121ms

Building linux_clang_debug   [is_clang=true,is_debug=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_clang_debug '--args=is_clang=true is_debug=true' --check
Done. Made 978 targets from 201 files in 123ms

Building linux_clang_release [is_clang=true,is_debug=false]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_clang_release '--args=is_clang=true is_debug=false' --check
Done. Made 977 targets from 201 files in 125ms

Building linux_asan          [is_clang=true,is_debug=false,is_asan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_asan '--args=is_clang=true is_debug=false is_asan=true' --check
Done. Made 977 targets from 201 files in 114ms

Building linux_lsan          [is_clang=true,is_debug=false,is_lsan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_lsan '--args=is_clang=true is_debug=false is_lsan=true' --check
Done. Made 977 targets from 201 files in 111ms

Building linux_msan          [is_clang=true,is_debug=false,is_msan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_msan '--args=is_clang=true is_debug=false is_msan=true' --check
Done. Made 977 targets from 201 files in 118ms

Building linux_tsan          [is_clang=true,is_debug=false,is_tsan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_tsan '--args=is_clang=true is_debug=false is_tsan=true' --check
Done. Made 977 targets from 201 files in 118ms

Building linux_ubsan         [is_clang=true,is_debug=false,is_ubsan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_ubsan '--args=is_clang=true is_debug=false is_ubsan=true' --check
Done. Made 977 targets from 201 files in 112ms

Building linux_fuzzer        [is_clang=true,is_debug=false,is_fuzzer=true,is_asan=true]
/home/tkuo/Documents/perfetto/tools/gn gen /home/tkuo/Documents/perfetto/out/linux_fuzzer '--args=is_clang=true is_debug=false is_fuzzer=true is_asan=true' --check
Done. Made 980 targets from 202 files in 113ms

then I built the debug version so i have additional loging

tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ tools/ninja -C out/linux_clang_debug
ninja: Entering directory `out/linux_clang_debug'
[2822/2822] stamp obj/default.stamp

I build the new thread version with the write and also with optimization off, with debug symbols

tkuo@tkuo-HP-Z840-Workstation:~/Documents/test$ g++ -Og threadLeaky.cpp -o test
threadLeaky.cpp: In member function ‘void MyClass1::allocate()’:
threadLeaky.cpp:16:14: warning: ignoring return value of ‘ssize_t write(int, const void*, size_t)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   16 |         write(42, data, 8096);
      |         ~~~~~^~~~~~~~~~~~~~~~

src:

#include <iostream>
#include <chrono>
#include <thread>
#include <unistd.h>

using namespace std;

class MyClass1 {
public:
    MyClass1() {
    }

    void allocate() {
        cout << "Allocating data in MyClass1" << endl;
        data = new char[8096];
        write(42, data, 8096);
    }

private:
    char* data;
};

int main() {
    MyClass1* obj1 = new MyClass1();

    thread* t1 = new thread([&obj1](){
        for (int i = 0; i < 10; i++) {
            this_thread::sleep_for(chrono::seconds(1));
            obj1->allocate();
        }
    });

    t1->join();

    return 0;
}

then I ran traced and perfetto

// traced
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ out/linux_clang_debug/traced
[941.544]          service.cc:240 Started traced, listening on /tmp/perfetto-producer /tmp/perfetto-consumer

[944.863] ing_service_impl.cc:457 Consumer 0x559aefd45d90 connected from UID 1000
[944.863] ing_service_impl.cc:557 Enabling tracing for consumer 0x559aefd44670, UUID: c7848737-3411-f1b4-d040-b0462ac483b0
[944.864] ing_service_impl.cc:988 Configured tracing session 1, #sources:2, duration:0 ms, #buffers:1, total buffer size:63488 KB, total sessions:1, uid:1000 session name: ""
[944.864] ng_service_impl.cc:1423 All data sources started
[945.000] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[950.006] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[955.005] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[956.163] ing_service_impl.cc:361 Producer 1 connected, uid=1000
[956.163] ng_service_impl.cc:2510 Producer 1 registered data source "android.heapprofd"
[956.163] ng_service_impl.cc:2745 Setting up data source android.heapprofd with target buffer 1
[956.163] ng_service_impl.cc:2771 Invalid configured SMB sizes: shm_size 0 page_size 0. Falling back to shm_size 262144 page_size 4096.
[956.163] ng_service_impl.cc:2779 Creating SMB of 256 KB for producer "android.heapprofd"
[960.010] ng_service_impl.cc:2428 Draining into file, written: 180 KB, stop: 0
[965.015] ng_service_impl.cc:2428 Draining into file, written: 331 KB, stop: 0
[970.008] ng_service_impl.cc:2428 Draining into file, written: 76 KB, stop: 0
[975.005] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[980.006] ng_service_impl.cc:2428 Draining into file, written: 1 KB, stop: 0
[981.099] ng_service_impl.cc:2428 Draining into file, written: 4 KB, stop: 0
[981.118] ucer_ipc_service.cc:194 Client 1 disconnected
[981.118] ing_service_impl.cc:423 Producer 1 disconnected
[981.118] ng_service_impl.cc:2595 Producer 1 unregistered data source "android.heapprofd"
[981.118] ing_service_impl.cc:474 Consumer 0x559aefd44670 disconnected
[981.118] ng_service_impl.cc:2434 Freeing buffers for session 1
[981.118] ng_service_impl.cc:2466 Tracing session 1 ended, total sessions:0
// perfetto
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ tools/heap_profile -n test -c 5 --print-config | out/linux_clang_debug/perfetto -c - --txt -o ~/heapprofd-trace
[944.858]     perfetto_cmd.cc:594 Parsing TraceConfig, 494 bytes
[944.863]    perfetto_cmd.cc:1026 Connected to the Perfetto traced service, starting tracing
// test app
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ LD_PRELOAD=out/linux_clang_debug/libheapprofd_glibc_preload.so ../test/test
[956.163] approfd_producer.cc:245 Connected to the service, mode [child].
[956.164] approfd_producer.cc:367 Setting up datasource: non-statsd initiator.
[956.164] approfd_producer.cc:451 Set up data source.
[956.164] approfd_producer.cc:515 Starting data source 1
[956.164] approfd_producer.cc:558 Started DataSource
[956.164] approfd_producer.cc:851 84899: Received FDs.
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
Allocating data in MyClass1
[966.163] ctory_standalone.cc:161 Child disconnected.
tkuo@tkuo-HP-Z840-Workstation:~/Documents/perfetto$ [966.165] pprofd_producer.cc:1192 84899 disconnected from heapprofd (ds shutting down: 0).
[966.263]        unwinding.cc:329 Unexpected data.
[981.098] approfd_producer.cc:575 Stopping data source 1
[981.098] approfd_producer.cc:341 Shutting down child heapprofd (status 0).

Do you have any other suggestions of things I should test? It does seem somewhat specific to using threads

ddiproietto commented 1 year ago

Do you have any other suggestions of things I should test? It does seem somewhat specific to using threads

I managed to reproduce this. I think this is a bug in our glibc malloc wrapper, proposed fix here: https://r.android.com/2519482

Thanks!

tkuowork commented 1 year ago

@ddiproietto thank you! hoping to see this land soon.

i think i also hit the deadlock case

#0  0x00007f462503a4d6 in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f462503e959 in pthread_getattr_np () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#2  0x00007f462a54aa6b in perfetto::profiling::GetThreadStackRange () at ../../src/profiling/memory/client.cc:86
#3  0x00007f462a54b61c in perfetto::profiling::Client::GetStackEnd (this=this@entry=0x7f45d8020188, stackptr=stackptr@entry=0x7f458e7f3010 "\260\060\177\216E\177") at ../../src/profiling/memory/client.cc:286
#4  0x00007f462a54b86e in perfetto::profiling::Client::RecordMalloc (this=0x7f45d8020188, heap_id=1, sample_size=4096, alloc_size=152, alloc_address=139937752257680) at ../../src/profiling/memory/client.cc:372
#5  0x00007f462a5b06a0 in AHeapProfile_reportAllocation (heap_id=1, id=<optimized out>, size=152) at ../../src/profiling/memory/client_api.cc:422
#6  0x00007f462a543db4 in perfetto::profiling::wrap_calloc (heap_id=1, fn=<optimized out>, nmemb=152, size=<optimized out>) at ../../src/profiling/memory/wrap_allocators.cc:47
--Type <RET> for more, q to quit, c to continue without paging--
#7  0x00007f462503b313 in ?? () from /lib64/libc.so.6
#8  0x00007f462503b5f8 in pthread_attr_setaffinity_np () from /lib64/libc.so.6
#9  0x00007f462503e8be in pthread_getattr_np () from /lib64/libc.so.6
#10 0x00007f462a54aa6b in perfetto::profiling::GetThreadStackRange () at ../../src/profiling/memory/client.cc:86
#11 0x00007f462a54b61c in perfetto::profiling::Client::GetStackEnd (this=this@entry=0x7f45d8020188, stackptr=stackptr@entry=0x7f458e7f3460 "") at ../../src/profiling/memory/client.cc:286
--Type <RET> for more, q to quit, c to continue without paging--
#12 0x00007f462a54b86e in perfetto::profiling::Client::RecordMalloc (this=0x7f45d8020188, heap_id=1, sample_size=4096, alloc_size=34, alloc_address=139937752155888) at ../../src/profiling/memory/client.cc:372
#13 0x00007f462a5b06a0 in AHeapProfile_reportAllocation (heap_id=1, id=<optimized out>, size=34) at ../../src/profiling/memory/client_api.cc:422
tkuowork commented 1 year ago

@ddiproietto i cherry-picked your change and am not longer getting a deadlock. thanks!! excited for this change to get in

tkuowork commented 1 year ago

@ddiproietto i also tested with the multi-threaded test app and am seeing the multiple diamonds now too!

image

out of curiosity, there any plans for the UI to be able to show memory allocated per thread. i.e currently it shows all memory for a single process, but i assume there is enough meta data to know that there were two threads. there could be drop down menu for each thread

tkuowork commented 1 year ago

closing since its on mainline. thanks for the fix!