iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

tools/memleak: Fix the data error caused by the same key in map #4970

Closed AshinZ closed 4 months ago

AshinZ commented 5 months ago

The following shows how to reproduce the data error and the result after applying this patch.

File test.cpp

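  #include <cstdlib>   // malloc
  #include <iostream>
  #include <thread>
  #include <unistd.h>  // sleep

  // Intentionally leaks 100000 allocations of 4 bytes each.
  void alloc() {
    for (int i = 0; i < 100000; ++i) {
      int* a = (int*)malloc(4);
    }
  }

  int main() {
    sleep(100);  // wait before the allocations start
    std::thread t1 {&alloc};
    std::thread t2 {&alloc};
    t1.join();
    t2.join();
    sleep(50);   // keep the process alive so the leaks stay outstanding
    return 0;
  }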

Build the test file

  $ g++ -g -o test test.cpp -pthread

Run this with --combined-only:

sudo ./memleak.py -c ./test --combined-only
Executing './test' and tracing the resulting process. Attaching to pid 194273, Ctrl+C to quit.
[23:36:43] Top 10 stacks with outstanding allocations:
        576 bytes in 2 allocations from stack
                __GI__dl_allocate_tls+0x2c [ld-2.28.so]
        799992 bytes in 199998 allocations from stack
                alloc()+0x22 [test]
                void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)())+0x1d [test]
                std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)())+0x20 [test]
                decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x28 [test]
                std::thread::_Invoker<std::tuple<void (*)()> >::operator()()+0x18 [test]
                std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run()+0x1c [test]
                [unknown] [libstdc++.so.6.0.25]
        8392704 bytes in 1 allocations from stack
                pthread_create+0x893 [libpthread-2.28.so]
                std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())+0x19 [libstdc++.so.6.0.25]
                main+0x49 [test]
                __libc_start_main+0xf3 [libc-2.28.so]
                [unknown]
        8392704 bytes in 1 allocations from stack
                pthread_create+0x893 [libpthread-2.28.so]
                std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())+0x19 [libstdc++.so.6.0.25]
                main+0x2e [test]
                __libc_start_main+0xf3 [libc-2.28.so]
                [unknown]
        134217728 bytes in 1 allocations from stack
                new_heap+0xa7 [libc-2.28.so]

Problem: memleak reports 799992 bytes in 199998 allocations from the alloc() stack, but the code performs 200000 allocations of 4 bytes each, so the report should show 800000 bytes in 200000 allocations.

Reason: Tracing shows that malloc() may call mmap() internally, so gen_alloc_enter(struct pt_regs *ctx, size_t size) can run twice in a row on the same thread before either call has returned. In our test process, when a thread first calls int* a = (int*)malloc(4);, gen_alloc_enter() is called and stores the pair <tid, 4> in the BPF_HASH sizes. malloc() then calls mmap(), which calls gen_alloc_enter() again and overwrites <tid, 4> with <tid, MMAP_SIZE>. When mmap() returns, gen_alloc_exit() consumes that entry and deletes the tid key. When malloc() itself returns, gen_alloc_exit() finds no entry for tid and returns early, so the 4-byte allocation is never recorded. This is the source of the data error.

The call chain: malloc() -> gen_alloc_enter() -> mmap() -> gen_alloc_enter() -> mmap_return() -> gen_alloc_exit() -> malloc_return() -> gen_alloc_exit()

Solution: Add a type_index to the map key to distinguish the calling source, so that nested calls on the same thread no longer share a key.

After applying this patch, run memleak.py with --combined-only:

sudo ./memleak.py -c ./test --combined-only
Executing './test' and tracing the resulting process. Attaching to pid 194659, Ctrl+C to quit.
[23:37:16] Top 10 stacks with outstanding allocations:
        576 bytes in 2 allocations from stack
                __GI__dl_allocate_tls+0x2c [ld-2.28.so]
        800000 bytes in 200000 allocations from stack
                alloc()+0x22 [test]
                void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)())+0x1d [test]
                std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)())+0x20 [test]
                decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x28 [test]
                std::thread::_Invoker<std::tuple<void (*)()> >::operator()()+0x18 [test]
                std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run()+0x1c [test]
                [unknown] [libstdc++.so.6.0.25]
        8392704 bytes in 1 allocations from stack
                pthread_create+0x893 [libpthread-2.28.so]
                std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())+0x19 [libstdc++.so.6.0.25]
                main+0x49 [test]
                __libc_start_main+0xf3 [libc-2.28.so]
                [unknown]
        8392704 bytes in 1 allocations from stack
                pthread_create+0x893 [libpthread-2.28.so]
                std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())+0x19 [libstdc++.so.6.0.25]
                main+0x2e [test]
                __libc_start_main+0xf3 [libc-2.28.so]
                [unknown]
        134217728 bytes in 1 allocations from stack
                new_heap+0xa7 [libc-2.28.so]