yanqi27 / core_analyzer

A power tool to debug memory-related issues

Failed to extract heap metadata from gv mp_ #100

Closed pdvian closed 3 months ago

pdvian commented 3 months ago

Describe the bug

Hi Michael,

I am trying to get the heap commands (from core_analyzer) working on a coredump of one of our applications, which is built with tcmalloc, on the ppc64le arch. I tried gdb-12.1 built with tcmalloc (2.6.3) support, but the heap command reports "failed to init heap". The configure command used for the gdb-12.1 build with tcmalloc:

$PWD/../configure --with-python=/usr/bin/python3.6m --with-lzma-prefix=/usr/include CFLAGS='-g -I/usr/include' LDFLAGS='-L/usr/lib64' --with-separate-debug-dir=/usr/lib/debug --prefix=/usr

[root@191037a305b9 /]# gdb /usr/bin/dummy.bin /0730-mycoredump.3801385 
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
Core was generated by `dummy '.
#0  0x00007fff8e97264c in futex_wait_cancelable (private=0, expected=0, futex_word=0x12dae10f8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88  ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
[Current thread is 1 (LWP 3801385)]
(gdb) heap
Failed to extract heap metadata from gv mp_
==================================================================================
== The memory manager is assumed to be glibc 2.28                                ==
== If this is not true, please debug with another machine with matching glibc   ==
==================================================================================
failed to init heap
(gdb) p &mp_
$1 = (struct malloc_par *) 0x7fff8e430168 
(gdb) p mp_
$2 = {trim_threshold = 131072, top_pad = 131072, mmap_threshold = 131072, arena_test = 8, arena_max = 0, n_mmaps = 0, n_mmaps_max = 65536, max_n_mmaps = 0, no_dyn_threshold = 0, mmapped_mem = 0, max_mmapped_mem = 0, sbrk_base = 0x0, 
  tcache_bins = 64, tcache_max_bytes = 1032, tcache_count = 7, tcache_unsorted_limit = 0}
(gdb) ptype mp_
type = struct malloc_par {
    unsigned long trim_threshold;
    size_t top_pad;
    size_t mmap_threshold;
    size_t arena_test;
    size_t arena_max;
    int n_mmaps;
    int n_mmaps_max;
    int max_n_mmaps;
    int no_dyn_threshold;
    size_t mmapped_mem;
    size_t max_mmapped_mem;
    char *sbrk_base;
    size_t tcache_bins;
    size_t tcache_max_bytes;
    size_t tcache_count;
    size_t tcache_unsorted_limit;
}
(gdb) info shared
From                To                  Syms Read   Shared Object Library
0x00007fff8f15a080  0x00007fff8f1a7ec4  Yes (*)     /lib64/libblkid.so.1
0x00007fff8f0e78a0  0x00007fff8f10d8e8  Yes (*)     /lib64/libfuse.so.2
0x00007fff8f0b08e0  0x00007fff8f0b143c  Yes (*)     /lib64/libaio.so.1
0x00007fff8f024420  0x00007fff8f06e584  Yes (*)     /lib64/libleveldb.so.1
0x00007fff8efe24a0  0x00007fff8efe8794  Yes (*)     /lib64/libsnappy.so.1
0x00007fff8ef91f00  0x00007fff8efaf3e0  Yes (*)     /lib64/liblz4.so.1
0x00007fff8ef52240  0x00007fff8ef64df4  Yes (*)     /lib64/libz.so.1
0x00007fff8edb48a0  0x00007fff8edec4c4  Yes         /lib64/libtcmalloc.so.4
0x00007fff8ed63520  0x00007fff8ed73ca4  Yes         /lib64/libresolv.so.2
0x00007fff8ed30ce0  0x00007fff8ed32420  Yes         /lib64/libdl.so.2
0x00007fff8ea2a000  0x00007fff8ec34d54  Yes (*)     /lib64/libcrypto.so.1.1
0x00007fff8e9658e0  0x00007fff8e97d068  Yes         /lib64/glibc-hwcaps/power9/libpthread-2.28.so
0x00007fff8e87bda0  0x00007fff8e9103cc  Yes (*)     /lib64/libudev.so.1
0x00007fff8e8262c0  0x00007fff8e83f348  Yes (*)     /lib64/libibverbs.so.1
0x00007fff8e7e38c0  0x00007fff8e7faa90  Yes (*)     /lib64/librdmacm.so.1
0x00007fff8e646b80  0x00007fff8e76a208  Yes (*)     /lib64/libstdc++.so.6
0x00007fff8e48d740  0x00007fff8e5193d0  Yes         /lib64/glibc-hwcaps/power9/libm-2.28.so
0x00007fff8e442c20  0x00007fff8e450800  Yes (*)     /lib64/libgcc_s.so.1
0x00007fff8e253780  0x00007fff8e3c4adc  Yes         /lib64/glibc-hwcaps/power9/libc-2.28.so
0x00007fff8f211380  0x00007fff8f23c7d8  Yes         /lib64/ld64.so.2
0x00007fff8e201480  0x00007fff8e2075e0  Yes (*)     /lib64/libuuid.so.1
0x00007fff8e16b900  0x00007fff8e1c3474  Yes (*)     /lib64/libmount.so.1
0x00007fff8e0ae380  0x00007fff8e116fb8  Yes (*)     /lib64/libnl-route-3.so.200
0x00007fff8e048020  0x00007fff8e05fe50  Yes (*)     /lib64/libnl-3.so.200
0x00007fff8dfe65c0  0x00007fff8e00f7d0  Yes (*)     /lib64/libselinux.so.1
0x00007fff8dfb1aa0  0x00007fff8dfb6a28  Yes         /lib64/glibc-hwcaps/power9/librt-2.28.so
0x00007fff8df01ec0  0x00007fff8df661e8  Yes (*)     /lib64/libpcre2-8.so.0
0x00007fff8dea14e0  0x00007fff8dea91fc  Yes (*)     /lib64/liblttng-ust-tracepoint.so.0
0x00007fff8de71d40  0x00007fff8de77274  Yes (*)     /lib64/liburcu-bp.so.6
0x00007fff8de41c00  0x00007fff8de48db4  Yes (*)     /lib64/liburcu-cds.so.6
0x00007fff8de11240  0x00007fff8de13984  Yes (*)     /lib64/liburcu-common.so.6
0x00007fff8d5d15c0  0x00007fff8d5db970  Yes (*)     /lib64/libnss_sss.so.2
0x00007fff8d5a2340  0x00007fff8d5ad3c8  Yes         /lib64/libnss_files.so.2
(*): Shared library is missing debugging information.
(gdb)

gdb 12.1 was built with tcmalloc support:

# vim /core_analyzer/build/gdb-12.1/gdb/Makefile.in 
...
        heap.c \
        heap_ptmalloc_common.c \
        heap_ptmalloc_2_27.c \
        heap_ptmalloc_2_31.c \
        heap_ptmalloc_2_35.c \
        heap_tcmalloc.c \
        heap_jemalloc.c \
        heapcmd.c \
        i386-decode.c \
...

But something is not right with my gdb-12.1 + tcmalloc build: the heap command does not work on a coredump of a simple C++ program built with tcmalloc, whereas it works perfectly fine for the same program built without tcmalloc:

[root@9b2d1dfbc611 ~]# gdb ./app.without_tcmalloc.bin /core.app.1144663 
GNU gdb (GDB) 12.1
...
Core was generated by `./app.without_tcmalloc.bin'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000000000400d77 in main () at hesw.cc:40
(gdb) heap
    Tuning params & stats:
        mmap_threshold=131072
        pagesize=4096
        n_mmaps=0
        n_mmaps_max=65536
        total mmap regions created=0
        mmapped_mem=0
        sbrk_base=0x1756000
    Main arena (0x7f30822d5bc0) owns regions:
        [0x1756010 - 0x1dbd000] Total 6MB in-use 141(6MB) free 1(32KB)

    There are 1 arenas Total 6MB
    Total 141 blocks in-use of 6MB
    Total 1 blocks free of 32KB

I tried the --with-separate-debug-dir configure option as suggested in https://github.com/yanqi27/core_analyzer/issues/26#issuecomment-800364061 but no luck so far.

Any suggestions?

Thanks in advance.

Additional context

The glibc and gperftools versions match those of the coredump environment.
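
The `info shared` listing above already hints at why the default glibc heap parser fails: libtcmalloc.so.4 is loaded in the process. As a minimal sketch, the following Python parses pasted `info shared` output and reports which non-glibc allocator libraries appear and which libraries carry the `(*)` "missing debugging information" marker. The function names and the `ALLOCATOR_HINTS` table are assumptions made for illustration; they are not part of core_analyzer or GDB.

```python
import re

# Substrings used to guess the allocator from a shared-object path.
# Hypothetical table for illustration; core_analyzer does not use it.
ALLOCATOR_HINTS = {
    "tcmalloc": "libtcmalloc",
    "jemalloc": "libjemalloc",
}

# One row of `info shared`: From, To, Syms Read, optional "(*)", path.
ROW = re.compile(
    r"^(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(Yes|No)\s*(\(\*\))?\s+(\S+)$"
)


def parse_info_shared(text):
    """Return (path, has_debug_info) for every library row in the listing."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            syms_read = m.group(3) == "Yes"
            starred = m.group(4) is not None  # "(*)" means no debug info
            rows.append((m.group(5), syms_read and not starred))
    return rows


def guess_allocators(text):
    """Return names of non-glibc allocators whose library is in the listing."""
    paths = [path for path, _ in parse_info_shared(text)]
    return sorted(
        name
        for name, hint in ALLOCATOR_HINTS.items()
        if any(hint in path for path in paths)
    )


# Three rows copied from the session above.
SAMPLE = """\
0x00007fff8ef52240  0x00007fff8ef64df4  Yes (*)     /lib64/libz.so.1
0x00007fff8edb48a0  0x00007fff8edec4c4  Yes         /lib64/libtcmalloc.so.4
0x00007fff8e253780  0x00007fff8e3c4adc  Yes         /lib64/glibc-hwcaps/power9/libc-2.28.so
"""
```

Calling `guess_allocators(SAMPLE)` on these rows reports tcmalloc, which suggests that the glibc-based `heap` output is probably not the right view for this core.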

Celthi commented 3 months ago

Could you try using the switch_heap command to switch the heap to tcmalloc?

pdvian commented 3 months ago

Thanks @Celthi. I switched to tcmalloc, but the tc heap manager fails to initialize:

(gdb) switch_heap tc      
switch to heap tc
Failed to lookup gv "kPageShift"
(gdb) heap                
tcmalloc heap was not initialized successfully
[Error] Failed to walk heap
(gdb)

I am looking into it further.
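
The message `Failed to lookup gv "kPageShift"` means the debugger could not resolve a name the tcmalloc parser needs. A quick way to see which names are visible before running `heap` is to ask GDB to evaluate them. The sketch below uses GDB's Python API call `gdb.parse_and_eval`; the helper names `find_missing` and `report` are hypothetical, and the list of names to check is whatever the error messages mention (only `kPageShift` is known from this thread).

```python
try:
    import gdb  # only importable inside a GDB Python session
except ImportError:
    gdb = None


def find_missing(names, resolve):
    """Return the names for which resolve(name) raises or returns None."""
    missing = []
    for name in names:
        try:
            if resolve(name) is None:
                missing.append(name)
        except Exception:
            # gdb.error (e.g. "No symbol ... in current context") lands here.
            missing.append(name)
    return missing


def report(names):
    """Inside GDB: print which expressions the debugger cannot evaluate."""
    if gdb is None:
        raise RuntimeError("run this from GDB, e.g. via the `source` command")
    missing = find_missing(names, gdb.parse_and_eval)
    for name in names:
        status = "MISSING" if name in missing else "ok"
        print(f"{name}: {status}")
    return missing
```

Assuming the file is saved as check_syms.py (a made-up name), it can be loaded in the session with `source check_syms.py` and run with `python report(["kPageShift"])`. A name reported as MISSING points at absent debug information for the tcmalloc library rather than at a bug in the heap walker.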

pdvian commented 3 months ago

I am still having trouble with the heap manager on the ppc64le arch, but I managed to get the heap commands working for a coredump from the x86_64 arch (I tried x86_64 for verification because we were seeing similar problems there). On x86_64, the tcmalloc heap manager was failing to initialize because the gperftools-devel package was missing. The core analyzer on the ppc64le debug environment does have the correct gperftools-devel ppc64le package installed, but it now fails to look up the type TCMalloc_PageMap2<35>::Leaf.

gdb 9.2 - x86_64 

[root@cdd4337225e4 core_analyzer]# gdb /usr/bin/app /core.1709365
GNU gdb (GDB) 9.2
...
Core was generated by `/usr/bin/app'.
Program terminated with signal SIGABRT, Aborted.
#0  raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f5d0b792200 (LWP 6))]
(gdb) heap /tb 5
Top 5 biggest in-use heap memory blocks:
    addr=0x55dafdd5a000  size=33554432 (32MB)
    addr=0x55daffd5a000  size=33554432 (32MB)
    addr=0x55db01d5a000  size=33554432 (32MB)
    addr=0x55db03d74000  size=33554432 (32MB)
    addr=0x55db05f18000  size=33554432 (32MB)
(gdb) ref 0x55daffd5a000
Search for object type associated with 0x55daffd5a000
Address 0x55daffd5a000 belongs to heap block [0x55daffd5a000, 0x55db01d5a000] size=33554432
------------------------- 1 -------------------------
[stack] thread 9 frame 2 rsp+120 @0x7f5cf9a71238: 0x55db00000000
    |--> [heap block] 0x55daffd5a000--0x55db01d5a000 size=33554432

------------------------- 2 -------------------------
[stack] thread 1 frame 0 set @0x7ffe0674ba30: 0x55dafffffffe
    |--> [heap block] 0x55daffd5a000--0x55db01d5a000 size=33554432

------------------------- 3 -------------------------
[stack] thread 1 frame 2 rsp+12064 @0x7ffe0674ea30: 0x55daffffffff
    |--> [heap block] 0x55daffd5a000--0x55db01d5a000 size=33554432

------------------------- 4 -------------------------
[stack] thread 1 frame 2 rsp+14480 @0x7ffe0674f3a0: 0x55daffffffff
    |--> [heap block] 0x55daffd5a000--0x55db01d5a000 size=33554432

------------------------- 5 -------------------------
[stack] thread 1 frame 2 rsp+16256 @0x7ffe0674fa90: 0x55daffffffff
    |--> [heap block] 0x55daffd5a000--0x55db01d5a000 size=33554432

(gdb)

pdvian commented 3 months ago

The problem seems to be with the coredump itself, which was generated with the gcore command on the ppc64le arch; that is why the tcmalloc heap manager failed to initialize. I will re-open this issue if I still face the same problem.
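
The thread does not say what exactly was wrong with the gcore-generated core. One generic sanity check, independent of core_analyzer, is to confirm that every PT_LOAD segment listed in the ELF program headers actually fits inside the file, since a truncated core makes heap metadata unreadable. The sketch below is a minimal version for 64-bit ELF files; the function name is hypothetical, and cores that use the extended program-header count (PN_XNUM) are deliberately not handled.

```python
import struct

PT_LOAD = 1
PN_XNUM = 0xFFFF  # real count lives in a section header; not handled here
EHDR_SIZE = 64    # size of the ELF64 file header
PHDR_SIZE = 56    # size of one ELF64 program header


def truncated_segments(f, file_size):
    """Return (p_offset, p_filesz) for PT_LOAD segments that extend past
    file_size. `f` is a binary file object opened on a 64-bit ELF file."""
    f.seek(0)
    header = f.read(EHDR_SIZE)
    if len(header) < EHDR_SIZE or header[:4] != b"\x7fELF" or header[4] != 2:
        raise ValueError("not a 64-bit ELF file")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    # Fields that follow the 16-byte e_ident array.
    (_type, _machine, _version, _entry, phoff, _shoff, _flags, _ehsize,
     phentsize, phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(
        endian + "HHIQQQIHHHHHH", header, 16)
    if phnum == PN_XNUM:
        raise ValueError("extended program header count is not supported")
    bad = []
    for i in range(phnum):
        f.seek(phoff + i * phentsize)
        raw = f.read(PHDR_SIZE)
        if len(raw) < PHDR_SIZE:
            raise ValueError("program header table is truncated")
        (p_type, _p_flags, p_offset, _vaddr, _paddr,
         p_filesz, _memsz, _align) = struct.unpack(endian + "IIQQQQQQ", raw)
        if p_type == PT_LOAD and p_offset + p_filesz > file_size:
            bad.append((p_offset, p_filesz))
    return bad
```

To use it, open the core with `open(path, "rb")` and pass `os.path.getsize(path)` as `file_size`. An empty result only shows that the segments fit in the file; it does not prove the core is otherwise usable.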