koute / bytehound

A memory profiler for Linux.
Other
4.51k stars 193 forks source link

unchanging dat file when applying to official ClickHouse binaries #83

Open hdhoang opened 2 years ago

hdhoang commented 2 years ago

(apologies for sparse details, I'll copy them more as i can formulate them concretely)

we're trying out bytehound to analyze a memleak. However, the dat file always come out at 190MB, and have no update after CH finished starting up. Furthermore, the file has 0s runtime, 0B allocation.

We run on debian11, but CH builds are mostly self-contained. There's a few mode of starting up CH, you can try a one-shot command this way:

  1. obtain a CH as clickhouse-common-static amd64.tgz <= 22.3 from https://packages.clickhouse.com/tgz/stable/ (approx 230MB)
curl -LO https://packages.clickhouse.com/tgz/stable/clickhouse-common-static-22.3.9.19-amd64.tgz
tar xf clickhouse-common-static-22.3.9.19-amd64.tgz
./clickhouse-common-static-22.3.9.19/usr/bin/clickhouse local -q 'select version()'
22.3.9.19

Later CH versions ignore LD_PRELOAD btw, so we'll have to use 22.3 here.

  1. run a 9-second script under bytehound (0.9.0), observe that the file is not updated after first Flushing
env MEMORY_PROFILER_LOG=debug LD_PRELOAD=$HOME/libbytehound.so ./clickhouse-common-static-22.3.9.19/usr/bin/clickhouse local -q 'select toUnixTimestamp(now()); select sleep(3);select sleep(3);select sleep(3); select toUnixTimestamp(now());'
bytehound: 04b2 04b2 INF Version: 0.9.0

debug log:

```text bytehound: 04b2 04b2 INF Version: 0.9.0 bytehound: 04b2 04b2 INF Options: bytehound: 04b2 04b2 INF MEMORY_PROFILER_BASE_SERVER_PORT = 8100 bytehound: 04b2 04b2 INF MEMORY_PROFILER_CHOWN_OUTPUT_TO = None bytehound: 04b2 04b2 INF MEMORY_PROFILER_DISABLE_BY_DEFAULT = false bytehound: 04b2 04b2 INF MEMORY_PROFILER_ENABLE_BROADCAST = false bytehound: 04b2 04b2 INF MEMORY_PROFILER_ENABLE_SERVER = false bytehound: 04b2 04b2 INF MEMORY_PROFILER_GRAB_BACKTRACES_ON_FREE = true bytehound: 04b2 04b2 INF MEMORY_PROFILER_INCLUDE_FILE = None bytehound: 04b2 04b2 INF MEMORY_PROFILER_OUTPUT = memory-profiling_%e_%t_%p.dat bytehound: 04b2 04b2 INF MEMORY_PROFILER_REGISTER_SIGUSR1 = true bytehound: 04b2 04b2 INF MEMORY_PROFILER_REGISTER_SIGUSR2 = true bytehound: 04b2 04b2 INF MEMORY_PROFILER_USE_PERF_EVENT_OPEN = true bytehound: 04b2 04b2 INF MEMORY_PROFILER_USE_SHADOW_STACK = true bytehound: 04b2 04b2 INF MEMORY_PROFILER_WRITE_BINARIES_TO_OUTPUT = true bytehound: 04b2 04b2 INF MEMORY_PROFILER_ZERO_MEMORY = false bytehound: 04b2 04b2 INF MEMORY_PROFILER_GATHER_MMAP_CALLS = false bytehound: 04b2 04b2 INF MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_1 = 16384 bytehound: 04b2 04b2 INF MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_2 = 327680 bytehound: 04b2 04b2 INF MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS = false bytehound: 04b2 04b2 INF MEMORY_PROFILER_TEMPORARY_ALLOCATION_LIFETIME_THRESHOLD = 10000 bytehound: 04b2 04b2 INF MEMORY_PROFILER_TEMPORARY_ALLOCATION_PENDING_THRESHOLD = None bytehound: 04b2 04b2 INF Tracing will be toggled ON (for the first time) bytehound: 04b2 04b2 INF Found 'aligned_alloc' at: 0x0000000019C6F3E0 bytehound: 04b2 04b2 INF Found 'malloc_stats_print' at: 0x0000000019C81CE0 bytehound: 04b2 04b2 INF Found 'malloc_usable_size' at: 0x0000000019C81D80 bytehound: 04b2 04b2 INF Found 'memalign' at: 0x0000000019C72C20 bytehound: 04b2 04b2 INF Found 'dallocx' at: 0x0000000019C7E6C0 bytehound: 04b2 04b2 INF Found 'posix_memalign' at: 0x0000000019C6E5A0 bytehound: 04b2 04b2 INF Found 'free' at: 0x0000000019C72B60 bytehound: 04b2 04b2 INF Found 'mallctlnametomib' at: 0x0000000019C81AA0 bytehound: 04b2 04b2 INF Found 'malloc' at: 0x0000000019C6E4C0 bytehound: 04b2 04b2 INF Found 'nallocx' at: 0x0000000019C81720 bytehound: 04b2 04b2 INF Found 'sdallocx' at: 0x0000000019C815E0 bytehound: 04b2 04b2 INF Found 'mallctl' at: 0x0000000019C81960 bytehound: 04b2 04b2 INF Found 'mallctlbymib' at: 0x0000000019C81B60 bytehound: 04b2 04b2 INF Found 'mallocx' at: 0x0000000019C76220 bytehound: 04b2 04b2 INF Found 'sallocx' at: 0x0000000019C7E260 bytehound: 04b2 04b2 INF Found 'realloc' at: 0x0000000019C74620 bytehound: 04b2 04b2 INF Found 'valloc' at: 0x0000000019C73940 bytehound: 04b2 04b2 INF Found 'calloc' at: 0x0000000019C70180 bytehound: 04b2 04b2 INF Found 'rallocx' at: 0x0000000019C78B40 bytehound: 04b2 04b2 INF Found 'xallocx' at: 0x0000000019C7C340 bytehound: 04b2 04b2 INF Attaching prefixed jemalloc hooks... bytehound: 04b2 04b2 INF Symbol not found: "_rjem_malloc" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_mallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_calloc" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_sdallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_realloc" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_rallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_nallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_xallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_malloc_usable_size" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_mallctl" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_posix_memalign" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_aligned_alloc" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_free" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_sallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_dallocx" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_mallctlnametomib" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_mallctlbymib" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_malloc_stats_print" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_memalign" bytehound: 04b2 04b2 INF Symbol not found: "_rjem_valloc" bytehound: 04b2 04b2 INF Stage 1 initialization finished bytehound: 04b2 04b2 INF Initializing stage 2... bytehound: 04b2 04b2 INF Setting atexit hook... bytehound: 04b2 04b2 INF Registering SIGUSR1 handler... bytehound: 04b2 04b2 INF Registering SIGUSR2 handler... bytehound: 04b2 04b2 INF Stage 2 initialization finished bytehound: 04b2 04b2 DBG Opening perf events; pid=0, cpu=-1, frequency=0, stack_size=0, reg_mask=0x0000000000000000, event_source=SwDummy, inherit=false, start_disabled=false... bytehound: 04b2 04b2 DBG Maximum sample rate: 100000 bytehound: 04b2 04b2 DBG Allocating 16 + 1 pages for the ring buffer for PID 0 on CPU -1 bytehound: 04b2 04b2 DBG Perf events open with fd=5 bytehound: 04b2 04b2 INF Spawning event processing thread... bytehound: 04b2 04b2 INF Tracing was enabled bytehound: 04b2 04b3 INF Starting event thread... bytehound: 04b2 04b3 INF Data ID: 3ce8be8456ba326f23c1fa46e8d647ae bytehound: 04b2 04b3 INF File 'memory-profiling_clickhouse_1660033877_1202.dat' opened for writing bytehound: 04b2 04b3 INF Writing initial header... bytehound: 04b2 04b3 INF Writing wall clock... bytehound: 04b2 04b3 INF Writing uptime... bytehound: 04b2 04b3 INF Writing environ... bytehound: 04b2 04b3 INF Writing maps... bytehound: 04b2 04b3 INF Writing binaries... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/ld-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/librt-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libdl-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libm-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/home/osboxes/tgz/clickhouse-common-static-22.3.9.19/usr/bin/clickhouse'... 1660033877 bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libgcc_s.so.1'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libc-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libpthread-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/usr/lib/x86_64-linux-gnu/libresolv-2.31.so'... bytehound: 04b2 04b3 DBG Writing '/home/osboxes/libbytehound.so'... bytehound: 04b2 04b3 INF Flushing... ```

final sql & bytehound output:

bytehound: 04b2 04b3 INF Flushing...
0
0
0
1660033886
bytehound: 04b2 04b2 INF Exit hook called
bytehound: 04b2 04b3 INF Flushing 0 bucket(s) on exit
bytehound: 04b2 04b3 INF Flushing 0 bucket(s) on exit
bytehound: 04b2 04b3 INF Event thread finished
bytehound: 04b2 04b3 DBG Disabling thread 04b6...
bytehound: 04b2 04b3 DBG Disabling thread 04b7...
bytehound: 04b2 04b3 DBG Disabling thread 04b2...
bytehound: 04b2 04b3 DBG Disabling thread 04b4...
bytehound: 04b2 04b3 INF Tracing was disabled
bytehound: 04b2 04b3 DBG Thread dropped: 04B3
bytehound: 04b2 04b2 INF Exit hook finished
bytehound: 04b2 04b6 DBG Thread dropped: 04B6
bytehound: 04b2 04b4 DBG Thread dropped: 04B4
bytehound: 04b2 04b7 DBG Thread dropped: 04B7

dat file is not updated beyond 2 seconds

stat -c '%Y %s' memory-profiling_clickhouse_1660033877_1202.dat
1660033879 198359119

and it contains no info image

we would love to use bytehound's USR1 signaling to catch the relevant, long-term memleak. do you have any suggestion to gather more info?

we also tried LD_PRELOAD both bytehound.so & system libjemalloc.so.2 at the same time, but it segfaults/aborts (details TBD). further more, in interactive mode (eg clickhouse local without -q), it aborts right away too.

thanks for the tooling!

koute commented 2 years ago

Looks like they're using their own malloc implementation:

$ readelf -aW clickhouse | rg "Symbol table|\b(malloc|realloc|free)\b"
Symbol table '.dynsym' contains 487796 entries:
173447: 0000000019a72b60   166 FUNC    GLOBAL DEFAULT   15 free
235465: 0000000019a6e4c0   194 FUNC    GLOBAL DEFAULT   15 malloc
434378: 0000000019a74620  7160 FUNC    GLOBAL DEFAULT   15 realloc

So it's expected that it doesn't work. This kind of binaries are currently only supported for Rust programs using jemalloc.

If you'd recompile CH and disable this then profiling should work.

hdhoang commented 2 years ago

much thanks for the prompt response!

I will try rebuilding a CH with linkable malloc. so far their -D ENABLE_TCMALLOC=0 advice gives thread '<unnamed>' panicked at 'not yet implemented: _rjem_sallocx', preload/src/api.rs:743:5 under bytehound.

we'll try their guide next, and also try rebuilding without both TCMALLOC & their jemalloc

koute commented 2 years ago

I will try rebuilding a CH with linkable malloc. so far their -D ENABLE_TCMALLOC=0 advice gives thread '<unnamed>' panicked at 'not yet implemented: _rjem_sallocx', preload/src/api.rs:743:5 under bytehound.

Does this build use jemalloc? Because it looks like it might. It looks like it's calling sallocx, but I haven't implemented that one in Bytehound since Rust's jemallocator crate doesn't use it. It seems that you'll have to recompile CH so that it uses the system allocator instead. (Or the sallocx could be implemented inside of Bytehound, but that's more effort.)

hdhoang commented 2 years ago

i guess so. here's a build with ENABLE_TCMALLOC=0 ENABLE_JEMALLOC=0, it's a new error

bytehound: c7986 c7987 INF Writing initial header...
bytehound: c7986 c7987 INF Writing wall clock...
bytehound: c7986 c7987 INF Writing uptime...
bytehound: c7986 c7986 WRN Mapping 0x0000000000200000-0x0000000000201000 from '/root/ch-build/clickhouse' doesn't match any PT_LOAD entry
bytehound: c7986 c7986 WRN Duplicate PT_LOAD matches for a single memory region: Region { start: 406020096, end: 407764992, is_read: true, is_write: true, is_executable: false, is_shared: false, file_offset: 403910656, major: 9, minor: 1, inode: 167515591, name: "/root/ch-build/clickhouse" }
bytehound: c7986 c7987 INF Writing environ...
bytehound: c7986 c7986 WRN   Match #0: LoadHeader { address: 405954816, file_offset: 403849472, file_size: 61632, memory_size: 61680, alignment: 4096, is_readable: true, is_writable: true, is_executable: false } => (AddressMapping { declared_address: 406016000, actual_address: 406020096, file_offset: 403910656, size: 1744896 }, LoadHeader { address: 405954816, file_offset: 403849472, file_size: 61632, memory_size: 61680, alignment: 4096, is_readable: true, is_writable: true, is_executable: false })
bytehound: c7986 c7986 WRN   Match #1: LoadHeader { address: 406020608, file_offset: 403911168, file_size: 1743304, memory_size: 3882184, alignment: 4096, is_readable: true, is_writable: true, is_executable: false } => AddressMapping { declared_address: 406020096, actual_address: 406020096, file_offset: 403910656, size: 1744896 }
bytehound: c7986 c7987 INF Writing maps...
bytehound: c7986 c7987 INF Writing binaries...
bytehound: c7986 c7987 DBG Writing '/usr/lib/x86_64-linux-gnu/libresolv-2.31.so'...
bytehound: c7986 c7987 DBG Writing '/usr/lib/x86_64-linux-gnu/libc-2.31.so'...
fish: “env LD_PRELOAD=libbytehound.so…” terminated by signal SIGSEGV (Address boundary error)

here's how we build it in CH's builder image (i'll link their page here later):

  1. clone CH repo at tag v22.3.10.22-lts, including correct submodules
  2. install ccache to save our time
  3. use specific builder image with clang-13
cd docker/packager/
# change defines here
export CMAKE_FLAGS="-DENABLE_JEMALLOC=0 -DENABLE_TCMALLOC=0"
#
./packager --cache ccache --output-dir /root/ch-build --package-type binary --compiler clang-13 --docker-image-version 39450-amd64

output dir will contain the big clickhouse executable. we forgot to use --build-type debug for this one though.

koute commented 2 years ago

Okay, that's interesting. It is crashing, which definitely shouldn't happen.

This line might be related:

bytehound: c7986 c7986 WRN Mapping 0x0000000000200000-0x0000000000201000 from '/root/ch-build/clickhouse' doesn't match any PT_LOAD entry
bytehound: c7986 c7986 WRN Duplicate PT_LOAD matches for a single memory region: Region { start: 406020096, end: 407764992, is_read: true, is_write: true, is_executable: false, is_shared: false, file_offset: 403910656, major: 9, minor: 1, inode: 167515591, name: "/root/ch-build/clickhouse" }

It might be because it's all done under docker (I do somewhat support running under docker, but probably not every corner case), or it's because CH might have used a weird linker and/or linker args when linking itself.

You could maybe try to run it with MEMORY_PROFILER_USE_SHADOW_STACK=0 and then once it crashes load the coredump with gdb and see where exactly it SIGSEGVs.

hdhoang commented 2 years ago

under docker

we only build the binary in docker (base is ubuntu 20.04), to mimick CH's packaging method. we run the resulting file on-host (debian11). i too like to avoid any complication from docker infrastructure, if possible.

here i have a backtrace (under gdb shows the same, shadow-stack=on/off is the same). it's something to do with avx2 memset when defining(?) error codes. http://ix.io/47gW

it doesn't look bytehound-related somehow.

under gdb

I tried recording with rr too. under it, mmap perf_event_open fails, so i don't think you're interested in an exported record.

sallocx could be implemented inside of Bytehound

would this be similar in shape to xallocx (checking input, then call real jem_sallocx)? how can we attempt to add it?

thanks in advance!

koute commented 2 years ago

would this be similar in shape to xallocx (checking input, then call real jem_sallocx)? how can we attempt to add it?

Yes, call the real function and fixup the size (the allocations are bigger by default since they also contain a tag which Bytehound uses to track the allocations, but we can't let the original app know about this extra memory as it then might try to overwrite it)

here i have a backtrace (under gdb shows the same, shadow-stack=on/off is the same). it's something to do with avx2 memset when defining(?) error codes. http://ix.io/47gW

Assuming the backtrace is correct, it's possible it's related. If you look at the lower frames you can see that this crash happens before main runs, which might mean that something weird is happening during static initialization.

I tried recording with rr too. under it, mmap perf_event_open fails, so i don't think you're interested in an exported record.

perf_event_open is not necessary for correctness; it's just an optimization.

pamarcos commented 2 years ago

I'm also trying to test bytehound with ClickHouse and found exactly the same issues presented here. @hdhoang did you finally got it working?

hdhoang commented 2 years ago

sorry, we have no further progress. the CH people fixed our issue with jemalloc diff.

I tried adding sallocx in a PR, but don't want to pursue that further until the bytehound refactor happens (there's a mention in another issue).