llvm-symbolizer uses significantly more memory with zlib compressed DWARF

sfc-gh-sgiesecke commented 1 year ago

I built a large binary (which among others links against several LLVM libraries, I randomly picked one LLVM symbol for the test) with and without -Wl,--compress-debug-sections=zlib (using LLD from llvm-16.0.0):

$ /usr/bin/time llvm-symbolizer -e binary.compressed-dwarf 0x$(nm binary.compressed-dwarf | grep _ZN4llvm3orc8JITDylib5clearEv$ | cut -c 1-16)
llvm::orc::JITDylib::clear()
.../llvm/src/lib/ExecutionEngine/Orc/Core.cpp:618:25

8.55user 0.63system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 3940772maxresident)k
0inputs+0outputs (0major+767100minor)pagefaults 0swaps
$ /usr/bin/time llvm-symbolizer -e binary.uncompressed-dwarf 0x$(nm binary.uncompressed-dwarf | grep _ZN4llvm3orc8JITDylib5clearEv$ | cut -c 1-16)
llvm::orc::JITDylib::clear()
.../llvm/src/lib/ExecutionEngine/Orc/Core.cpp:618:25

0.63user 0.22system 0:01.14elapsed 74%CPU (0avgtext+0avgdata 450928maxresident)k
2129840inputs+0outputs (417major+89119minor)pagefaults 0swaps

With --compress-debug-sections, llvm-symbolizer

1. takes significantly longer (0:09.20 vs. 0:01.14 elapsed)
2. uses significantly more memory (3940772maxresident vs. 450928maxresident, roughly 8.7x)

Point 1. might be expected/unavoidable due to the decompression overhead, but 2. seems unexpected/avoidable to this extent.

A similar behaviour exists with llvm-gsymutil --convert, which isn't surprising as this probably originates in the DebugInfoDWARF library used by both.

Just for completeness: llvm-symbolizer shows a number of warnings like

warning: address range table at offset 0x9066e0 has a premature terminator entry at offset 0x906970

in both cases. I guess that's not relevant here though.

llvmbot commented 1 year ago

@llvm/issue-subscribers-debuginfo

llvmbot commented 1 year ago

@llvm/issue-subscribers-tools-llvm-symbolizer

dwblaikie commented 1 year ago

DWARF isn't a format that's trivial/easy to process in a single pass for things like symbolizing. For instance, you need to find the entity that contains the address being symbolized (or multiple addresses, if you're symbolizing a whole stack trace), and then you might need to follow inlined subroutine descriptions to other DIEs, possibly forward or backward in the DWARF. Compressing the whole section doesn't make it easy to visit/decompress small parts of it.

Basically the format isn't especially amenable to anything short of decompressing the whole thing and walking around it.

It's not impossible to do something else, though - you could stream decompress and create a separate data structure as you walk, recording relevant info like the names/DIE offsets of any function (potentially abstract) definition DIEs and then symbolize based on that intermediate data structure - it's a lot of work to implement that, though.
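As a rough illustration of the stream-decompress-and-index idea, here is a minimal Python sketch. It does not parse DWARF: it assumes a made-up text format with one "low_pc high_pc name" record per line (hex addresses), and the function names `iter_decompressed`, `build_index`, and `symbolize` are hypothetical. It only shows the shape of the approach: decompress in bounded pieces, keep a compact address index, and answer lookups from that index instead of from the fully decompressed section.

```python
import bisect
import zlib


def iter_decompressed(compressed, chunk_size=4096, max_out=65536):
    """Yield decompressed pieces without materializing the whole stream."""
    d = zlib.decompressobj()
    for start in range(0, len(compressed), chunk_size):
        buf = compressed[start:start + chunk_size]
        while buf:
            # Ask for at most max_out bytes of output per call.
            out = d.decompress(buf, max_out)
            if out:
                yield out
            buf = d.unconsumed_tail  # input not yet consumed
    tail = d.flush()
    if tail:
        yield tail


def _parse(line):
    # Toy record format (an assumption, not DWARF): "low high name".
    low, high, name = line.decode().split(" ", 2)
    return int(low, 16), int(high, 16), name


def build_index(compressed):
    """Return a sorted list of (low_pc, high_pc, name) built while streaming."""
    index = []
    pending = b""
    for piece in iter_decompressed(compressed):
        pending += piece
        # Keep the last (possibly incomplete) line for the next piece.
        *lines, pending = pending.split(b"\n")
        index.extend(_parse(line) for line in lines if line)
    if pending:  # final record without a trailing newline
        index.append(_parse(pending))
    index.sort()
    return index


def symbolize(index, addr):
    """Return the name whose [low_pc, high_pc) range contains addr, or None."""
    i = bisect.bisect_right(index, (addr, float("inf"))) - 1
    if i >= 0:
        low, high, name = index[i]
        if low <= addr < high:
            return name
    return None
```

With this structure, peak memory is governed by the size of the index plus one decompressed piece, not by the size of the decompressed input. Real DWARF is much harder, since DIEs can reference each other forward and backward, which is the part that makes this a lot of work.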

Does gnu binutils addr2line (or any other common symbolizer) do significantly better here?

sfc-gh-sgiesecke commented 1 year ago

> It's not impossible to do something else, though - you could stream decompress and create a separate data structure as you walk, recording relevant info like the names/DIE offsets of any function (potentially abstract) definition DIEs and then symbolize based on that intermediate data structure - it's a lot of work to implement that, though.

I see, thanks for the explanation! I wonder if using split-dwarf would be a better alternative then. It requires more changes in tooling than just enabling compression though.

> Does gnu binutils addr2line (or any other common symbolizer) do significantly better here?

Depends on how you picture that ;) The memory usage is indeed somewhat lower for the compressed one (2594572 addr2line vs. 3940772 llvm-symbolizer), but OTOH for the uncompressed one it's not much lower than for the compressed one. (Actually, we're more involved with llvm-gsymutil --convert though, I just provided the numbers for llvm-symbolizer since that's probably more widely used.)

I don't have a very recent binutils addr2line version at hand, unfortunately. If it makes sense, I can try to build one and see where we're at with that.

$ scl enable devtoolset-10 -- addr2line -v
GNU addr2line version 2.35-5.el7.4
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
$ /usr/bin/time scl enable devtoolset-10 -- addr2line -e binary.compressed-dwarf 0x$(nm binary.compressed-dwarf | grep _ZN4llvm3orc8JITDylib5clearEv$ | cut -c 1-16)
.../llvm/src/lib/ExecutionEngine/Orc/Core.cpp:618
7.03user 1.91system 0:08.97elapsed 99%CPU (0avgtext+0avgdata 2594572maxresident)k
9272inputs+8outputs (17major+882795minor)pagefaults 0swaps
$ /usr/bin/time scl enable devtoolset-10 -- addr2line -e binary.uncompressed-dwarf 0x$(nm binary.uncompressed-dwarf | grep _ZN4llvm3orc8JITDylib5clearEv$ | cut -c 1-16)
.../llvm/src/lib/ExecutionEngine/Orc/Core.cpp:618
0.16user 2.04system 0:02.34elapsed 94%CPU (0avgtext+0avgdata 2572364maxresident)k
2192104inputs+8outputs (0major+667961minor)pagefaults 0swaps

dwblaikie commented 1 year ago

Split DWARF is sort of orthogonal to compression - you can use both/either/neither. A compressed dwp is probably no better than compressed non-split linked debug info - in either case you're basically decompressing the whole thing and keeping that in memory.

Split DWARF with .dwo files, not a dwp, combined with compression could have lower memory usage (since we'd only need to decompress a select few .dwo files). But then you've got a lot more duplication/more total file usage across all the dwo files.
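To make the memory argument concrete, here is a small Python sketch of per-unit lazy decompression. The `LazyUnits` class and the in-memory dict standing in for separately compressed .dwo files are assumptions made for illustration; this is not how llvm-symbolizer or DebugInfoDWARF is implemented.

```python
import zlib


class LazyUnits:
    """Decompress each unit only on first access and remember the result."""

    def __init__(self, compressed_units):
        # Maps a unit name to its zlib-compressed bytes (a stand-in for
        # individually compressed .dwo files on disk).
        self._compressed = compressed_units
        self._cache = {}

    def get(self, name):
        # Only the units actually requested are ever decompressed.
        if name not in self._cache:
            self._cache[name] = zlib.decompress(self._compressed[name])
        return self._cache[name]

    def resident_bytes(self):
        """Total size of the decompressed data currently held in memory."""
        return sum(len(data) for data in self._cache.values())
```

If a lookup touches only a handful of units, the decompressed footprint scales with those units, whereas a single compressed section (or a compressed dwp) has to be decompressed as a whole.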

llvm / llvm-project

llvm-symbolizer uses significantly more memory with zlib compressed DWARF #63290