Open sfc-gh-sgiesecke opened 1 year ago
DWARF isn't a format that's easy to process in a single pass for things like symbolizing. For instance, you need to find the entity that contains the address being symbolized (or multiple addresses, if you're symbolizing a whole stack trace), and then you might need to follow inlined subroutine descriptions to other DIEs, possibly forward or backward in the DWARF. Compressing the whole section doesn't make it easy to visit/decompress small parts of it.
Basically the format isn't especially amenable to anything short of decompressing the whole thing and walking around it.
It's not impossible to do something else, though: you could stream-decompress and build a separate data structure as you walk, recording relevant info such as the names/DIE offsets of any function (potentially abstract) definition DIEs, and then symbolize based on that intermediate data structure. It's a lot of work to implement, though.
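To make the idea concrete, here is a minimal sketch of "stream-decompress, build a small index, symbolize from the index". It does not parse DWARF: the fixed-size record layout (`RECORD`) and the function names are invented for illustration, and a real implementation would have to walk DIEs and resolve abstract origins instead. It only shows that the decompressed bytes can be consumed chunk by chunk while a much smaller address index is kept.

```python
import bisect
import struct
import zlib

# Toy stand-in for "function definition" info: low_pc, high_pc, name.
# This is NOT the DWARF encoding; it is a hypothetical fixed-size record.
RECORD = struct.Struct("<QQ16s")


def stream_records(compressed_chunks):
    """Yield toy records while decompressing one chunk at a time."""
    decomp = zlib.decompressobj()
    pending = b""
    for chunk in compressed_chunks:
        pending += decomp.decompress(chunk)
        usable = len(pending) - len(pending) % RECORD.size
        for off in range(0, usable, RECORD.size):
            yield RECORD.unpack_from(pending, off)
        pending = pending[usable:]  # keep only the incomplete tail
    pending += decomp.flush()
    usable = len(pending) - len(pending) % RECORD.size
    for off in range(0, usable, RECORD.size):
        yield RECORD.unpack_from(pending, off)


def build_index(compressed_chunks):
    """Build the intermediate structure: entries sorted by start address."""
    entries = sorted(
        (lo, hi, name.rstrip(b"\0").decode())
        for lo, hi, name in stream_records(compressed_chunks)
    )
    starts = [lo for lo, _, _ in entries]
    return starts, entries


def symbolize(index, addr):
    """Return the name of the entry covering addr, or None."""
    starts, entries = index
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        lo, hi, name = entries[i]
        if lo <= addr < hi:
            return name
    return None
```

The index holds one small tuple per function rather than the full decompressed section, which is where the memory saving would come from. The hard part mentioned above (following references forward or backward between DIEs) is exactly what this toy format sidesteps.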
Does gnu binutils addr2line (or any other common symbolizer) do significantly better here?
> It's not impossible to do something else, though - you could stream decompress and create a separate data structure as you walk, recording relevant info like the names/DIE offsets of any function (potentially abstract) definition DIEs and then symbolize based on that intermediate data structure - it's a lot of work to implement that, though.
I see, thanks for the explanation! I wonder if using split-dwarf would be a better alternative then. It requires more changes in tooling than just enabling compression though.
> Does gnu binutils addr2line (or any other common symbolizer) do significantly better here?
Depends on how you picture that ;) The memory usage is indeed somewhat lower for the compressed one (2594572 addr2line vs. 3940772 llvm-symbolizer), but OTOH for the uncompressed one it's not much lower than for the compressed one. (Actually, we're more involved with llvm-gsymutil --convert though; I just provided the numbers for llvm-symbolizer since that's probably more widely used.)
I don't have a very recent binutils addr2line version at hand, unfortunately. If it makes sense, I can try to build one and see where we're at with that.
$ scl enable devtoolset-10 -- addr2line -v
GNU addr2line version 2.35-5.el7.4
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
$ /usr/bin/time scl enable devtoolset-10 -- addr2line -e binary.compressed-dwarf 0x$(nm binary.compressed-dwarf | grep _ZN4llvm3orc8JITDylib5clearEv$ | cut -c 1-16)
.../llvm/src/lib/ExecutionEngine/Orc/Core.cpp:618
7.03user 1.91system 0:08.97elapsed 99%CPU (0avgtext+0avgdata 2594572maxresident)k
9272inputs+8outputs (17major+882795minor)pagefaults 0swaps
$ /usr/bin/time scl enable devtoolset-10 -- addr2line -e binary.uncompressed-dwarf 0x$(nm binary.uncompressed-dwarf | grep _ZN4llvm3orc8JITDylib5clearEv$ | cut -c 1-16)
.../llvm/src/lib/ExecutionEngine/Orc/Core.cpp:618
0.16user 2.04system 0:02.34elapsed 94%CPU (0avgtext+0avgdata 2572364maxresident)k
2192104inputs+8outputs (0major+667961minor)pagefaults 0swaps
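For repeating this kind of comparison across several tools and binaries, the measurement can be scripted. The sketch below is one possible approach, not something used in this thread: it reads per-child resource usage via `os.wait4`, which reports the same peak-RSS figure that `/usr/bin/time` prints as `maxresident` (kilobytes on Linux; other platforms may use different units). The function name `measure` is made up for illustration.

```python
import os
import subprocess
import time


def measure(cmd):
    """Run cmd and return its wall time, peak RSS, and exit code."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    # wait4 returns the rusage of this specific child, so successive
    # measurements do not contaminate each other.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.monotonic() - start
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        exit_code = -os.WTERMSIG(status)
    proc.returncode = exit_code  # the child has already been reaped
    return {"elapsed_s": elapsed, "maxrss": usage.ru_maxrss, "exit": exit_code}
```

It could then be called as `measure(["addr2line", "-e", "binary.compressed-dwarf", addr])` and again with `binary.uncompressed-dwarf`, where `addr` is the symbol address obtained from `nm` as in the commands above.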
Split DWARF is sort of orthogonal to compression - you can use both/either/neither. A compressed dwp is probably no better than compressed non-split linked debug info - in either case you're basically decompressing the whole thing and keeping that in memory.
Split DWARF with .dwo files (not a dwp) plus compression could use less memory, since we'd only need to decompress a select few .dwo files. But then you've got a lot more duplication/more total file usage across all the .dwo files.
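As a rough way to see what "decompressing the whole thing" costs, the header at the start of each compressed ELF section records the uncompressed size. The sketch below assumes a little-endian ELF64 file using the gABI `SHF_COMPRESSED` layout (`Elf64_Chdr`), and assumes the caller has already extracted the raw section bytes by some other means; the helper name is hypothetical.

```python
import struct

ELFCOMPRESS_ZLIB = 1
# Elf64_Chdr: ch_type (u32), ch_reserved (u32), ch_size (u64), ch_addralign (u64)
CHDR64 = struct.Struct("<IIQQ")


def describe_compressed_section(section_bytes):
    """Report compressed vs. uncompressed size for one compressed section."""
    ch_type, _reserved, ch_size, _align = CHDR64.unpack_from(section_bytes, 0)
    payload = len(section_bytes) - CHDR64.size
    return {
        "zlib": ch_type == ELFCOMPRESS_ZLIB,
        "compressed": payload,
        "uncompressed": ch_size,  # bytes needed if fully decompressed
        "ratio": ch_size / payload if payload else 0.0,
    }
```

Summing `uncompressed` over the debug sections gives a lower bound on the memory a decompress-everything consumer needs, whether the input is a linked binary, a dwp, or the handful of .dwo files that are actually touched.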
I built a large binary (which among others links against several LLVM libraries; I randomly picked one LLVM symbol for the test) with and without -Wl,--compress-debug-sections=zlib (using LLD from llvm-16.0.0):
With --compress-debug-sections, llvm-symbolizer
Point 1. might be expected/unavoidable due to the decompression overhead, but 2. seems unexpected/avoidable to this extent.
A similar behaviour exists with llvm-gsymutil --convert, which isn't surprising as this probably originates in the DebugInfoDWARF library used by both.
Just for completeness: llvm-symbolizer shows a number of warnings like in both cases. I guess that's not relevant here though.