Open ertucci opened 1 year ago
Thanks for the detailed report!
I think there are two key questions in here:
1. Why is the compileunits data source failing to attribute sample_array?
2. Could there be a declfile data source that follows source files instead of compileunits?

Both are good questions.
Looking at the dwarfdump of both executables, it looks like the information I want is present under the DW_AT_decl_file tag.
< 0><0x0000000c>  DW_TAG_compile_unit
                    DW_AT_producer          (indexed string: 0x00000000)Fuchsia clang version 15.0.0 (https://llvm.googlesource.com/a/llvm-project 3a20597776a5d2920e511d81653b4d2b6ca0c855)
                    DW_AT_language          DW_LANG_C_plus_plus_14
                    DW_AT_name              (indexed string: 0x00000001)translation_unit2.cc
                    DW_AT_str_offsets_base  0x00000058
                    DW_AT_stmt_list         0x000000f0
                    DW_AT_comp_dir          (indexed string: 0x00000002)
                    DW_AT_low_pc            (addr_index: 0x00000001)0x00001880
                    DW_AT_high_pc           <offset-from-lowpc> 56 <highpc: 0x000018b8>
                    DW_AT_addr_base         0x00000030

LOCAL_SYMBOLS:
< 1><0x00000035>    DW_TAG_variable
                      DW_AT_name            (indexed string: 0x00000005)sample_array
                      DW_AT_type            <0x00000040>
                      DW_AT_external        yes(1)
                      DW_AT_decl_file       0x00000001 /array.h
                      DW_AT_decl_line       0x00000003
                      DW_AT_location        len 0x0002: 0xa100: DW_OP_addrx 0
Unfortunately this debug entry lacks a usable value for either of the two attributes Bloaty generally depends on to attribute an entry to a section of the binary:
- DW_AT_location: is present, but the provided address is 0, probably due to identical code folding that merged the two copies of this variable into one.
- DW_AT_linkage_name: not present, but if it were present it would give us a name we could look up in the symbol table.

Without one of these two, Bloaty doesn't know which part of the binary sample_array corresponds to.
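The two routes described above can be sketched as follows. This is a simplified illustration, not Bloaty's actual code: `VariableDie`, `SymbolRange`, `SymbolTable`, and `ResolveVariable` are hypothetical names, and the sketch assumes the symbol table has already been read into a map keyed by symbol name.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified view of a DW_TAG_variable entry (not Bloaty's types).
struct VariableDie {
  std::string name;          // DW_AT_name
  bool has_address;          // true if DW_AT_location yielded an address
  uint64_t address;          // that address, if has_address is true
  std::string linkage_name;  // DW_AT_linkage_name, empty if absent
};

struct SymbolRange {
  uint64_t address;
  uint64_t size;
};

// Symbol table keyed by symbol name, as `readelf -Ws` would list it.
typedef std::map<std::string, SymbolRange> SymbolTable;

// Tries both routes; returns nullptr if neither attribute is usable.
const SymbolRange* ResolveVariable(const VariableDie& die,
                                   const SymbolTable& symtab) {
  // Route 1: a non-zero address that falls inside some symbol's range.
  if (die.has_address && die.address != 0) {
    for (SymbolTable::const_iterator it = symtab.begin(); it != symtab.end(); ++it) {
      const SymbolRange& r = it->second;
      if (die.address >= r.address && die.address < r.address + r.size) {
        return &r;
      }
    }
  }
  // Route 2: look the linkage name up in the symbol table.
  if (!die.linkage_name.empty()) {
    SymbolTable::const_iterator it = symtab.find(die.linkage_name);
    if (it != symtab.end()) return &it->second;
  }
  return nullptr;
}
```

With a DIE like the one quoted above (address 0, no linkage name), `ResolveVariable` returns `nullptr`, which corresponds to the variable being left unattributed by the debug info.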
When I look in the symbol table (binary compiled with Clang), I see an address of 0x2010 for the sample_array symbol:
$ readelf -Ws main | grep sample_array
31: 0000000000002010 40 OBJECT WEAK DEFAULT 17 sample_array
But when I dump the debug info, unfortunately there is no DIE that references this address:
$ readelf --debug-dump=info main | grep 2010
$
Looking at verbose Bloaty output, it looks like the only way Bloaty was able to attribute sample_array to main.cc at all was by disassembling the binary:
$ ~/code/bloaty/bloaty -vvv -d compileunits main | grep -A1 compileunits.*\\[2010
[compileunits, x86_disassemble] AddVMRangeForVMAddr(1193, [2010, ffffffffffffffff])
-> translates to: [2010 ffffffffffffffff]
--
[compileunits, x86_disassemble] AddVMRangeForVMAddr(11d3, [2010, ffffffffffffffff])
-> translates to: [2010 ffffffffffffffff]
$
The address was referenced from two different functions, and it was a matter of luck which one Bloaty found first. Looking at the symbol table, the two referencing addresses 0x1193 and 0x11d3 fall inside the functions increment_i_1(int*) and increment_i_2(int*), respectively:
$ readelf -Ws --demangle main
Symbol table '.symtab' contains 40 entries:
Num: Value Size Type Bind Vis Ndx Name
[...]
33: 0000000000001170 53 FUNC GLOBAL DEFAULT 15 increment_i_1(int*)
34: 0000000000001130 50 FUNC GLOBAL DEFAULT 15 main
35: 00000000000011b0 53 FUNC GLOBAL DEFAULT 15 increment_i_2(int*)
[...]
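To make the mapping from address to function explicit, here is a minimal sketch that checks which symbol's [value, value + size) range contains an address, using the three FUNC entries from the readelf output above. `FuncSymbol`, `SymbolContaining`, and `ExampleSymbols` are illustrative names, not part of Bloaty or readelf.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct FuncSymbol {
  uint64_t value;    // start address from the symbol table
  uint64_t size;     // size in bytes
  std::string name;  // demangled name
};

// Returns the name of the symbol whose [value, value + size) range contains
// addr, or an empty string if no symbol does.
std::string SymbolContaining(const std::vector<FuncSymbol>& symbols,
                             uint64_t addr) {
  for (const FuncSymbol& sym : symbols) {
    if (addr >= sym.value && addr < sym.value + sym.size) return sym.name;
  }
  return "";
}

// The three FUNC entries shown in the readelf output above.
std::vector<FuncSymbol> ExampleSymbols() {
  return {
      {0x1170, 53, "increment_i_1(int*)"},
      {0x1130, 50, "main"},
      {0x11b0, 53, "increment_i_2(int*)"},
  };
}
```

Here `SymbolContaining(ExampleSymbols(), 0x1193)` yields increment_i_1(int*), since 0x1170 + 53 = 0x11a5, and 0x11d3 lands in increment_i_2(int*), since 0x11b0 + 53 = 0x11e5.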
So why did it reference main.cc instead of translation_unit1.cc or translation_unit2.cc? Looking at the verbose Bloaty output, it looks like this came from dwarf_pcpair:
$ ~/code/bloaty/bloaty -vvv -d compileunits main | grep compileunits.*1170
[compileunits, dwarf_pcpair] AddVMRange(main.cc, 1170, 35)
[compileunits, dwarf_pcpair] AddVMRange(main.cc, 1170, 35)
[compileunits, dwarf_fde_table] AddFileRangeForVMAddr(1170, [2064, 8])
[compileunits, dwarf_fde] AddFileRangeForVMAddr(1170, [2120, 18])
[compileunits, elf_symtab_name] AddFileRangeForVMAddr(1170, [3c75, 14])
[compileunits, elf_symtab_sym] AddFileRangeForVMAddr(1170, [3a18, 18])
But when I dump the debug info looking for this DW_AT_low_pc=0x1170, DW_AT_high_pc=0x35, this pcpair only appears for the translation_unit1.cc compileunit:
<0><86>: Abbrev Number: 1 (DW_TAG_compile_unit)
<87> DW_AT_producer : (indexed string: 0): Debian clang version 14.0.6
<88> DW_AT_language : 33 (C++14)
<8a> DW_AT_name : (indexed string: 0x1): translation_unit1.cc
<8b> DW_AT_str_offsets_base: 0x38
<8f> DW_AT_stmt_list : 0x90
<93> DW_AT_comp_dir : (indexed string: 0x2): /tmp/t
<94> DW_AT_low_pc : (index: 0x1): 0x1170
<95> DW_AT_high_pc : 0x35
<99> DW_AT_addr_base : 0x28
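As a cross-check on the numbers, here is a small sketch of how a low_pc/high_pc pair turns into an address range. In DWARF 4 and later, a DW_AT_high_pc encoded as a constant is an offset from DW_AT_low_pc rather than an absolute address, which is the case in this dump. `PcRange`, `RangeFromPcPair`, and `MatchesSymbol` are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>

struct PcRange {
  uint64_t begin;  // inclusive
  uint64_t end;    // exclusive
};

// Builds the address range for a compile unit or function. If
// high_pc_is_offset is true, high_pc is added to low_pc; otherwise it is
// treated as an absolute end address.
PcRange RangeFromPcPair(uint64_t low_pc, uint64_t high_pc,
                        bool high_pc_is_offset) {
  return PcRange{low_pc, high_pc_is_offset ? low_pc + high_pc : high_pc};
}

// True if the range covers exactly the bytes of a symbol with the given
// start address and size.
bool MatchesSymbol(const PcRange& range, uint64_t sym_value,
                   uint64_t sym_size) {
  return range.begin == sym_value && range.end == sym_value + sym_size;
}
```

For the values above, `RangeFromPcPair(0x1170, 0x35, true)` gives [0x1170, 0x11a5), which matches the 53-byte increment_i_1(int*) symbol at 0x1170 in the symbol table. So the range itself is right; only the compile unit name attached to it is wrong.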
So this looks like a bug in Bloaty. The range should have been attributed to translation_unit1.cc, not main.cc. It looks like a bug in how Bloaty handles indexed strings.
declfile proposal

You asked this question:
Is it possible to create a different source type which is primarily based on the decl file (as this is essentially what I want and compileunits was just as close as I could get)?
Unfortunately the DW_TAG_variable debugging entry you quoted before doesn't give enough information to attribute this to any specific part of the binary:
DW_AT_name       (indexed string: 0x00000005)sample_array
DW_AT_type       <0x00000040>
DW_AT_external   yes(1)
DW_AT_decl_file  0x00000001 /array.h
DW_AT_decl_line  0x00000003
DW_AT_location   len 0x0002: 0xa100: DW_OP_addrx 0
It does contain DW_AT_name=sample_array, and in this case sample_array happens to also be the linkage name. But this is just a coincidence and cannot be relied on. Suppose we change the program slightly so the function definitions look like this:
void increment_i_2(int* i) {
  static int sample_array[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  sample_array[*i]++;
  *i += sample_array[*i % 10];
  *i += i_2;
}

void increment_i_1(int* i) {
  static int sample_array[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  sample_array[*i]++;
  *i += sample_array[*i % 10];
  *i += i_1;
}
The debug info will still refer to these variables as sample_array, because that is their name in the code:
<2><72>: Abbrev Number: 3 (DW_TAG_variable)
<73> DW_AT_name : (indexed string: 0x3): sample_array
<74> DW_AT_type : <0x89>
<78> DW_AT_decl_file : 0
<79> DW_AT_decl_line : 7
<7a> DW_AT_location : (DW_OP_addrx <0>)
However, the linkage names of these variables are now different, because each function needs its own copy of sample_array:
$ readelf -Ws main | grep sample_array
13: 0000000000004010 40 OBJECT LOCAL DEFAULT 25 _ZZ13increment_i_1PiE12sample_array
15: 0000000000004040 40 OBJECT LOCAL DEFAULT 25 _ZZ13increment_i_2PiE12sample_array
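To see how these two linkage names differ, they can be demangled. The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with the GCC and Clang C++ runtimes; the `Demangle` wrapper is an illustrative helper, not something Bloaty provides.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangles an Itanium C++ ABI symbol name. Returns the input unchanged if
// the name cannot be demangled.
std::string Demangle(const std::string& mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc, so it must be released with free.
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) return mangled;
  std::string result(out);
  std::free(out);
  return result;
}
```

Passing `_ZZ13increment_i_1PiE12sample_array` and `_ZZ13increment_i_2PiE12sample_array` through `Demangle` shows that each name is scoped to its enclosing function, which is information the DW_AT_name value alone does not carry.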
So unfortunately using DW_TAG_variable entries to attribute this info to a given declfile doesn't appear to be viable.
I’m trying to use bloaty to divide up memory usage of a project by directory structure. The compileunits source type seemed to be the closest fit to what I was looking for, so I used it as the base type for my custom directory structure data type. However, I noticed that constants defined inline in header files weren’t placed into a compileunit named after the path to the file, as most symbols were. Instead, they were rolled up into a generic [section .rodata] compileunit. I’ve tried to reproduce a simple version of this behavior using the source code at the bottom, compiled with both gcc and clang.
The symbol, sample_array, should be 40 bytes in size, defined in array.h, and used in translation_unit1.cc and translation_unit2.cc. However, bloaty analysis of the clang executable puts it under both main.cc with strange size attribution because of additional inclusion in the .eh_frame_hdr section.
For gcc, the compileunit name is not intuitive and therefore it is difficult to attribute symbols defined in header inlines to the appropriate source code.
Looking at the dwarfdump of both executables, it looks like the information I want is present under the DW_AT_decl_file tag.
Is it possible to create a different source type which is primarily based on the decl file (as this is essentially what I want and compileunits was just as close as I could get)?
``
main.h
main.cc
array.h
translation_unit1.h
translation_unit1.cc
translation_unit2.h
translation_unit2.cc