llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

llvm-dwarfdump does not correctly dump .debug_rnglists #118161

Open oltolm opened 2 days ago

oltolm commented 2 days ago

I built an executable with DWARF 5 debug info using Clang and dumped it with llvm-dwarfdump:

llvm-dwarfdump.exe --debug-info --debug-rnglists bin\test_mgwhelp.exe

This is the incorrect result:

.debug_rnglists contents:
range list header: length = 0x0000001b, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000001
offsets: [
0x00000004
]
ranges:
[0x0000000000000031, 0x00000000000002a5)
[0x00000000000002b1, 0x0000000000000c58)
[0x0000000000000000, 0x000000000000005c)
<End of list>

The correct output is produced inline in .debug_info output:

              DW_AT_ranges [DW_FORM_rnglistx]   (indexed (0x0) rangelist = 0x00000010
                 [0x0000000140001390, 0x0000000140001604)
                 [0x0000000140001610, 0x0000000140001fb7)
                 [0x0000000140001fc0, 0x000000014000201c))

The correct raw output is produced when passing --verbose:

.debug_rnglists contents:
0x00000000: range list header: length = 0x0000001b, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000001
offsets: [
0x00000004 => 0x00000010
]
ranges:
0x00000010: [DW_RLE_base_addressx]:  0x0000000000000031
0x00000012: [DW_RLE_offset_pair  ]:  0x0000000000000000, 0x0000000000000274 => [0x0000000000000031, 0x00000000000002a5)
0x00000016: [DW_RLE_offset_pair  ]:  0x0000000000000280, 0x0000000000000c27 => [0x00000000000002b1, 0x0000000000000c58)
0x0000001b: [DW_RLE_startx_length]:  0x0000000000000032, 0x000000000000005c => [0x0000000000000000, 0x000000000000005c)
0x0000001e: [DW_RLE_end_of_list  ]
bin\test_mgwhelp.exe:   file format COFF-x86-64

The wrong output is neither the raw output nor the one from .debug_info. I could contribute a fix that always outputs the raw contents of .debug_rnglists. What do you think?

oltolm commented 1 day ago

A similar issue was reported in #56342.

dwblaikie commented 4 hours ago

Not sure I follow. The raw verbose output is just as wrong as the non-verbose output: the result after the => is the computed result, which lacks the address because the address pool isn't accessible.

It's the nature of DWARF - we can't print .debug_rnglists correctly when examined in isolation, because we don't know which address pool (.debug_addr) to use, without parsing all the .debug_info contributions (at least their first DIEs).

The way llvm-dwarfdump handles this is that if you dump .debug_info as well, it'll record that and use it for dumping other sections. If you don't, we don't, and we print things out as though every address in the address pool were 0 (this looks roughly like what happens when you dump an intermediate object file, since the addresses aren't resolved at that point).

eg:

$ clang++-tot test.cpp -g -ffunction-sections && llvm-dwarfdump-tot a.out --debug-rnglists -v -debug-info
a.out:  file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000053, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000057)

0x0000000c: DW_TAG_compile_unit [1] *
              DW_AT_producer [DW_FORM_strx1]    (indexed (00000000) string = "clang version 20.0.0git (git@github.com:llvm/llvm-project.git 8dd9f206b518a97132f3e2489ccc93704e638353)")
              DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus_14)
              DW_AT_name [DW_FORM_strx1]        (indexed (00000001) string = "test.cpp")
              DW_AT_str_offsets_base [DW_FORM_sec_offset]       (0x00000008)
              DW_AT_stmt_list [DW_FORM_sec_offset]      (0x00000000)
              DW_AT_comp_dir [DW_FORM_strx1]    (indexed (00000002) string = "/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_low_pc [DW_FORM_addr]       (0x0000000000000000)
              DW_AT_ranges [DW_FORM_rnglistx]   (indexed (0x0) rangelist = 0x00000010
                 [0x0000000000001130, 0x0000000000001136)
                 [0x0000000000001140, 0x0000000000001146)
                 [0x0000000000001150, 0x0000000000001158))
              DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)
              DW_AT_rnglists_base [DW_FORM_sec_offset]  (0x0000000c)
...
.debug_rnglists contents:
0x00000000: range list header: length = 0x00000016, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000001
offsets: [
0x00000004 => 0x00000010
]
ranges:
0x00000010: [DW_RLE_startx_length]:  0x0000000000000000, 0x0000000000000006 => [0x0000000000001130, 0x0000000000001136)
0x00000013: [DW_RLE_startx_length]:  0x0000000000000001, 0x0000000000000006 => [0x0000000000001140, 0x0000000000001146)
0x00000016: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000008 => [0x0000000000001150, 0x0000000000001158)
0x00000019: [DW_RLE_end_of_list  ]
$ clang++-tot test.cpp -g -ffunction-sections && llvm-dwarfdump-tot a.out --debug-rnglists -v
a.out:  file format elf64-x86-64

.debug_rnglists contents:
0x00000000: range list header: length = 0x00000016, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000001
offsets: [
0x00000004 => 0x00000010
]
ranges:
0x00000010: [DW_RLE_startx_length]:  0x0000000000000000, 0x0000000000000006 => [0x0000000000000000, 0x0000000000000006)
0x00000013: [DW_RLE_startx_length]:  0x0000000000000001, 0x0000000000000006 => [0x0000000000000000, 0x0000000000000006)
0x00000016: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000008 => [0x0000000000000000, 0x0000000000000008)
0x00000019: [DW_RLE_end_of_list  ]

I'd be open to some documentation patch; perhaps some part of the output could clarify that it's not accurate because the address index is unresolved. (We could print the address ranges as addrx[0]+0x0, addrx[0]+0x6, etc., but that might be a bit tedious/repetitive.)
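The addrx[0]+0x0 idea could look roughly like the following. This is only a minimal sketch: `formatAddress` and `formatStartxLength` are hypothetical helpers made up for illustration, not functions in llvm-dwarfdump, and the symbolic spelling is just the one floated in this comment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format one end of a range. If the pool slot could not be resolved
// (Resolved == nullptr), print it symbolically as addrx[Index]+0xOffset
// instead of pretending the pooled address is 0.
std::string formatAddress(uint64_t Index, uint64_t Offset,
                          const uint64_t *Resolved) {
  char Buf[64];
  if (Resolved)
    std::snprintf(Buf, sizeof(Buf), "0x%016llx",
                  static_cast<unsigned long long>(*Resolved + Offset));
  else
    std::snprintf(Buf, sizeof(Buf), "addrx[%llu]+0x%llx",
                  static_cast<unsigned long long>(Index),
                  static_cast<unsigned long long>(Offset));
  return Buf;
}

// Format a DW_RLE_startx_length entry as a half-open range.
std::string formatStartxLength(uint64_t Index, uint64_t Length,
                               const uint64_t *Resolved) {
  return "[" + formatAddress(Index, 0, Resolved) + ", " +
         formatAddress(Index, Length, Resolved) + ")";
}

// Usage with the first entry of the a.out example in this comment.
void checkExample() {
  // Pool not available: symbolic form.
  assert(formatStartxLength(0, 0x6, nullptr) ==
         "[addrx[0]+0x0, addrx[0]+0x6)");
  // Pool available: slot 0 resolves to 0x1130 in that example.
  const uint64_t Slot0 = 0x1130;
  assert(formatStartxLength(0, 0x6, &Slot0) ==
         "[0x0000000000001130, 0x0000000000001136)");
}
```

The symbolic form keeps the index visible, so a reader can still combine it with a separate dump of .debug_addr.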

oltolm commented 2 hours ago

I don't understand why you call the raw output wrong. At least I can use it to manually calculate the result.
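To make the manual calculation concrete, here is a minimal C++ sketch that resolves the entries from the verbose dump in the report. It is a simplified model, not llvm-dwarfdump's code: `Entry`, `AddrPool`, and `resolve` are names made up for illustration, and the two pool values (indices 0x31 and 0x32) are not taken from a dump of .debug_addr but inferred from the .debug_info output quoted in the report.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Simplified model of the three entry kinds seen in the verbose dump.
enum class RLE { BaseAddressX, OffsetPair, StartXLength };

struct Entry {
  RLE Kind;
  uint64_t Value0; // pool index, or begin offset for OffsetPair
  uint64_t Value1; // end offset or length (unused for BaseAddressX)
};

using Range = std::pair<uint64_t, uint64_t>;   // half-open [low, high)
using AddrPool = std::map<uint64_t, uint64_t>; // .debug_addr index -> address

// Resolve a range list by hand: look indices up in the pool and add
// offsets to the most recent base address.
std::vector<Range> resolve(const std::vector<Entry> &Entries,
                           const AddrPool &Pool) {
  std::vector<Range> Out;
  uint64_t Base = 0;
  for (const Entry &E : Entries) {
    switch (E.Kind) {
    case RLE::BaseAddressX:
      Base = Pool.at(E.Value0); // throws std::out_of_range if missing
      break;
    case RLE::OffsetPair:
      Out.emplace_back(Base + E.Value0, Base + E.Value1);
      break;
    case RLE::StartXLength: {
      uint64_t Start = Pool.at(E.Value0);
      Out.emplace_back(Start, Start + E.Value1);
      break;
    }
    }
  }
  return Out;
}

// The entries from the verbose dump in the report.
std::vector<Entry> reportedEntries() {
  return {{RLE::BaseAddressX, 0x31, 0},
          {RLE::OffsetPair, 0x0, 0x274},
          {RLE::OffsetPair, 0x280, 0xc27},
          {RLE::StartXLength, 0x32, 0x5c}};
}

// Assumption: pool slots inferred from the .debug_info output.
AddrPool inferredPool() {
  return {{0x31, 0x140001390}, {0x32, 0x140001fc0}};
}

// The resolved ranges match the ones printed inline in .debug_info.
void checkReportedExample() {
  std::vector<Range> R = resolve(reportedEntries(), inferredPool());
  assert(R.size() == 3);
  assert(R[0] == Range(0x140001390, 0x140001604));
  assert(R[1] == Range(0x140001610, 0x140001fb7));
  assert(R[2] == Range(0x140001fc0, 0x14000201c));
}
```

The sketch only needs the raw operands (the part before the => in the verbose dump) plus the right address pool, which is why the raw form is still useful even when the computed part is not.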

I think your example only works by accident. There are multiple problems here:

  case dwarf::DW_RLE_base_addressx: {
    if (auto SA = LookupPooledAddress(Value0))
      CurrentBase = SA->Address;
    else
      CurrentBase = Value0;
    if (!DumpOpts.Verbose)
      return;
    DWARFFormValue::dumpAddress(OS << ' ', AddrSize, Value0);
    break;
  }

If it cannot look up the address, it uses the index into .debug_addr as the base address. I think this is wrong, but more importantly LookupPooledAddress looks up the address in the first CU:

  auto LookupPooledAddress =
      [&](uint32_t Index) -> std::optional<SectionedAddress> {
    const auto &CUs = compile_units();
    auto I = CUs.begin();
    if (I == CUs.end())
      return std::nullopt;
    return (*I)->getAddrOffsetSectionItem(Index);
  };

but there is no guarantee that it belongs to the first CU. In my example the first CU does not have an address base, but even if it did, it could be the wrong CU.
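One way to avoid always using the first CU would be to pick the unit whose DW_AT_rnglists_base points into the table being dumped. The following is only a sketch of that heuristic under simplifying assumptions: `UnitInfo`, `TableExtent`, and `findOwningUnit` are hypothetical names, it handles DWARF32 only, and it is not how llvm-dwarfdump is implemented. Units that reference range lists without DW_AT_rnglists_base would not be matched by it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of what a unit's first DIE tells us.
struct UnitInfo {
  bool HasRnglistsBase;
  uint64_t RnglistsBase; // DW_AT_rnglists_base
  bool HasAddrBase;
  uint64_t AddrBase; // DW_AT_addr_base
};

// Extent of one DWARF32 range list table inside .debug_rnglists.
struct TableExtent {
  uint64_t HeaderOffset;
  uint64_t Length; // value of the header's length field
  // The 4-byte length field does not count itself.
  uint64_t end() const { return HeaderOffset + 4 + Length; }
};

// Return the first unit whose rnglists base lies inside the table
// (past the start of its header), or nullptr if none does.
const UnitInfo *findOwningUnit(const std::vector<UnitInfo> &Units,
                               const TableExtent &Table) {
  for (const UnitInfo &U : Units)
    if (U.HasRnglistsBase && U.RnglistsBase > Table.HeaderOffset &&
        U.RnglistsBase < Table.end())
      return &U;
  return nullptr;
}

// Usage: a first unit without any bases, followed by a unit with the
// values from the a.out dump earlier in the thread (rnglists base 0xc,
// addr base 0x8; table at 0x0 with length 0x16).
void checkExample() {
  std::vector<UnitInfo> Units = {{false, 0, false, 0},
                                 {true, 0xc, true, 0x8}};
  TableExtent Table = {0x0, 0x16};
  const UnitInfo *Owner = findOwningUnit(Units, Table);
  assert(Owner == &Units[1]);
  assert(Owner->HasAddrBase && Owner->AddrBase == 0x8);
}
```

A caller would still need to check `HasAddrBase` on the result before resolving any index, since the owning unit may have no address pool at all.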

Another problem:

  case dwarf::DW_RLE_offset_pair:
    PrintRawEntry(OS, *this, AddrSize, DumpOpts);
    if (CurrentBase != Tombstone)
      DWARFAddressRange(Value0 + CurrentBase, Value1 + CurrentBase)
          .dump(OS, AddrSize, DumpOpts);
    else
      OS << "dead code";
    break;

If DW_RLE_offset_pair is the first entry, then there is no base address and it cannot calculate the result even if you pass --debug-info, as in this example:

.debug_rnglists contents:
0x00000000: range list header: length = 0x0000002c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000000
ranges:
0x0000000c: [DW_RLE_offset_pair]:  0x0000000000000014, 0x000000000000005a => [0x0000000000000014, 0x000000000000005a)
0x0000000f: [DW_RLE_offset_pair]:  0x00000000000000c0, 0x00000000000000f8 => [0x00000000000000c0, 0x00000000000000f8)
0x00000014: [DW_RLE_offset_pair]:  0x0000000000000110, 0x000000000000012e => [0x0000000000000110, 0x000000000000012e)
0x00000019: [DW_RLE_end_of_list]
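For reference, DWARF 5 specifies that when no base address entry precedes an offset pair, the applicable base is the base address of the compilation unit that references the list (normally its DW_AT_low_pc). A minimal sketch of that rule, using the first offset pair from the dump above: `resolveOffsetPair` is a hypothetical helper, and the unit base 0x401000 is a made-up value for illustration, since the real DW_AT_low_pc of this unit is not shown here.

```cpp
#include <cassert>
#include <cstdint>

struct Range {
  uint64_t Low;
  uint64_t High; // exclusive
};

// ListBase: base set by an earlier DW_RLE_base_address(x) entry in the
//           same list, or nullptr if there was none.
// UnitBase: base address of the referencing unit (its DW_AT_low_pc).
Range resolveOffsetPair(uint64_t Begin, uint64_t End,
                        const uint64_t *ListBase, uint64_t UnitBase) {
  uint64_t Base = ListBase ? *ListBase : UnitBase;
  return {Base + Begin, Base + End};
}

// Usage: first entry of the list above, with an invented unit base.
void checkExample() {
  const uint64_t UnitBase = 0x401000; // made-up value for illustration
  Range R = resolveOffsetPair(0x14, 0x5a, nullptr, UnitBase);
  assert(R.Low == 0x401014 && R.High == 0x40105a);
}
```

This again requires knowing which unit references the list, so it cannot be done from .debug_rnglists alone.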