llvm / llvm-project


llvm-objcopy generates unhelpful PE/COFF debug data for gnu-debuglink #44622

Open vit9696 opened 4 years ago

vit9696 commented 4 years ago
Bugzilla Link 45277
Version trunk
OS All
CC @alexshap,@dwblaikie,@jh7370,@rupprecht

Extended Description

After stripping DWARF debug information one can use --add-gnu-debuglink to link the resulting file with the original file containing the debug information:

$ cp file.dll file.debug
$ llvm-objcopy --strip-unneeded file.dll
$ llvm-objcopy --add-gnu-debuglink=$(pwd)/file.debug file.dll

When working with PE/COFF files this functionality is unfortunately very limited: https://github.com/llvm/llvm-project/blob/f69eba07726a9fe084812aa224309d62c4bdd2e4/llvm/tools/llvm-objcopy/COFF/COFFObjcopy.cpp#L84-L90

  1. Unlike ELF, which gets a section named .gnu_debuglink, PE/COFF files get anonymous sections like /1234. This makes it very hard to reliably locate the section, as its contents do not even have a magic number, just a filename and a CRC32 hash.

.gnu_debuglink is 14 bytes long, while PE/COFF has 8 bytes maximum for the section name. However, even if we cannot use this name, any other unique value will work just fine (e.g. .dbglink or .debug).

  2. A .gnu_debuglink section is not the expected way to link a PE file with its debug information. In general, PE/COFF uses a CodeView entry in the debug DataDirectory, in either PDB 2.0 (NB10) or PDB 7.0 (RSDS) format.

For DWARF this is not added by llvm-objcopy, but it is likely desired, as several tools for e.g. UEFI firmware debugging rely on at least some kind of CodeView entry to be present.

In MinGW mode LLD already generates a dummy PDB 7.0 entry: https://github.com/llvm/llvm-project/blob/8620bb9534342176ac739e2a587e4cecf437310c/lld/COFF/Writer.cpp#L1823-L1831

For non-LLVM projects, such as the EDK II GenFw utility used for building UEFI firmware PE files from ELFs, it is common to add a PDB 2.0 (NB10) entry: https://github.com/tianocore/edk2/blob/b219e2c/MdePkg/Include/IndustryStandard/PeImage.h#L614

Perhaps this can be adopted in llvm-objcopy as well.

  3. Tools like GenFw also embed the full file path instead of just the base name, which makes it much easier to locate the file on the host during a debugging session.

I believe GNU objcopy for ELF also strips the path, but for convenience we could add an option to keep it, e.g. --add-gnu-debuglink=/path/to/filename,/path/to/embedded/filename.
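
To make points 2 and 3 concrete, here is a minimal Python sketch of the two CodeView record layouts mentioned above, with a full path embedded. It is an illustration only, not what llvm-objcopy, LLD, or GenFw actually emit: the function names are hypothetical, and the default timestamp and age values are assumptions chosen for the example.

```python
import struct
import uuid


def build_nb10(path: str, timestamp: int = 0, age: int = 1) -> bytes:
    """PDB 2.0 record: 'NB10', offset (0), timestamp, age, NUL-terminated path."""
    return b"NB10" + struct.pack("<III", 0, timestamp, age) + path.encode("utf-8") + b"\0"


def build_rsds(path: str, guid: uuid.UUID, age: int = 1) -> bytes:
    """PDB 7.0 record: 'RSDS', 16-byte GUID, age, NUL-terminated path."""
    # The GUID is stored in its in-memory (mixed-endian) layout, hence bytes_le.
    return b"RSDS" + guid.bytes_le + struct.pack("<I", age) + path.encode("utf-8") + b"\0"


def parse_codeview(data: bytes):
    """Return (signature, identifier, age, path) for an NB10 or RSDS record."""
    sig = data[:4]
    if sig == b"NB10":
        _offset, ident, age = struct.unpack_from("<III", data, 4)
        path_start = 16
    elif sig == b"RSDS":
        ident = uuid.UUID(bytes_le=data[4:20])
        (age,) = struct.unpack_from("<I", data, 20)
        path_start = 24
    else:
        raise ValueError("unknown CodeView signature: %r" % sig)
    end = data.index(b"\0", path_start)  # path runs up to the first NUL
    return sig.decode("ascii"), ident, age, data[path_start:end].decode("utf-8")
```

A debugger-side tool could call parse_codeview on the bytes referenced by the debug directory entry and then look for the returned path on the host.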

vit9696 commented 4 years ago

Attachment: PE files with symtab stripped and present

After looking at the code I discovered that the section is not actually anonymous, as /4 is just a way to specify an offset into the string table that follows the symbol table. However, different tools may actually strip this information upon deployment, which will result in an unnamed section and in llvm-readobj being unable to parse such a file, failing with an "Invalid data was encountered while parsing the file" error.

Even if we assume the tools are not working correctly, I believe llvm-readobj should still work fine with such files, and CodeView section still needs to be updated.
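
The name lookup described in this comment can be sketched in Python as follows. This is an illustration of the general COFF convention (an 8-byte name field, with /N meaning a decimal offset into the string table, whose first 4 bytes hold its size), not llvm-readobj's actual code; the function name and error messages are hypothetical.

```python
import struct
from typing import Optional


def resolve_section_name(raw: bytes, string_table: Optional[bytes]) -> str:
    """Resolve the 8-byte name field of a COFF section header.

    string_table is the whole string table including its 4-byte size
    prefix, or None if it was stripped together with the symbol table.
    """
    name = raw.rstrip(b"\0")
    if not name.startswith(b"/"):
        return name.decode("ascii")  # short name stored inline
    offset = int(name[1:].decode("ascii"))  # "/4" -> 4
    if string_table is None or offset >= len(string_table):
        # This is the situation reported above: the header still says "/4",
        # but there is nothing left to look the name up in.
        raise ValueError("long section name refers to a missing string table")
    end = string_table.index(b"\0", offset)
    return string_table[offset:end].decode("ascii")


# Example: a string table that holds only ".gnu_debuglink" at offset 4.
body = b".gnu_debuglink\0"
table = struct.pack("<I", 4 + len(body)) + body
print(resolve_section_name(b"/4\0\0\0\0\0\0", table))
```

A more tolerant reader could catch the error and report the section under its raw /4 name instead of rejecting the whole file.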

I attached a couple of sample files for reference:

  1. DxeIpl_targnu.efi has symtab removed and causes an error
  2. PcdPeim_tar_gnu.efi has symtab present, but for some reason the section does not show up in llvm-readobj -a.

vit9696 commented 4 years ago

James, I checked GNU objcopy, and it matches llvm-objcopy behaviour.

  1. An unnamed section is added to the end.
  2. Debug directory entry is not updated (and is empty).
  3. File path is indeed stripped.

I guess the same request could be filed for GNU objcopy as well, but we do not have a particular need for it, as we do not try to generate PE files directly with the GNU toolchain.
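
For reference, the section contents discussed in this thread (a filename followed by a CRC32 of the debug file) can be sketched in Python. This assumes the usual .gnu_debuglink layout: a NUL-terminated name, zero padding to a 4-byte boundary, then a 4-byte checksum, little-endian here since the target is PE/COFF. The function names are hypothetical, and keep_path models the option proposed earlier in the thread, not an existing flag.

```python
import os
import struct
import zlib


def build_debuglink_payload(debug_path: str, debug_bytes: bytes,
                            keep_path: bool = False) -> bytes:
    """Build the contents of a .gnu_debuglink-style section."""
    # Current behaviour strips the directory; keep_path is the hypothetical
    # variant that would embed the full path instead.
    name = debug_path if keep_path else os.path.basename(debug_path)
    encoded = name.encode("utf-8") + b"\0"
    encoded += b"\0" * (-len(encoded) % 4)  # pad to a 4-byte boundary
    crc = zlib.crc32(debug_bytes) & 0xFFFFFFFF
    return encoded + struct.pack("<I", crc)


def parse_debuglink_payload(payload: bytes):
    """Return (name, crc32) from section contents laid out as above."""
    name = payload[:payload.index(b"\0")].decode("utf-8")
    (crc,) = struct.unpack_from("<I", payload, len(payload) - 4)
    return name, crc
```

Since the payload carries no magic number, a consumer can only sanity-check it by confirming that the CRC32 matches the candidate debug file.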

jh7370 commented 4 years ago

I'm not sufficiently knowledgeable about COFF to be able to look at this in any detail, but I did have one question about point 1:

What does GNU objcopy do about the section name for .gnu_debuglink?

alvinhochun commented 2 years ago

Some points I noted from adding COFF gnu-debuglink support to LLDB:

Personally, gnu-debuglink has been working in a satisfactory manner for Krita on Windows (mingw-w64). I don't know the EFI target, but if some external tools are mangling the files in a way which makes the .gnu_debuglink section unreadable, can you instead use the build ID to look up the matching debug file?