llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

llvm-objcopy does not seem to handle `-O binary` correctly, adds PE header to the output #108946

Open mgorny opened 4 weeks ago

mgorny commented 4 weeks ago

In Gentoo, we're using objcopy to extract the Linux kernel image from a UKI image.

Reproducer:

wget https://distfiles.gentoo.org/distfiles/1d/gentoo-kernel-6.10.9-1.amd64.gpkg.tar
tar -xf gentoo-kernel-6.10.9-1.amd64.gpkg.tar gentoo-kernel-6.10.9-1/image.tar.xz
tar -xf gentoo-kernel-6.10.9-1/image.tar.xz image/usr/src/linux-6.10.9-gentoo-dist/arch/x86/boot/uki.efi
objcopy -O binary -j.linux image/usr/src/linux-6.10.9-gentoo-dist/arch/x86/boot/uki.efi bzImage

Comparing the files created by GNU objcopy and LLVM objcopy:

-rwxr-xr-x 1 mgorny mgorny  19606512 09-17 11:15 bzImage.gnu
-rwxr-xr-x 1 mgorny mgorny  19607040 09-17 11:15 bzImage.llvm

The LLVM file has an additional 512 bytes at the front:

00000000  4d 5a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |MZ..............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000030  00 00 00 00 00 00 00 00  00 00 00 00 40 00 00 00  |............@...|
00000040  50 45 00 00 64 86 01 00  de 9a d0 66 00 00 00 00  |PE..d......f....|
00000050  00 00 00 00 f0 00 2e 02  0b 02 00 00 ae 87 00 00  |................|
00000060  00 00 00 00 00 00 00 00  e0 96 00 00 00 10 00 00  |................|
00000070  00 00 f9 4d 01 00 00 00  00 10 00 00 00 02 00 00  |...M............|
00000080  00 00 00 00 00 01 05 00  01 00 01 00 00 00 00 00  |................|
00000090  00 c0 dd 05 00 02 00 00  00 00 00 00 0a 00 60 01  |..............`.|
000000a0  00 00 10 00 00 00 00 00  00 10 00 00 00 00 00 00  |................|
*
000000c0  00 00 00 00 10 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 28 dd 05 f0 09 00 00  |.........(......|
000000f0  00 10 01 00 84 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000140  00 00 00 00 00 00 00 00  2e 6c 69 6e 75 78 00 00  |.........linux..|
00000150  f0 2b 2b 01 00 90 b2 04  00 2c 2b 01 00 02 00 00  |.++......,+.....|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 20 00 00 40  |............ ..@|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

And 16 bytes (of padding?) at the end:

012b2df0  cc cc cc cc cc cc cc cc  cc cc cc cc cc cc cc cc  |................|
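
One possible reading of both size differences, assuming the standard PE/COFF section header layout: the 40 bytes at offsets 0x148-0x16f of the dump above are the `.linux` section header, and its fields would account for the 512 leading bytes as well as the 16 trailing ones. The sketch below decodes those bytes, copied from the dump. The field names follow the PE/COFF specification; the variable names are illustrative only, and nothing in the thread confirms this interpretation.

```python
import struct

# The 40-byte section header, copied from offsets 0x148-0x16f of the dump.
SECTION_HEADER = bytes.fromhex(
    "2e6c696e75780000"          # Name: ".linux", NUL-padded to 8 bytes
    "f02b2b01"                  # VirtualSize
    "0090b204"                  # VirtualAddress
    "002c2b01"                  # SizeOfRawData
    "00020000"                  # PointerToRawData
    "000000000000000000000000"  # relocation and line-number fields (all zero)
    "20000040"                  # Characteristics
)

# Little-endian: 8-byte name followed by four 32-bit fields.
name, virtual_size, virtual_addr, raw_size, raw_offset = struct.unpack_from(
    "<8sIIII", SECTION_HEADER
)

section_name = name.rstrip(b"\0").decode("ascii")
llvm_file_size = raw_offset + raw_size   # headers + padded section data
padding = raw_size - virtual_size        # alignment padding after the payload

print(section_name, virtual_size, raw_size, raw_offset)
print("expected PE output size:", llvm_file_size)
print("trailing padding bytes:", padding)
```

Under that reading, VirtualSize equals the size of bzImage.gnu (19606512), PointerToRawData is the 512 bytes of headers, and PointerToRawData plus SizeOfRawData equals the size of bzImage.llvm (19607040). The 16 `cc` bytes at the end would then be file-alignment padding.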

In fact, if I run GNU objcopy without -O binary, I get roughly the same format as LLVM gives. This leads me to conclude that LLVM objcopy does not implement -O binary correctly, and instead uses PE output, same as the original file.

$ file bzImage.*
bzImage.gnu:                  Linux kernel x86 boot executable bzImage, version 6.10.9-gentoo-dist (root@devbox) #1 SMP PREEMPT_DYNAMIC Sun Sep  8 11:45:05 -00 2024, RO-rootFS, swap_dev 0X12, Normal VGA
bzImage.gnu-without-O-binary: PE32+ executable (EFI application) x86-64 (stripped to external PDB), for MS Windows
bzImage.llvm:                 PE32+ executable (EFI application) x86-64 (stripped to external PDB), for MS Windows

llvmbot commented 4 weeks ago

@llvm/issue-subscribers-tools-llvm-objcopy-strip

Author: Michał Górny (mgorny)

jh7370 commented 4 weeks ago

llvm-objcopy doesn't currently support -O binary for non-ELF inputs (note that the -O option is listed under the ELF-specific options in the command guide). I expect all it's doing is ignoring the option and doing a regular object copy from input to output file (so from PE/COFF to PE/COFF). Somebody would need to add support for -O binary to the COFF mode of llvm-objcopy for this to work.

mgorny commented 4 weeks ago

I wish it failed rather than silently ignoring it and giving people non-booting systems, though.

jh7370 commented 4 weeks ago

A contribution would be welcome: we already have the framework for unsupported options (see https://github.com/llvm/llvm-project/blob/64cfce95d38d6884d501fd1ece959e7809a94025/llvm/lib/ObjCopy/ConfigManager.cpp#L16), but it looks like this one got missed for some reason. NB: I haven't actually sanity checked that the option is expected to be unsupported - I've only looked at the docs.

Nowa-Ammerlaan commented 4 weeks ago

Using --dump-section instead of -O and -j seems to work as a workaround:

nowa-gentoo-laptop nowa # objcopy -O binary -j.linux /efi/EFI/Linux/linux-6.10.7-gentoo-dist.efi /tmp/linux
nowa-gentoo-laptop nowa # llvm-objcopy -O binary -j.linux /efi/EFI/Linux/linux-6.10.7-gentoo-dist.efi /tmp/linux-llvm
nowa-gentoo-laptop nowa # diff /tmp/linux /tmp/linux-llvm
Binary files /tmp/linux and /tmp/linux-llvm differ
nowa-gentoo-laptop nowa # sbverify /tmp/linux --cert /root/kernel_key.pem
Signature verification OK
nowa-gentoo-laptop nowa # sbverify /tmp/linux-llvm --cert /root/kernel_key.pem
zsh: segmentation fault (core dumped)  sbverify /tmp/linux-llvm --cert /root/kernel_key.pem
nowa-gentoo-laptop nowa # objcopy /efi/EFI/Linux/linux-6.10.7-gentoo-dist.efi --dump-section .linux=/tmp/linux-dumped
nowa-gentoo-laptop nowa # llvm-objcopy /efi/EFI/Linux/linux-6.10.7-gentoo-dist.efi --dump-section .linux=/tmp/linux-dumped-llvm
nowa-gentoo-laptop nowa # diff /tmp/linux-dumped /tmp/linux-dumped-llvm
nowa-gentoo-laptop nowa # sbverify /tmp/linux-dumped --cert /root/kernel_key.pem
Signature verification OK
nowa-gentoo-laptop nowa # sbverify /tmp/linux-dumped-llvm --cert /root/kernel_key.pem
Signature verification OK
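
For setups where neither objcopy variant is available, the same extraction can be sketched in plain Python by walking the PE/COFF section table. This is a minimal sketch under stated assumptions, not a description of what either tool does: `extract_section` is a hypothetical helper, it ignores long section names stored in the string table, and trimming to VirtualSize is a choice made to match the size of the GNU `-O binary` output reported in the issue. Whether `--dump-section` trims the same way is not established in the thread.

```python
import struct

def extract_section(image: bytes, wanted: str) -> bytes:
    """Return the payload of the named section from a PE/COFF image."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew at 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")

    # COFF file header starts right after the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", image, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", image, pe_offset + 20)

    # Section table follows the 20-byte COFF header and the optional header.
    table = pe_offset + 24 + opt_header_size
    for i in range(num_sections):
        name, vsize, _vaddr, raw_size, raw_offset = struct.unpack_from(
            "<8sIIII", image, table + 40 * i
        )
        if name.rstrip(b"\0").decode("ascii", "replace") == wanted:
            # Assumption: drop file-alignment padding beyond VirtualSize.
            length = min(vsize, raw_size)
            return image[raw_offset:raw_offset + length]
    raise KeyError(f"section {wanted!r} not found")
```

A call such as `extract_section(open("uki.efi", "rb").read(), ".linux")` would return the bytes to write out as the kernel image; comparing the result against the `--dump-section` output is a sensible check before relying on it.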