llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

llvm-strip --strip-debug on riscv64 produces unusually large binaries #89524

Open q66 opened 6 months ago

q66 commented 6 months ago

I've noticed that binaries in my distribution come out roughly 3.5 times larger than they should on riscv64. We compile with -g2 by default and then process everything with llvm-strip --strip-debug. On all other architectures (x86_64, aarch64, ppc64le, ppc64) sizes are more or less the same as before.

The set of flags used does not matter, other than the debug level. Dropping debug level to -g0 produces small binaries. Using strip without arguments likewise produces small binaries; --strip-debug does not, however.

For instance, a build of the Lua 5.4 package now has an installed size of 3.5MB instead of 1MB on riscv64. This seems to apply to packages in general.

q66 commented 6 months ago

More information: this does not seem to be specific to the LLVM tools. binutils strip shows the same behavior, producing bloated binaries with --strip-debug.

q66 commented 6 months ago

here is readelf -a for an unstripped binary: https://0x0.st/Xo3r.txt

after processing with --strip-debug: https://0x0.st/Xo3s.txt

after processing with strip with no arguments: https://0x0.st/Xo3z.txt

q66 commented 6 months ago

it seems the issue has been present for much longer, actually; this is not an LLVM 18 regression

jh7370 commented 6 months ago

I doubt that this is an llvm-objcopy/strip issue, given that using GNU strip produces the same output. I think, if there is an issue, it's much more likely to come from earlier in the pipeline, e.g. the assembler or linker. I'd need to compare the readelf output you've provided with that of a "normal" case, e.g. x86, to see, but my suspicion is that the cause is the many, many unnamed STT_NOTYPE local symbols in the output: llvm-strip --strip-debug would do nothing with those. However, when it is run without arguments, it removes the symbol table, so those symbols will disappear completely, removing any impact they have on the final binary size.
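One way to check that suspicion is to count the unnamed local STT_NOTYPE entries in .symtab directly. The following is a minimal sketch rather than a vetted tool: it assumes a little-endian ELF64 file (which riscv64 binaries are), uses only the Python standard library, and the function name `count_unnamed_notype_locals` is made up for illustration.

```python
import struct

# ELF64 little-endian layouts (field order follows the ELF specification).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, 64 bytes
SYM = struct.Struct("<IBBHQQ")             # Elf64_Sym, 24 bytes

SHT_SYMTAB = 2
STB_LOCAL = 0
STT_NOTYPE = 0


def count_unnamed_notype_locals(data):
    """Count .symtab entries that are local, STT_NOTYPE, and have an empty name."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")

    ehdr = EHDR.unpack_from(data, 0)
    e_shoff, e_shentsize, e_shnum = ehdr[6], ehdr[11], ehdr[12]
    sections = [
        SHDR.unpack_from(data, e_shoff + i * e_shentsize) for i in range(e_shnum)
    ]

    count = 0
    for sh in sections:
        sh_type, sh_offset, sh_size, sh_link = sh[1], sh[4], sh[5], sh[6]
        if sh_type != SHT_SYMTAB:
            continue
        # sh_link of a symbol table is the index of its string table section.
        strtab_offset = sections[sh_link][4]
        # Index 0 is the mandatory null symbol, so start at 1.
        for i in range(1, sh_size // SYM.size):
            st_name, st_info = SYM.unpack_from(data, sh_offset + i * SYM.size)[:2]
            unnamed = data[strtab_offset + st_name] == 0
            if unnamed and (st_info >> 4) == STB_LOCAL and (st_info & 0xF) == STT_NOTYPE:
                count += 1
    return count
```

Calling it on the bytes of a file, e.g. `count_unnamed_notype_locals(open(path, "rb").read())`, gives the number of such symbols. Each Elf64_Sym entry takes 24 bytes, so multiplying the count by `SYM.size` gives a rough lower bound on the space they occupy in .symtab.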

jh7370 commented 6 months ago

I've added the RISC-V label, since I reckon that it's in this area that any issue will be present.

q66 commented 6 months ago

yes, i also suspect all these NOTYPE local symbols are the issue

MaskRay commented 6 months ago

I agree that this is a RISC-V issue rather than an llvm-objcopy issue. These empty-name symbols are generated for assembler directives related to .eh_frame/.debug_line. gas uses a fake label name, .L0, which is discarded by ld/objcopy --discard-locals.

I created https://github.com/llvm/llvm-project/pull/89693 to match gas. I was aware of the behavior difference but had not thought hard about the size impact where ld/objcopy -X is concerned.

For distributions, --strip-unneeded might be handy if you don't need .symtab, and --strip-all might be useful if you also don't need other non-SHF_ALLOC sections like .comment.

As a workaround, you can apply llvm-objcopy --strip-symbol='' for executables/DSOs if they are not linked with -Wl,--emit-relocs. If --emit-relocs is used, the option would likely lead to errors: "not stripping symbol '' because it is named in a relocation"
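
For a packaging script, the workaround could be wrapped roughly as follows. This is only a sketch: the helper names `strip_unnamed_symbols_cmd` and `strip_unnamed_symbols` are hypothetical, it assumes `llvm-objcopy` is on PATH, and it has not been tried on real riscv64 packages. An empty value after `--strip-symbol=` in an argument list is what the shell form `--strip-symbol=''` passes to the tool.

```python
import subprocess


def strip_unnamed_symbols_cmd(path, objcopy="llvm-objcopy", strip_debug=True):
    """Build the argv for stripping empty-name symbols from `path` in place."""
    cmd = [objcopy]
    if strip_debug:
        # Same effect as the distribution's existing llvm-strip --strip-debug step.
        cmd.append("--strip-debug")
    # Equivalent to --strip-symbol='' on a shell command line.
    cmd.append("--strip-symbol=")
    cmd.append(path)
    return cmd


def strip_unnamed_symbols(path, objcopy="llvm-objcopy", strip_debug=True):
    """Run the command; raises CalledProcessError if objcopy reports an error,
    which is expected for files linked with -Wl,--emit-relocs."""
    subprocess.run(strip_unnamed_symbols_cmd(path, objcopy, strip_debug), check=True)
```

With a single file argument llvm-objcopy rewrites the file in place, so keep a copy if the original is still needed.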