chleroy closed this issue 3 years ago.
I can reproduce that just by turning on CONFIG_CC_OPTIMIZE_FOR_SIZE:
$ make -s ppc64le_defconfig
$ grep -e LD_DEAD -e OPTIMIZE_FOR .config
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
$ make -s -j $(nproc)
$ objdump -d vmlinux | grep -c "<arch_set_bit>:"
0
$ ./scripts/config -d CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
$ make -s olddefconfig
$ grep -e LD_DEAD -e OPTIMIZE_FOR .config
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
$ make -s -j $(nproc)
$ objdump -d vmlinux | grep -c "<arch_set_bit>:"
82
$ ./scripts/config -e EXPERT -e LD_DEAD_CODE_DATA_ELIMINATION
$ make olddefconfig
$ grep -e LD_DEAD -e OPTIMIZE_FOR .config
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
$ make -s -j $(nproc)
$ objdump -d vmlinux | grep -c "<arch_set_bit>:"
82
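The transcript above checks a single symbol per build. Below is a minimal shell sketch, assuming binutils objdump is on PATH and the image is unstripped, that counts the out-of-line copies of several of the functions mentioned in this issue in one disassembly pass. count_copies is a hypothetical helper name. Like the grep above, it only matches exact labels, so compiler clones with suffixes such as .isra.0 or .constprop.0 are not counted.

```shell
# Hypothetical helper: count how many out-of-line copies of each named
# function ended up in a kernel image.
# Usage: count_copies IMAGE FUNC...
count_copies() {
    image=$1
    shift
    # Disassemble once; each out-of-line copy starts with a "<name>:" label.
    disasm=$(objdump -d "$image") || return 1
    for fn in "$@"; do
        # grep -c prints 0 (and exits 1) when there is no match; we only use the count.
        n=$(printf '%s\n' "$disasm" | grep -c -F -e "<$fn>:")
        printf '%s %s\n' "$fn" "$n"
    done
}

# Example: count_copies vmlinux arch_set_bit arch_clear_bit in_be32 in_le32
```

Disassembling only once matters because objdump -d on a full vmlinux is slow. Running it once per configuration gives one table per build that can be compared side by side.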
So AFAICS CC_OPTIMIZE_FOR_SIZE is what causes it, and LD_DEAD_CODE_DATA_ELIMINATION has no effect.
Yeah, they're not dead; they just get out-of-lined into each file that calls them. The linker can't really fix this (it may not have the right relocations or branch information). It would need some kind of link-time optimisation, or a new annotation like inline_or_library, where you also provide a library copy that is called whenever the compiler decides not to inline.
Yeah, I originally thought the dead code elimination might be making it worse, but it seems unrelated.
It seems this is working as designed, even if the result is a bit surprising.
And AFAICS CC_OPTIMIZE_FOR_SIZE still "works": it shrinks a ppc64le_defconfig from ~32MB to ~27MB.
Can we close this?
OK, let's close it, as we have identified that it is related to CC_OPTIMIZE_FOR_SIZE.
We will likely handle the most obvious ones one by one, flagging them "always_inline" where relevant.
The config provided by the kernel robot in https://lore.kernel.org/lkml/202102271820.WlZCxtzY-lkp@intel.com/T/#u leads to awful duplication of several 'static inline' functions.
- arch_local_irq_save appears 44 times
- fls appears 61 times
- __ilog2_u32 appears 12 times in one version and 10 times in a second version, which calls fls
- others, like arch_set_bit and arch_clear_bit, are found many times as well

And a lot more, for instance the I/O accessors in_be32/in_le32, ...
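To see which functions are duplicated the most, rather than checking names one at a time, here is a rough sketch assuming binutils nm and an unstripped image. It counts how often each local text symbol name (nm type "t") occurs. top_duplicates is a hypothetical helper. A high count usually means many out-of-line copies of a static inline, but unrelated static functions that happen to share a name are counted together, and suffixed clones (e.g. .isra.0) are listed under their own names.

```shell
# Hypothetical helper: list the local text symbols that occur most often.
# Usage: top_duplicates IMAGE [LIMIT]
top_duplicates() {
    image=$1
    limit=${2:-20}
    # nm prints "address type name"; type "t" is a local symbol in the text section.
    nm "$image" |
        awk '$2 == "t" { print $3 }' |
        sort | uniq -c |        # count occurrences of each name
        sort -rn |              # most duplicated first
        head -n "$limit"
}
```

For example, top_duplicates vmlinux 30 prints the 30 most repeated names with their counts, which is a quick way to pick candidates for always_inline.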