ilg-deprecated / arm-none-eabi-gcc-build

DEPRECATED! -> Project moved to xPack Dev Tools ->
https://github.com/xpack-dev-tools/arm-none-eabi-gcc-xpack
MIT License

LTO is not having any effect #3

Closed Novakov closed 5 years ago

Novakov commented 5 years ago

It seems that LTO is not used during the build even if enabled explicitly. Here is a repository with a simple reproduction: https://github.com/Novakov/eclipse-mcu-gcc-lto

The resulting binary size with and without LTO is exactly the same, and similar to the size of the binary generated by the ARM-provided GCC toolchain. However, the GCC provided by Arch Linux behaves correctly, and the binary with LTO is much smaller.

It seems to me that the issue is in the build process. I've compared the configure flags for both distributions and unfortunately nothing suggests what is wrong.

ilg-ul commented 5 years ago

I seriously doubt so, since for the projects generated by the Eclipse plug-ins -flto behaves as expected.

Please provide the detailed compile output log showing the commands used during the build.

The cmake configurations are probably the least useful way of arguing this.

Novakov commented 5 years ago

I found this issue in an existing CMake-based project, so it was easier to strip it down than to recreate it in a Makefile. The commands executed during the build with and without LTO are here: https://gist.github.com/Novakov/33b66e11041131ddb7b1fd76f0649cc8

Building with LTO enabled passes the -flto flag to both the compiler and the linker. Building without LTO does not pass it. The size of the generated binary is the same in both cases: 3232 bytes. Running the same build with the arm-none-eabi-gcc 5.4.1 toolchain from LaunchPad gives a different binary: 1608 bytes with LTO and 3184 without LTO. Using arm-none-eabi-gcc 8.3.0 from Arch Linux gives similar results: the LTO-enabled binary is much smaller. I'm unable to verify the issue on GCC 8.2.1 from ARM/LaunchPad as it fails on the LTO build (the build without LTO gives a 3156-byte binary).
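A minimal sketch of what "passing -flto to both compiler and linker" looks like when the command lines are generated programmatically. The helper name `gcc_commands`, the file names, and the `-mcpu=cortex-m4 -mthumb` target flags are assumptions for illustration; the actual commands used in this report are the ones in the gist.

```python
def gcc_commands(sources, output, lto, compiler="arm-none-eabi-gcc"):
    """Return (compile_commands, link_command) as argv lists.

    The same optimisation flags are used for compiling and linking,
    so the only difference between two builds is the -flto flag.
    """
    # Assumed target flags; adjust for the real MCU.
    common = ["-mcpu=cortex-m4", "-mthumb", "-Os"]
    if lto:
        # -flto has to be present at compile time AND at link time.
        common.append("-flto")

    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compile_cmds = [
        [compiler, *common, "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    link_cmd = [compiler, *common, *objects, "-o", output]
    return compile_cmds, link_cmd


if __name__ == "__main__":
    for lto in (False, True):
        compile_cmds, link_cmd = gcc_commands(["main.c", "util.c"], "app.elf", lto)
        for cmd in compile_cmds + [link_cmd]:
            print(" ".join(cmd))
```

Generating both variants from one function keeps every other flag identical, which is what makes the size comparison meaningful.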

ilg-ul commented 5 years ago

Could you retry without using the intermediate library?

Novakov commented 5 years ago

Results in the table:

| Toolchain | LTO? | Size with static library (bytes) | Size without static library (bytes) |
|---|---|---|---|
| LaunchPad 8.2.1 | Yes | Failed | Failed |
| LaunchPad 8.2.1 | No | 3156 | 2964 |
| Eclipse MCU 8.2.1 | Yes | 3232 | 2924 |
| Eclipse MCU 8.2.1 | No | 3232 | 3040 |
| LaunchPad 5.4.1 | Yes | 1608 | 1588 |
| LaunchPad 5.4.1 | No | 3184 | 3012 |

I've updated the gist (https://gist.github.com/Novakov/33b66e11041131ddb7b1fd76f0649cc8) with the commands used to build the LTO version: there are no ar/ranlib calls.

Not using a static library gives a small improvement in binary size, but LTO is still not working.

ilg-ul commented 5 years ago

Your -Ox settings are not consistent; to compare builds please use -Os for all of them, both compile & link commands.

And, if possible, avoid redefinitions on the same command line (normally the last value is used, but you never know).

Also you can pass -v to the linker, to see what it is doing.
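A small sketch of how redefined optimisation levels on one command line could be detected. GCC documents that when several -O options are given, the last one is the effective one. The function names and the sample command line are hypothetical; `shlex.split` is used in its default POSIX mode, so Windows paths containing backslashes would need different handling.

```python
import shlex


def optimization_flags(command_line):
    """Return every -O option on a GCC command line, in order of appearance."""
    # Case-sensitive on purpose: lowercase "-o" is the output-file option.
    return [arg for arg in shlex.split(command_line) if arg.startswith("-O")]


def effective_optimization(command_line):
    """Return the -O option GCC would use (the last one), or None if absent."""
    flags = optimization_flags(command_line)
    return flags[-1] if flags else None


if __name__ == "__main__":
    cmd = "arm-none-eabi-gcc -O2 -flto -Os -c main.c -o main.o"
    flags = optimization_flags(cmd)
    if len(flags) > 1:
        print("redefined optimisation level:", flags)
    print("effective:", effective_optimization(cmd))
```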

Novakov commented 5 years ago

| Toolchain | LTO? | Size with static library (bytes) | Size without static library (bytes) |
|---|---|---|---|
| LaunchPad 8.2.1 | Yes | Failed | Failed |
| LaunchPad 8.2.1 | No | 2964 | 2964 |
| Eclipse MCU 8.2.1 | Yes | 3040 | 2880 |
| Eclipse MCU 8.2.1 | No | 3040 | 3040 |
| LaunchPad 5.4.1 | Yes | 1580 | 1580 |
| LaunchPad 5.4.1 | No | 3012 | 3012 |

These are the results with -Os and no -O2/-O3 (see the gist for the updated commands). Good idea to enable verbose output from the linker; I will try to compare it between 5.4.1 and 8.2.1.
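To make the gap easier to see, the relative size reduction from LTO can be computed from the figures in the "without static library" column of the table above. This is a sketch; the helper name `lto_reduction_percent` is made up for the example.

```python
def lto_reduction_percent(size_without_lto, size_with_lto):
    """Percentage by which the LTO build is smaller than the non-LTO build."""
    return 100.0 * (size_without_lto - size_with_lto) / size_without_lto


# (size without LTO, size with LTO) in bytes, built without the static library.
sizes = {
    "Eclipse MCU 8.2.1": (3040, 2880),
    "LaunchPad 5.4.1": (3012, 1580),
}

for toolchain, (without_lto, with_lto) in sizes.items():
    reduction = lto_reduction_percent(without_lto, with_lto)
    print(f"{toolchain}: {reduction:.1f}% smaller with LTO")

# Eclipse MCU 8.2.1: 5.3% smaller with LTO
# LaunchPad 5.4.1: 47.5% smaller with LTO
```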

ilg-ul commented 5 years ago

so, the conclusion is that either ar or ranlib disables LTO.

did you check the GCC bugzilla for something related to this?

since the Arch version (8.3) seems ok, perhaps the bug was already fixed.

Novakov commented 5 years ago

It was my first thought to blame ar/ranlib, but the size difference is still huge (2880 vs 1580). I realize that I'm comparing different GCC versions here, but a 50% increase in size seems like a serious regression. In the meantime I downgraded gcc on the Arch machine to 8.2.0 (there is no 8.2.1 release) and checked the results: 1900 bytes with LTO and 3328 bytes without LTO.

I was going through GCC bugzilla looking for LTO issues in general but nothing seemed related, maybe focusing on ar/ranlib issues would give better results :)

Verbose output from the linker showed something interesting. I added two files to the gist: collect-2-launchpad-8.2.1-lto.txt and collect2-eclipse-8.2.1-lto.txt. They contain the collect2 call options extracted from the linker output and normalized for easier comparison (toolchain/source root paths replaced with variables). The call in Eclipse GCC 8.2.1 is missing a few plugin-related options:

-plugin
$TOOLCHAIN/bin/../lib/gcc/arm-none-eabi/5.4.1/liblto_plugin-0.dll
-plugin-opt=$TOOLCHAIN/bin/../lib/gcc/arm-none-eabi/5.4.1/lto-wrapper.exe
-plugin-opt=-fresolution=C:\Users\Novakov\AppData\Local\Temp\ccd6fy5j.res
-plugin-opt=-pass-through=-lgcc
-plugin-opt=-pass-through=-lg_nano
-plugin-opt=-pass-through=-lc_nano
-plugin-opt=-pass-through=-lgcc
-plugin-opt=-pass-through=-lc_nano
-plugin-opt=-pass-through=-lnosys
-plugin-opt=-pass-through=-lgcc
-plugin-opt=-pass-through=-lc_nano
-plugin-opt=-pass-through=-lnosys

GCC 8.2.0 on Arch Linux is also using these options.
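The comparison described above (normalize paths, then look for the plugin-related options) could be sketched as follows. The function names and the sample argument lists are hypothetical and only imitate the shape of a collect2 call; they are not taken from the gist.

```python
def normalize(argv, toolchain_root):
    """Replace the toolchain root path with a variable for easier comparison."""
    return [arg.replace(toolchain_root, "$TOOLCHAIN") for arg in argv]


def plugin_options(argv):
    """Return the linker-plugin arguments found in a collect2 argument list."""
    found = []
    expect_path = False
    for arg in argv:
        if expect_path:
            # The argument right after "-plugin" is the plugin library path.
            found.append(arg)
            expect_path = False
        elif arg == "-plugin":
            found.append(arg)
            expect_path = True
        elif arg.startswith("-plugin-opt="):
            found.append(arg)
    return found


if __name__ == "__main__":
    # Hypothetical collect2 calls from two toolchains.
    working = normalize(
        ["collect2", "-plugin", "/opt/gcc-a/lib/liblto_plugin.so",
         "-plugin-opt=-pass-through=-lgcc", "main.o", "-lc_nano"],
        "/opt/gcc-a",
    )
    suspect = normalize(["collect2", "main.o", "-lc_nano"], "/opt/gcc-b")

    missing = [opt for opt in plugin_options(working)
               if opt not in plugin_options(suspect)]
    print("missing plugin options:", missing)
```

If the second list has no plugin options at all, the linker is being invoked without the LTO plugin, which matches the symptom reported here.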

Is there any place I can find Eclipse GCC 5.4.1 for comparison?

ilg-ul commented 5 years ago

> but 50% increase in size seems like serious regression

please note that you are also comparing libraries, which may differ significantly between versions.

if you want to have more accurate results, you have to use -nodefaultlibs. newlib is not compiled with -flto.

I suggest you also test the STM32F4 blinky project generated by the Eclipse template, with and without LTO.

> Is there any place I can find Eclipse GCC 5.4.1 for comparison?

unfortunately not, by that time there was no Eclipse MCU ARM Embedded GCC distribution.

also please note that 8.2.1 being different from ARM was an accident, I had to patch some of the bugs, but otherwise the GME distribution tries to be as close as possible to ARM distribution.

Novakov commented 5 years ago

I did some tests with various toolchains and compilation options:

I also played with crosstool-ng and managed to build a toolchain (based on GCC 8.3.0) that produces small binaries (they are even a few bytes smaller than those produced by GCC 5.4.1 from LaunchPad). I tried to find the difference between the crosstool-ng build process and yours, and the only thing that seems plausible is the way lto-plugin.dll is built: in ct-ng it is built in a single make call in the top GCC folder (so GCC itself manages the configure options passed to lto-plugin), while the scripts in this repository reconfigure it to disable the static library and enable the shared one.

Regarding the '8.2.1 accident': I'm really happy that you are maintaining a GCC distribution, so we don't have to wait half a year for ARM to publish a bug-fix release 👍

ilg-ul commented 5 years ago

The general idea is to follow the ARM build scripts, plus the few patches that were needed, or different commits.

If you have concrete proposals to improve the build scripts, please let me know.


Please note that right now I'm testing the new version of the build scripts, available in the develop branch.

Novakov commented 5 years ago

Are the current build scripts from ARM available? Older versions had a 'How to build toolchain' PDF but I don't see it anywhere. I will check the develop branch next week and try to find out why LTO is behaving so differently.

I checked with the STM32F4 blink template project (not modified in any way) and its results confirm the issue that LTO is not working entirely correctly:

ilg-ul commented 5 years ago

> Are the current build scripts from ARM available?

yes, in the source archive.

I also keep a git with these scripts, to identify the changes:

https://github.com/gnu-mcu-eclipse/arm-gcc-original-scripts

> STM32F4 blink template ... confirms the issue that LTO is not working entirely correct

if I remember right, my tests also produced similar results for the first two values.

if you could build the same 8.2.1 version with crosstool-ng, the comparison for the third value would make more sense.

> I will check develop branch next week

yes, any help will be highly appreciated.

you'll probably find the scripts very complicated. they are. the scripts are intended only to generate standalone distributions. thus the purpose is to have a reproducible build environment where each component version is strictly controlled, and no references are made to the system resources, which is a very old CentOS 6.

the actual scripts run inside a Docker container, where all available tools were also compiled from sources, so nothing is left to chance.

this special build environment is called XBB, the xPack Build Box, and I use it for all binary xPacks.

Novakov commented 5 years ago

GCC 8.2.1 (the same sources and gcc patch as used in this repository) built with crosstool-ng (the same config as for 8.3.0) with LTO enabled gives exactly the same result as 8.3.0, both for the STM32F4 blink project and for the sample project I provided.

I also briefly looked at ARM build scripts and it seems to me that they are not reconfiguring/rebuilding lto-plugin. If that's true, then it supports my findings from crosstool-ng build process that reconfiguring/rebuilding (I would place my bet on ./configure) lto-plugin causes it to malfunction. What is really strange is that compiling/linking with LTO is not failing, it just does almost nothing.

ilg-ul commented 5 years ago

right now I'm facing some problem with the new build scripts for Linux, where strip damages the binaries, but once I fix this we'll take a look at the lto build details.

ilg-ul commented 5 years ago

I updated the develop branch with a version that seems functional. tomorrow I'll try to investigate why building gdb with python3 fails.

in the meantime, could you take a look at the script (container-gcc-functions-source.sh) and suggest how to improve LTO support? things are quite tricky, since for windows the LTO plug-in needs to be configured and built separately, and the plug-in also needs to be copied to a second location.

> LTO is not failing, it just does almost nothing.

well, according to your figures, it reduces the size by half.

it is interesting what miracle happens with your build that reduces the size even more.

ilg-ul commented 5 years ago

I took a quick look at crosstool-ng, and it looks like it builds newlib with LTO, which might be a good explanation why it further reduces the size.

I considered this too for future releases, but the main concern was the total distribution size, since LTO significantly increases each object size, and there are lots of multilib libraries. the second concern was build time, which also increases.

could you check your builds for CT_LIBC_NEWLIB_LTO, and possibly run a separate toolchain build without it, to compare resulting toolchain size?

there is also a mention of -flto-partition=one that needs to be used when building the application, did you check it?

ilg-ul commented 5 years ago

I did a quick test to compile newlib with -flto, and the single lib archive size increased from 59 MB to 73.5 MB.

plus that I get lots of:

arm-none-eabi-objcopy: /Users/ilg/Work/arm-none-eabi-gcc-8.2.1-1.5/darwin-x64/install/arm-none-eabi-gcc/arm-none-eabi/lib/stIi2D6c/lib_a-setvbuf.o: plugin needed to handle lto object

Novakov commented 5 years ago

The program I use for testing can be built without newlib (-nodefaultlibs -nostartfiles). I'm building the toolchain with CT_LIBC_NEWLIB_LTO disabled.

The arm-none-eabi-objcopy error might be another symptom of an invalid LTO plugin build.

Regarding container-gcc-functions-source.sh: could you try building a shared version of GCC without reconfiguring/rebuilding lto-plugin?

For the blink project I only compared the predefined Debug and Release configurations. I didn't analyze the flags used in each, so it is possible that some other flag (besides -flto) is responsible for the size reduction. That's why I'm testing on a much smaller code base, with explicitly defined options, with and without the -flto flag.

Building newlib with LTO is not a good idea: that will force everyone to use LTO in their builds, which is painful during development, so the toolchain would need to provide newlib both with LTO enabled and with LTO disabled.

ilg-ul commented 5 years ago

> arm-none-eabi-objcopy error might be another symptom of invalid LTO plugin build.

I have to check more thoroughly, but objcopy is used only on the libraries, to explicitly remove some debug related sections.

for testing, I can build a toolchain without calling strip_libs.

> could you try building a shared version of GCC without reconfiguring/rebuilding lto-plugin?

the script rebuilds the lto plug-in only for windows, and only if it was not built by the normal sequence. for linux & mac the procedure is relatively straightforward.

any more ideas?

> Building newlib with LTO is not a good idea: that will force everyone to use LTO in their builds which is painful during development, so toolchain would need to provide newlib with both LTO enabled and disabled.

I did not test it, but why do you think so? my understanding is that if you do not pass -flto, the compiler behaves in the traditional way, and the linker ignores the LTO sections of the libraries.

ilg-ul commented 5 years ago

> For blink project I only compared predefined Debug and Release configurations. I didn't analyze the flags used in both, so it may be possible that some other flag (besides -flto) is responsible for size reduction.

the way to test this is to create the project with default settings, then duplicate both the Debug and Release configurations (like Debug-lto and Release-lto) and in the duplicates to enable the -flto in the top Optimizations group.

then, with exactly the same configurations, clean all (preferably by removing the destination folders), change the project toolchain path to the new toolchain, and build all 4 configurations again.

Trass3r commented 5 years ago

> if you do not pass -flto, the compiler behaves in the traditional way, and the linker ignores the LTO sections of the libraries

That's only true with -ffat-lto-objects which is not the default anymore.

ilg-ul commented 5 years ago

Thank you @Trass3r for the clarification.

Can you confirm that compiling static libraries with -flto -ffat-lto-objects produces libraries that, when linked without -flto, behave in the traditional way, and when linked with -flto perform the link time optimizations?

Do you see any disadvantage of building the system libraries (libgcc, libstdc++ and newlib) like this? (except the larger size and the increased toolchain build time).

ilg-ul commented 5 years ago

I did a test run of the build script with -flto added to the system libraries and I got a 1.25 GB macOS archive, way too large to be practical. the similar previous release was 113 MB.

we'll forget about LTO system libraries for now, and try to identify why LTO is less efficient than in the toolchain built with crosstool-ng.

ilg-ul commented 5 years ago

@Novakov, I ran a comparative test with Arch 8.3.0 arm-none-eabi-gcc and, for the Release configuration of the stm32f4 blinky project, I got exactly the same sizes as with the GME 8.2.1, namely 3724 without lto and 2188 with lto, which is a reduction of 41%.

I'm afraid your tests are not comparing the same configurations.

Novakov commented 5 years ago

Oh, I just now realized that I didn't mention that I'm working on Windows (sorry!). I tested various toolchains using CMake project changing only -flto flag and toolchain path.

If GME and Arch GCC are behaving correctly, I would say that supports the hypothesis about lto-plugin being messed up.

ilg-ul commented 5 years ago

I don't know, I'm trying to reproduce your findings, but have difficulties, it looks like there are too many variables, and you might not control all of them.

for example, in GME I measure sizes with the arm-none-eabi-size program, which reports separate text/data/bss sections.

if you have more ideas how to test this, please let me know.

otherwise I'll proceed with the release.

Novakov commented 5 years ago

I'm sorry for not mentioning in the first comment that I'm working on Windows, and for all the confusion it caused.

I'm working on Windows 10 x64.

Toolchains:

Project: STM32F4 blink template, not modified

Build configuration:

Measurement: dec field in output from arm-none-eabi-size

| Configuration | crosstool-ng 8.3.0 | GME 8.2.1 | LaunchPad 5.4.1 |
|---|---|---|---|
| Debug | 9455 | 9419 | 9599 |
| Release without LTO | 4148 | 4112 | 4124 |
| Release with LTO | 2600 | 4002 | 2548 |
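For reference, the dec field used for these measurements could be pulled out of the size tool's output with a sketch like the one below. It assumes the default Berkeley-style output (a header row with text, data, bss, dec, hex, filename) and file names without spaces; the sample numbers are made up and are not from any of the builds above.

```python
def parse_size_output(text):
    """Parse Berkeley-format `size` output into {filename: {field: int}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()  # text, data, bss, dec, hex, filename
    results = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # The hex column repeats dec in base 16, so it is skipped here.
        results[row["filename"]] = {
            key: int(row[key]) for key in ("text", "data", "bss", "dec")
        }
    return results


if __name__ == "__main__":
    # Made-up sample in the shape printed by arm-none-eabi-size.
    sample = (
        "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
        "   1000\t     20\t     30\t   1050\t    41a\tapp.elf\n"
    )
    sizes = parse_size_output(sample)
    print(sizes["app.elf"]["dec"])
```

Since dec is the sum of text, data and bss, comparing it across toolchains also counts RAM-only sections; comparing the text column alone may be more telling for code-size questions.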

These results are consistent with the results I got for the sample project I linked at the beginning, which was compiled using the same toolchains on the same machine.

Flags like -nodefaultlibs, -nostartfiles, -Ox change the sizes, but the overall trend is the same: GME 8.2.1 LTO binaries are much bigger.

If GME 8.2.1 on Linux and Arch GCC 8.3.0 are producing binaries with the same sizes, then we can narrow the problem down to the Windows build.

If you still have trouble reproducing the issue, maybe we can set up a conference call/screen sharing session to sort it out.

ilg-ul commented 5 years ago

> we can narrow down problem to Windows build.

yes, the windows build was always problematic; I'll take another look, maybe I spot something wrong.

ilg-ul commented 5 years ago

Fixed in 8.2.1-1.6.