Open profi200 opened 3 years ago
if you use target_compile_definitions
or target_compile_options
on your target/library you should get the right thing.
Note that -O2
is the default (for "Release" or "RelWithDebugInfo" CMAKE_BUILD_TYPE
), -Og
for "Debug", -Os
for "MinSizeRel"
First of all thanks for making this so painless. It worked basically right out of the box even with a different arm-none-eabi toolchain.
Thanks, we have tried hard to make things "easy" for the beginner, but give the power user full control. This is quite hard to do (especially when we have both native and device compilation going on). So there are certainly things that are still a little hidden/hard coded
(Note if you search elsewhere you can also configure per source flags with set_source_file_properties
.
-flto
is on our list to try (but haven't done so yet for anything other than the bootrom). I suspect it will drop stuff on the floor at the moment. Plus we have been thinking with -gc-sections
in mind, but that shouldn't make much odds.
Note things you can try to tweak your build config further (if you should so wish) other than just editing what is there.
cmake/preload/toolchains/pico_arm_gcc.cmake
and set PICO_COMPILER
pico_standard_link
library, and either give it a new name and explicitly direct on your version instead (rather than via pico_stdlib), or perhaps more easily, define set(SKIP_PICO_STANDARD_LINK 1)
at the top of your CMakeLists.txt then add your own pico_standard_link
version after the project init.Some of this stuff might be a little off the happy path for now, but we are keen to improve.
Are you sure? It does use -O3 here for release builds. Not sure if this is due to some cmake system default (i definitely didn't set that). Maybe that should be explicitly set in the SDK. And btw i have always been disabling -fdata-sections because it can have a negative impact. I stumbled upon these issues when i noticed even the blink example is 12 KiB in size which is a bit heavy for something that toggles a GPIO + delays.
I have barely any experience with cmake and rather stayed away from it when possible. I guess i need to dive deeper into what i can and can't set in CMakeLists.txt.
Yes, i know the pain. LTO can do insane optimizations that break non-standard conforming or code with undefined behavior in weird ways. Like moving global variable writes completely elsewhere when other functions depend on it.
edit: Oh, and a little side note: Maybe the "-mcpu=cortex-m0plus" should be replaced by "-mtune=cortex-m0plus". gcc is starting to abandon -mcpu on a few platforms.
C/CXX flags I tried to override the C and linker flags to have different optimization than default by doing
cmake .. -DCMAKE_C_FLAGS="-O2 -flto -fno-data-sections" -DCMAKE_EXE_LINKER_FLAGS="-O2 -flto"
The result is this order of flags to the compiler:
...-I/home/blub/pico/pico-sdk/src/rp2_common/pico_stdio_uart/include -O2 -flto -fno-data-sections -O3 -DNDEBUG -ffunction-sections -fdata-sections -o CMakeFiles/blink.dir/home/blub/pico/pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c.obj -c /home/blub/pico/pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c
Which means it's basically overriding my flags instead of the other way around. Somewhere it's including the override flags before the usual SDK given flags.
So if I add add_compile_options(-flto)
in my project's CMakeLists.txt-file (between project
and add_executable
statements), it seems to be in effect. All compiler invocations work fine. However, during linking, I get this:
/tmp/cc1KABxt.s: Assembler messages:
/tmp/cc1KABxt.s:11061: Error: invalid offset, value too big (0x000041F4)
For very small projects (aka hello world) it seems to work fine. But for anything useful, it simply fails with the message above or similar.
gcc is:
kripton@momo ~/git/rp2040-compressiontest/build2 $ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/arm-none-eabi/11.1.0/lto-wrapper
Target: arm-none-eabi
Configured with: /var/tmp/portage/cross-arm-none-eabi/gcc-11.1.0-r1/work/gcc-11.1.0/configure --host=x86_64-pc-linux-gnu --target=arm-none-eabi --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/arm-none-eabi/gcc-bin/11.1.0 --includedir=/usr/lib/gcc/arm-none-eabi/11.1.0/include --datadir=/usr/share/gcc-data/arm-none-eabi/11.1.0 --mandir=/usr/share/gcc-data/arm-none-eabi/11.1.0/man --infodir=/usr/share/gcc-data/arm-none-eabi/11.1.0/info --with-gxx-include-dir=/usr/lib/gcc/arm-none-eabi/11.1.0/include/g++-v11 --with-python-dir=/share/gcc-data/arm-none-eabi/11.1.0/python --enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 11.1.0-r1 p2' --disable-esp --enable-libstdcxx-time --with-build-config=bootstrap-lto --enable-poison-system-directories --disable-libstdcxx-time --with-sysroot=/usr/arm-none-eabi --disable-bootstrap --with-newlib --enable-multilib --disable-fixed-point --disable-libgomp --disable-libssp --disable-libada --disable-systemtap --enable-valgrind-annotations --disable-vtable-verify --disable-libvtv --with-zstd --enable-lto --with-isl --disable-isl-version-check --disable-libsanitizer --disable-default-pie --enable-default-ssp --with-cpu=cortex-m0plus
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Gentoo 11.1.0-r1 p2)
binutils is v2.36.1
yeah, i think -flto
is very free and easy with reordering of code... there are probably some assembly functions that expect to be proximate to each other. Sadly in some cases being the best way for LTO or for GC may not be the same thing, so we may have to special case.
Note assigning myself, as we do plan to look at LTO in 1.3.0 release
Oh, and a little side note: Maybe the "-mcpu=cortex-m0plus" should be replaced by "-mtune=cortex-m0plus". gcc is starting to abandon -mcpu on a few platforms.
See https://github.com/raspberrypi/pico-sdk/issues/253#issuecomment-798428466 (and #260 )
If it helps, disabling LTO for pico-sdk/src/rp2_common/hardware_irq/irq.c
solves the "invalid offset, value too big" error in my case.
The linker however then spits out a ton of undefined references related to __wrap_
functions, which pointed me to these few bug reports, which may be relevant:
I tried using the gold
linker, but it seems to dislike the linker scripts, and spits out some errors about a COPY section type: pico-sdk/src/rp2_common/pico_standard_link/memmap_default.ld:123:23: COPY section type is unsupported
. According to the source, only NOLOAD
is supported. I tried replacing the (COPY)
s with (NOLOAD)
s, which I assume is a bad idea, and... ended up triggering an internal error elsewhere anyway.
I tried using lld
, but -fuse-ld=lld
causes an error because on my system it looks for /usr/bin/arm-none-eabi-ld.lld
. Symlinking /usr/bin/lld
(which I didn't really expect to work anyway) fails with:
[build] arm-none-eabi-ld.lld: error: no memory region specified for section '.data'
[build] arm-none-eabi-ld.lld: error: no memory region specified for section '.bss'
... so I'm assuming figuring out what to do with the --wrap
issue and fixing it would be the next step to get a proof of concept compilation with LTO.
... Good luck with this :)
Looks like I got a bit further after all. In platform.h
, I modified WRAPPER_FUNC
to be defined as such:
#define WRAPPER_FUNC(x) __attribute__((used)) __wrap_ ## x
Now it successfully compiles and run.
Compared to having LTO activated for everything except the SDK (assuming I didn't mess up something), the generated .bin
file is slightly larger, but I suspect this is may be due by a few things:
[[gnu::flatten]]
somewhere (as it measurably improved performance), which may be inlining some SDK functions aggressively (SPI ones in particular).__attribute__((used))
workaround may be causing some of the wrappers to not get eliminated even though they may be unused, including its callees.I have not tested for performance, however.
(EDIT: as an interesting note, setting PICO_NO_GC_SECTIONS
while LTO is active results in a significantly larger executable. I'm assuming that LTO cannot eliminate functions as much as the linker itself can. Does --gc-functions
affect functions tagged with __attribute__((used))
?)
Thanks; ugh i wasn't aware of the --wrap
/LTO issue
While everything seems to run fine with my workarounds applied and I have not encountered any breakage, I have realized something (which in retrospect is rather explicit in the reports I've linked to): gcc+bfd with LTO could inline some of the original, non-wrapped functions.
I have probably not encountered issues because function wraps that may have mattered were not inlined and the call site still referred to the symbol that got wrapped later on.
That may rather likely be because most wrapped functions are compiled in the newlib static archive (or whatever it is), which is not built with LTO on my platform... but it could be on another, from what I can tell.
This might be worth investigating further, because I'm not quite sure if that's happening for me here.
Interestingly, using x86-64 desktop linux compilers and a simple test case:
$ cat main.c
#include "orig_decl.h"
int main()
{
foo();
}
$ cat orig_decl.c
#include <stdio.h>
void foo()
{
puts("foo()");
}
$ cat wrap.c
#include "orig_decl.h"
#include <stdio.h>
void __wrap_foo()
{
puts("__wrap_foo()");
}
... only clang+lld properly emitted a call to __wrap_foo
! gcc+bfd, gcc+gold, gcc+lld (doesn't seem to be a supported combination for LTO at all), clang+bfd, clang+gold all emitted a call directly to foo
...
That being said, with clang thin LTO (-flto=thin
), clang+lld AND clang+gold work. I assume this is a difference in the implementation of the LTO plugins for each of the linkers. I feel like that would be in a platform-independent manner, but this may be worth looking if the exact same behavior occurs when targeting armv6.
Rather disappointingly, in the above tests, __wrap_foo()
never gets inlined into main
(probably because wrapping happens after LTO is performed). foo()
gets eliminated only with --gc-sections
.
Unfortunately, I'm quite not sure what the best approach would be. Since clang support is in the same milestone as this, it might be worth trying to support clang LTO with lld only first if it turns out to be simpler to support - or use an approach different from --wrap
, which here seems very much non-trivial, if possible at all.
(FWIW, LTO with any linker but bfd
would be a significantly better developer experience due to that usecase being really slow due to lack of parallelization from what I can tell)
This is such a simple fix, why after almost 3 years hasn't it been implemented properly yet? I guess I could give it a go...
Edit: Though the retain
attribute may be a better fit: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-retain-function-attribute
I am not up to date, but unless pico-sdk stopped using wrap functions, I do not think the problem I described is gone. In particular, gcc+bfd LTO is insidiously broken.
Adding that attribute to all wrap functions as a proof of concept fixed it for me.
Do make sure that this solution does not cause the issue that I mentioned above (i.e. that inlining may inline the original function rather than the wrapper). Depending on the wrapped function, there may not be a noticeable side-effect for your usecase (it has been fine for me), but it would likely not be upstreamable.
Adding that attribute to all wrap functions as a proof of concept fixed it for me.
do you have a fork of the pico-sdk where we could try that?
Adding that attribute to all wrap functions as a proof of concept fixed it for me.
do you have a fork of the pico-sdk where we could try that?
This change would be a one-liner afaict, but beware the caveats I mentioned in my replies above.
https://github.com/raspberrypi/pico-sdk/commit/7f494d0a194ddbf73713dff482c5e01e5e878a7a
There was some activity in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88643 and it appears that the wrapping issue should be resolved with ld as of binutils 2.43
. My distro doesn't package it yet for arm-none-eabi, though, so I'll see if I can try this easily myself shortly.
EDIT: no dice, still getting undefined references to __wrap_printf
and such. Although the __attribute__((used))
still works (but might still be broken), as long as you set PICO_STDIO_SHORT_CIRCUIT_CLIB_FUNCS=0
(unless patching the pico-sdk anyway).
First of all thanks for making this so painless. It worked basically right out of the box even with a different arm-none-eabi toolchain.
LTO support Linking is failing with these errors:
These errors are quite strange since it's usually the other way around with something not being supported in thumb mode. Have not digged too deep yet to find out what's causing these.
C/CXX flags I tried to override the C and linker flags to have different optimization than default by doing
The result is this order of flags to the compiler:
...-I/home/blub/pico/pico-sdk/src/rp2_common/pico_stdio_uart/include -O2 -flto -fno-data-sections -O3 -DNDEBUG -ffunction-sections -fdata-sections -o CMakeFiles/blink.dir/home/blub/pico/pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c.obj -c /home/blub/pico/pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c
Which means it's basically overriding my flags instead of the other way around. Somewhere it's including the override flags before the usual SDK given flags.Would be nice if at least the C/CXX flags issue can be fixed. I understand if LTO is not happening for a while since it tends to uncover undefined behavior or bugs not seen without but in my experience it does help shrink down code size more.
Full build log: https://gist.github.com/profi200/718e20b9d3ebec87db60f78d0deb994b