rdiez / DebugDue

Emulates a Bus Pirate OpenOCD JTAG adapter with an Arduino Due

GCC 13.x libsupc++ Exception Pool Size #2

Closed pete-restall closed 2 months ago

pete-restall commented 3 months ago

Hello,

I've been standing up a bare-metal C++ project using the toolchain built with DebugDue, and I have discovered that building my project with exceptions enabled pulls in cow_string, string_view, random_device (with its large tables and its expectation of a filesystem entropy provider), etc. A basic blinky for my Cortex-M4F build mushrooms from about 580 bytes to around 38KiB.

I believe the issue boils down to these few lines in the Makefile:

  else ifneq "$(IS_GCC_13_X)" ""
    COMMON_GCC_OPTIONS += --with-libstdcxx-eh-pool-obj-count=0
  endif

We definitely want the pool count to be 0, but looking at libsupc++'s source in gcc-13.2.0/libstdc++-v3/libsupc++/eh_alloc.cc, I can see this conditional:

#if defined _GLIBCXX_EH_POOL_STATIC && EMERGENCY_OBJ_COUNT == 0
# define USE_POOL 0
#else
# define USE_POOL 1
#endif

The above implies that it's not enough to just set the object count to 0; we also need to enable the static pool. I believe that the desired effect can thus be had by changing the DebugDue Makefile to something like this:

  else ifneq "$(IS_GCC_13_X)" ""
    COMMON_GCC_OPTIONS += --enable-libstdcxx-static-eh-pool --with-libstdcxx-eh-pool-obj-count=0
  endif
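The effect of the two macros in that conditional can be sketched as a compile-time truth table. This is a minimal illustration only, assuming (as the option names suggest) that --enable-libstdcxx-static-eh-pool ends up defining _GLIBCXX_EH_POOL_STATIC and that --with-libstdcxx-eh-pool-obj-count=N ends up as EMERGENCY_OBJ_COUNT == N; the helper uses_pool() is hypothetical and not part of libstdc++.

```cpp
// Mirrors the preprocessor test quoted above from eh_alloc.cc:
//   #if defined _GLIBCXX_EH_POOL_STATIC && EMERGENCY_OBJ_COUNT == 0
//   # define USE_POOL 0
//   #else
//   # define USE_POOL 1
//   #endif
// 'static_pool' stands for "_GLIBCXX_EH_POOL_STATIC is defined" and
// 'obj_count' for the value of EMERGENCY_OBJ_COUNT (assumed mapping).
// uses_pool() is a hypothetical helper, not a libstdc++ function.
constexpr bool uses_pool(bool static_pool, unsigned obj_count)
{
  return !(static_pool && obj_count == 0);
}

// Only --with-libstdcxx-eh-pool-obj-count=0: the pool code is still compiled in.
static_assert(uses_pool(false, 0), "obj-count=0 alone keeps the pool");

// Both options together: the emergency pool is compiled out.
static_assert(!uses_pool(true, 0), "static pool + obj-count=0 disables it");

// A static pool with a non-zero count is still a (fixed-size) pool.
static_assert(uses_pool(true, 64), "static pool with objects is kept");
```

Only the combination of both options turns USE_POOL off, which is why the object count alone did not remove the extra code.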

That seems to work and knocks off about 28.5KiB from my build. I'm now sitting at around 9.5KiB. That's still not amazing, but looking through my ELF it's mainly down to the unwinding logic and personality routine, plus obviously the .eh_frame section and other bits that I'd expect, such as malloc(), memset(), etc. I can probably tweak this further, but importantly, there's no std::string bloat and no std::random_device cruft.

I'll continue to play and see if I can trim some more fat from this, but realistically I think it's not too bad: I seem to recall having a blinky with exception support running on newlib at around 14KiB, so this picolibc build is significantly better. I've not tested this very thoroughly yet, but I think it has the desired effect.

Cheers.

rdiez commented 3 months ago

Thanks for the heads up. I am rather busy at the moment, so I cannot look at this right now.

pete-restall commented 3 months ago

No worries at all. I have my own fork because I use your Makefile to build a MIPS toolchain for PIC32 in addition to ARM, so I don't need / expect you to fix it for me.

You've been kind enough to share (and document) the pain of building these toolchains and have saved me the grief of doing it myself, so I just thought I'd feed back the little I can. If I find anything else that might be of interest / use, I'll post it for you.

Cheers.

rdiez commented 3 months ago

Thanks, I appreciate it.

rdiez commented 2 months ago

I just upgraded my toolchain builder script to GCC 13.3, among other things, and I am taking a look at this issue now.

You are right: --with-libstdcxx-eh-pool-obj-count=0 is not enough, we want --enable-libstdcxx-static-eh-pool too.

Apparently, without --enable-libstdcxx-static-eh-pool, libstdc++ calls getenv() on start-up to look for an optional environment variable named GLIBCXX_TUNABLES, because that variable may override the pool size with a non-zero amount, in which case the pool must be allocated with malloc().

Therefore, without --enable-libstdcxx-static-eh-pool, libstdc++ pulls in code for getenv() and std::string_view. You can see those calls in libstdc++-v3/libsupc++/eh_alloc.cc, inside pool::pool(). There is also a small parser for the environment variable's value, which pulls in strtoul().
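To illustrate why a non-static pool drags those routines in, here is a hypothetical, much simplified sketch of such a start-up path. It is not the libstdc++ code: parse_obj_count() and DynamicPool are made-up names, and the "glibcxx.eh_pool.obj_count=N" entry format with ':' separators is an assumption modelled on the GCC 13 tunable syntax. It only shows that honouring GLIBCXX_TUNABLES at start-up needs getenv(), std::string_view for the parsing, strtoul() for the number and malloc() for the arena, none of which a static, zero-sized pool has to call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string_view>

// Hypothetical parser for a "key=value:key=value" tunables string.
// Returns the requested object count, or 'fallback' if the key is missing
// or strtoul() finds no number terminated by ':' or the end of the string.
unsigned long parse_obj_count(const char* tunables, unsigned long fallback)
{
  if (tunables == nullptr)
    return fallback;

  constexpr std::string_view key = "glibcxx.eh_pool.obj_count=";  // assumed key
  std::string_view rest(tunables);

  while (!rest.empty())
  {
    // Split off the next ':'-separated entry.
    const std::size_t colon = rest.find(':');
    const std::string_view entry = rest.substr(0, colon);

    if (entry.substr(0, key.size()) == key)
    {
      // 'value' points into the original NUL-terminated string, so
      // strtoul() stops at the next ':' or at the terminator.
      const char* value = entry.data() + key.size();
      char* end = nullptr;
      const unsigned long count = std::strtoul(value, &end, 10);
      const bool valid = end != value && (*end == '\0' || *end == ':');
      return valid ? count : fallback;
    }

    if (colon == std::string_view::npos)
      break;
    rest.remove_prefix(colon + 1);
  }
  return fallback;
}

// Hypothetical dynamic pool: sized at start-up, allocated with malloc().
class DynamicPool
{
public:
  DynamicPool(unsigned long default_count, std::size_t obj_size)
    : count_(parse_obj_count(std::getenv("GLIBCXX_TUNABLES"), default_count))
  {
    assert(obj_size != 0);
    if (count_ != 0)
      arena_ = std::malloc(count_ * obj_size);
  }
  ~DynamicPool() { std::free(arena_); }

  DynamicPool(const DynamicPool&) = delete;
  DynamicPool& operator=(const DynamicPool&) = delete;

  unsigned long count() const { return count_; }
  bool has_arena() const { return arena_ != nullptr; }

private:
  unsigned long count_;
  void* arena_ = nullptr;
};
```

On a bare-metal target each of those calls has to be satisfied by the C library or by stubs, which is where the extra code comes from.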

I tested with Newlib and with a debug (non-optimised) build of my QemuFirmware, which performs minimal initialisation, prints some basic information like the firmware code and data sizes to the emulated serial console, and then stops (it waits forever).

Without --enable-libstdcxx-static-eh-pool, the code size was 24,000 bytes, and with it, 21,220 bytes, therefore saving 2,780 bytes. There were small savings in RAM usage too. Those savings were expected.

However, I didn't see the big savings you mentioned. First of all, I didn't see any random_device or std::string bloat before adding --enable-libstdcxx-static-eh-pool. I read about std::string_view, and it should be much leaner than std::string. There is no apparent reason why using std::string_view would pull std::string too.

I then built a release (optimised with LTO) version, and the code size shrank to 10,552 bytes, which is a very good result for C++ in my opinion. RAM usage was 660 bytes.

The release build with Picolibc yields a code size of 10,040 bytes, saving 512 bytes of code. RAM usage decreases to 72 bytes.

However, I wouldn't trust Picolibc yet. For example, it has been known for at least 2 years that routines like sprintf can suffer drastic performance losses, but as far as I can see, the official documentation still does not mention this pitfall.

rdiez commented 2 months ago

Fixed with commit 309b2b72a7719189ad71cb3dc286227c8edb0aa3.

pete-restall commented 2 months ago

Thanks for confirming. It's been a while so I cannot remember the exact flags, etc. that I used when building my empty application. I can try rebuilding the DebugDue toolchain and regenerating my empty ELFs if you think it would be helpful.

I do seem to also remember having -fpic turned on as well, before I realised that there are some 'features' in ld that mean certain ARM relocation types silently do nothing. Took me a while to figure out why my binaries were corrupt. I don't think that will explain that large a size difference, I'm just saying I've changed flags a few times since and cannot remember what the original build commands were. Pretty certain I was using -O2 though.

Thanks for the tip on picolibc. For my purposes it seems alright so far, but in all honesty I've not done much with it. I try to avoid most of the library functions (especially the *printf suite) so the ones I get caught by are the implicit calls used by the likes of libstdc++.

Cheers.