lancaster-university / microbit-dal

http://lancaster-university.github.io/microbit-docs
Other
255 stars 130 forks source link

Static object instances with C++ destructors depend on atexit and eat up RAM #72

Open dpgeorge opened 8 years ago

dpgeorge commented 8 years ago

I recently upgraded my arm-none-eabi-gcc toolchain and now I have a new libc_nano.a library. With this new library the MicroPython port runs out of memory (again) because libc_nano.a includes a new 264 byte structure in the bss.

This new structure is for handling atexit behaviour. They (newlib devs) recently changed atexit so that it uses static RAM to register on-exit functions, instead of dynamically allocating with malloc. See here for background: https://sourceware.org/ml/newlib/2015/msg00936.html

atexit is used by the DAL because it has static object instances which have C++ destructors (eg ManagedString::EmptyString, MicroBitImage::EmtpyImage). When the program exits these destructors should be called (according to the C++ standard). Therefore when these static instances are initialised they register a function with atexit (cxa_atexit via __aeabi_atexit). The big 264 byte data structure in the bss is used to store these function pointers.

Note that even before this change there was 140 bytes used by atexit in the bss. So now there is 140+264 = way too much!

It seems this is a hard problem to solve/workaround. If possible it would be good to provide a dummy implementation of cxa_atexit, or __aeabi_atexit, which overrides the one from libc_nano.a.

Thoughts?

dpgeorge commented 8 years ago

Surprisingly there is an easy fix for this: __register_exitproc is a weak symbol and if you override it then it doesn't pull in the big data structures. So I put the following definition into my main.cpp file and I reclaim back about 400 bytes RAM:

void __register_exitproc() {
}

@finneyj do you think that's the best approach? I can do it in the MicroPython code so it doesn't need to be part of the DAL.

finneyj commented 8 years ago

thanks @dpgeorge - looks good. We should consider if there would be any gains by putting this in the DAL side for other languages when we have a moment, but this sounds like a good approach - suggest you go ahead wit this for now. We're running on minutes ot hit other deadlines at the moment (keep the bug reports coming though!)

mmoskal commented 8 years ago

Micro:bit programs never exit, so I'm sure all languages would be delighted to have the 400 bytes back. Seems like a no-brainer to me!

remay commented 8 years ago

True that main() never returns. Technically I can write code in app_main() that calls exit(), but not sure I can see a reason to ever do this though. I'd certainly like the extra 400 bytes, as to squeeze in the enormous S130 BLE stack I'm down to a heap of only 512 bytes at the moment, and regularly hit out-of-memory conditions.

@dpgeorge, thanks for pointing this out. If I upgrade my toolchain does this result in overall more available memory? Is there any other compelling reason to upgrade my toolchain?

finneyj commented 8 years ago

Absolutely - we definitely all would like to see more memory! Apologies for any confusion. I was just trying to say that James and I have a time critical backlog still to work through, so testing this through on micropython then upstreaming if all seems well felt like an easy win.

@mmoskal, @remay, if either of you guys have a few spare cycles and can try this out, measure how much memory it does free up and do a few basic smoke tests i'd love to see pull request.

dpgeorge commented 8 years ago

You'll only get about 140 bytes back if you're using a non-bleeding-edge version of the toolchain. Still, that's relatively a lot.

jamesadevine commented 8 years ago

@dpgeorge I've recently tried experimenting with this, and couldn't find the __register_exitproc symbol using arm-none-eabi-nm.

What are the steps to reproduce?

dpgeorge commented 8 years ago

What are the steps to reproduce?

It depends heavily on your compiler and C++ library. The functions may be called something else. Look through the .map file and find everything that's in the BSS. They will be named and you can see if there is anything suspiciously large.

jamesadevine commented 8 years ago

Hi @dpgeorge, thanks for that, I believe i have found my equivalent resident at exit overhead:

.bss. _on_exit_args_instance
                0x0000000000000000      0x108 .......gcc/arm-none-eabi/5.3.1/../../../../arm-none-eabi/lib/armv6-m/libc_nano.a(lib_a-on_exit_args.o)

This is on GCC 5.3.1, 4.9.x did not have this overhead as you stated.

The way I managed to remove this symbol was to provide a barebones __cxa_atexit function in my main.cpp:

extern "C"{
    void __cxa_atexit() {
    }
}

This overrides the usual symbol, and doesn't perform the step that generates _on_exit_args_instance. Overriding the __register_exitproc did not work.

I've never really delved this much into GCC before, so i'm unable to conclude whether my solution is correct...

I feel that having different solutions is a bad thing, and we should endeavour to find a solution that covers the general case.

dpgeorge commented 8 years ago

I feel that having different solutions is a bad thing, and we should endeavour to find a solution that covers the general case.

I don't think that's possible, due to these functions (__cxa_atext etc) being internal and subject to change between compiler/library versions.

The alternative is to not use destructors, or have separate classes for objects that are statically allocated.

jamesadevine commented 8 years ago

@dpgeorge is a simpler option to just override __aeabi_atexit?

I've been doing some investigation, and I noticed that the LPC platforms on mbed override this in their startup code.

I've replicated this in a main.cpp, and it removes the 264 byte RAM overhead:

extern "C"{
    int __aeabi_atexit(void *object, void (*destructor)(void *), void *dso_handle) {
        return 0;
    }
}

Does this work with your configuration?

dpgeorge commented 8 years ago

Does this work with your configuration?

Nope.

jamesadevine commented 8 years ago

@dpgeorge Can you verify that your .bss section actually changes size based on the application of your symbol definition?

Even though the symbol disappears from my .map file, the overall bss for the hex does not change.

Here is my sample program:

MicroBit uBit;

extern "C"{
    int __aeabi_atexit(void *object, void (*destructor)(void *), void *dso_handle) {
        return 0;
    }

    void __cxa_atexit() {
    }

    void __register_exitproc() {
    }
}

int main()
{
    // Initialise the micro:bit runtime.
    uBit.init();

    // Insert your code here!
    uBit.display.scroll("HELLO WORLD! :)");

    // If main exits, there may still be other fibers running or registered event handlers etc.
    // Simply release this fiber, which will mean we enter the scheduler. Worse case, we then
    // sit in the idle task forever, in a power efficient sleep.
    release_fiber();
}

With the extern'd symbols, my arm-none-eabi-size gives:

   text    data     bss     dec     hex filename
  73604     172    2148   75924   12894 build/bbc-microbit-classic-gcc/source/microbit-samples

And without:

   text    data     bss     dec     hex filename
  73636     172    2148   75956   128b4 build/bbc-microbit-classic-gcc/source/microbit-samples
dpgeorge commented 8 years ago

Using the same example program that you have (from a fresh clone of microbit-samples) I get:

With dummy functions:
   text    data     bss     dec     hex filename
  71620     172    2144   73936   120d0 build/bbc-microbit-classic-gcc/source/microbit-samples

Without dummy functions:
   text    data     bss     dec     hex filename
  72112     172    2552   74836   12454 build/bbc-microbit-classic-gcc/source/microbit-samples

So, yes, there is a difference. A saving of nearly 500 bytes code and 308 bytes RAM.

gcc version: arm-none-eabi-gcc (Arch Repository) 5.3.0

jamesadevine commented 8 years ago

@dpgeorge Thanks, sorry to come across pedantic. @jaustin and ARM have filed a bug with newlib, and I am just trying to get as much info as possible! :smile:

jaustin commented 8 years ago

@jamesadevine @dpgeorge I wonder if the 'Arch Repository' part is the key here? Is your newlib configured differently there?

I'm getting my toolchains from https://launchpad.net/gcc-arm-embedded - James, what about you?

jamesadevine commented 8 years ago

@jaustin same!

dpgeorge commented 8 years ago

The newlib package is automatically installed for me. Pacman reports: local/arm-none-eabi-newlib 2.4.0-1

jamesadevine commented 8 years ago

@dpgeorge Here's a reply, courtesy of @jaustin

Sorry for the delay, this ticket got lost somewhere underneath the big pile of "other stuff". I have been looking again at it today and I had the same issue you did with overwriting '__register_exitproc'. However, overwriting __on_exit_args did work save half the bss on a small example.

A mere global 'char __on_exit_args;' will do.

Let me know if you would prefer some other approach.

Cheers, Andre