lancaster-university / microbit-dal

http://lancaster-university.github.io/microbit-docs

printf seems to malloc 428 + 1024 bytes and causes out-of-memory panic #387

Open martinwork opened 5 years ago

martinwork commented 5 years ago

https://github.com/lancaster-university/microbit-dal/blob/master/source/bluetooth/MicroBitDFUService.cpp#L105

printf(" ACTIVATING BOOTLOADER.\n");

This causes an out-of-memory panic if doing DFU with debug enabled.

I added a SERIAL_DEBUG->printf each side of it - only the first gets sent before the panic.

finneyj commented 5 years ago

ooh, yes. You don't want that flavour of printf when BLE is enabled. It does spin up a large buffer. We should change it as you suggest.
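The change being discussed would look something like the fragment below: swap the newlib printf (which, per the issue title, mallocs 428 + 1024 bytes) for the debug serial output already used elsewhere in this thread. The surrounding MicroBitDFUService code is elided, so this is a sketch rather than the actual patch.

```cpp
// Before: newlib printf allocates its stdio buffers on first use (the 428 + 1024 bytes above).
//     printf(" ACTIVATING BOOTLOADER.\n");

// After (sketch): write straight to the debug serial port instead, with no large buffers.
#if CONFIG_ENABLED(MICROBIT_DBG)
    if (SERIAL_DEBUG)
        SERIAL_DEBUG->printf(" ACTIVATING BOOTLOADER.\n");
#endif
```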

martinwork commented 5 years ago

I just realised this confirms my suspicion that the out-of-memory problems have appeared because some routines request memory that they could do without, either by working harder or by failing quietly.

If I modify the malloc to return NULL when size = 1024, the message still gets printed. If I also reject size = 428, the message no longer appears, but it doesn't crash.

Would it be possible to globally override malloc as now, but have it return NULL, and then override it again with a panic for the DAL routines?
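A hypothetical sketch of that split, using the allocator names that come up later in this thread (microbit_malloc calling native_malloc) plus an illustrative dal_malloc wrapper; none of this is the DAL's actual allocator code:

```cpp
#include <stddef.h>

extern void *native_malloc(size_t size);     // underlying allocator; may return NULL
extern void microbit_panic(int statusCode);  // DAL panic routine

// Global override: libc/SDK callers simply see NULL when memory is exhausted
// and have to cope, either by working harder or by failing quietly.
void *microbit_malloc(size_t size)
{
    return native_malloc(size);
}

// Illustrative DAL-internal wrapper: keep the deterministic out-of-memory panic.
void *dal_malloc(size_t size)
{
    void *p = microbit_malloc(size);
    if (p == NULL)
        microbit_panic(20);                  // the 020 (OOM) panic mentioned later in this thread
    return p;
}
```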

finneyj commented 5 years ago

@martinwork thanks for all the information.

I don't think your analysis is quite right though. I'm pretty sure (please prove me wrong) that the old libc malloc allocator would have simply given you that 1024 bytes, even though it was in stack space and inherently unsafe to do so...

I understand what you're trying to do here, but this really doesn't feel like the right fix to me. To my mind, the micro:bit is for kids and teachers, and they need determinism. If there are some routes that lead to an OOM that isn't trapped (we have no guarantee that third party code would handle an OOM elegantly), we'll get a hard crash... and a panic is far preferable. At least there's some feedback.

I really think these OOM issues will be restricted to BLE scenarios. To me, the real fix is to see what we can do to reduce the RAM usage from SoftDevice, the Nordic SDK and mbed APIs... Don't you think?

martinwork commented 5 years ago

I ran into the problem of the heap and stack colliding some while ago. If I remember rightly, the heap would creep into the stack space, but wouldn't actually overwrite the stack. The stack would extend into the heap though, leading to chaos. Maybe it was the other way round? I didn't find out what happens when the stack fills the stack space.

The arithmetic of space available doesn't seem to explain why it now runs out of memory so easily, so I was thinking about possible explanations. One is that we were lucky before: some large allocations were failing, but the SDKs were able to cope. The printf behaviour was evidence of this.

It would be a much better solution to modify the SDKs. Not imagining that this was on the cards, I was thinking we might try bringing the SDK allocations under control to stop them creeping into stack space, while presenting them with a similarly limited heap space, so they worked as before.

One possible temporary quick fix now would be to reinstate the old working heap code, if only for BLE projects. Or has something else changed that demanded the new heap setup?

finneyj commented 5 years ago

@martinwork

Indeed, the libc heap would continue allocating all the way to the top of memory. It would overwrite the stack if it got that far, but more commonly the stack would grow and clobber something on the heap first (most likely in an interrupt). If the stack filled the stack space, it too would keep on growing. Not pretty... There's no support for memory protection on the Cortex-M0, AFAIK.

I disagree about the explanation though. I think the arithmetic does work out. Previously, you would never get a null return from a malloc. You'd get stack memory... (unless you actually used all that up, and somehow managed to keep running). The old heap code was not working... it was inherently unsafe. We just didn't know it...

Unless I'm misunderstanding, isn't the approach you're suggesting for the SDKs the current behaviour? i.e. they are prevented from creeping into stack space - that's why they now run out of memory.
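For illustration, the behaviour described here amounts to a bounds check along the lines of the minimal sketch below. The region boundaries (heap_end, stack_limit) and the bump-allocator shape are purely illustrative, not the DAL's actual heap code; the point is only that an allocation which would cross into the reserved stack region now returns NULL instead of handing out stack memory.

```cpp
#include <stdint.h>
#include <stddef.h>

// Illustrative region boundaries, not real DAL symbols or addresses.
static uintptr_t heap_end    = 0x20002000;   // current top of the heap
static uintptr_t stack_limit = 0x20003800;   // bottom of the reserved stack region

void *bounded_alloc(size_t size)
{
    if (heap_end + size > stack_limit)       // would creep into the stack space...
        return NULL;                         // ...so fail the allocation instead
    void *p = (void *) heap_end;
    heap_end += size;
    return p;
}
```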

We can change the allocated stack space - that's just a config option: https://lancaster-university.github.io/microbit-docs/advanced/#compile-time-options-with-yotta

If you want it to behave as before, just set the stack size to zero...

martinwork commented 5 years ago

When the old heap code calls native_malloc inside microbit_malloc, is that the libc heap? I thought native_malloc returns NULL when the heap would actually overwrite the current stack and the bad magic happens if the stack then grows.

When I try the sample below with the dal-integration-3 release and debugs enabled, the size of the onthestack array influences how many 100 byte NATIVE ALLOCATED mallocs succeed before the panic, according to the serial debug.

About the arithmetic... I don't understand why the new regime runs out of memory so easily, given that there is approximately the same amount of memory available if the stack is squeezed down.

I have to go now, but I just tried this with v2.1.0 and the mallocs in the for loop did not generate any serial debug, and the micro:bit hangs rather than panics.

```cpp
#include "MicroBit.h"

MicroBit uBit;

int main()
{
    uBit.init();

#if CONFIG_ENABLED(MICROBIT_DBG)
    if(SERIAL_DEBUG) SERIAL_DEBUG->printf("main\n");
#endif

    for ( int i = 0; i < 100; i++)
    {
        char onthestack[ 1];
        onthestack[ sizeof( onthestack) - 1] = 'A';

#if CONFIG_ENABLED(MICROBIT_DBG)
        if(SERIAL_DEBUG) SERIAL_DEBUG->printf("i = %d sp = %p onthestack = %p A = %c\n", i, GET_SP(), onthestack, onthestack[ sizeof( onthestack) - 1]);
#endif

        uBit.display.printChar( '0' + i % 10);
        uBit.sleep(100);
        void *p = malloc( 100);
    }

    release_fiber();
}
```

finneyj commented 5 years ago

thanks @martinwork. That is interesting. I'll try to reproduce.

the sleep() might be a factor here though, as descheduling will cause the stack to be paged out, increasing heap memory use... wait_ms() would be a better test here.

how did you configure the stack size for the 2.1.0 test?

finneyj commented 5 years ago

@martinwork I think your test needs a little tweak. Looks like the C compiler is optimizing out your malloc call because the pointer is never used. If I run this version with an out-of-the-box BLE enabled build, I get 19 iterations then an 020 panic:

```cpp
#include "MicroBit.h"

MicroBit uBit;

int main()
{
    uBit.init();

#if CONFIG_ENABLED(MICROBIT_DBG)
    if(SERIAL_DEBUG) SERIAL_DEBUG->printf("main\n");
#endif

    for ( int i = 0; i < 100; i++)
    {
        char onthestack[ 1];
        onthestack[ sizeof( onthestack) - 1] = 'A';

#if CONFIG_ENABLED(MICROBIT_DBG)
        if(SERIAL_DEBUG) SERIAL_DEBUG->printf("i = %d sp = %p onthestack = %p A = %c\n", i, GET_SP(), onthestack, onthestack[ sizeof( onthestack) - 1]);
#endif
        uBit.display.printChar( '0' + i % 10);
        wait_ms(100);
        void *p = malloc(100);
#if CONFIG_ENABLED(MICROBIT_DBG)
        if(SERIAL_DEBUG) SERIAL_DEBUG->printf("p = %p\n", p);
#endif
    }

    release_fiber();
}
```

finneyj commented 5 years ago

p.s. you also need to have DEBUG enabled in that test.

martinwork commented 5 years ago

The config.json I used is below. I tried your modified code... and now I see ALLOCATED with v2.1.0. That's a relief! Seems strange that the malloc call isn't optimised out with the older release. I thought I checked recently that the targets were the same.

With the old heap, I get 25 blocks with onthestack[1] and 15 with onthestack[501], with the last block just short of onthestack. With v2.1.0 and the same stack size, I get 10 blocks and 5 blocks. In both cases an extra 500 bytes has been allocated somewhere - see below.

Looking at the addresses, it certainly seems like native_malloc is avoiding overwriting the stack and returning NULL, doesn't it?

The version difference appears to be 1500 bytes, about 250 bytes more than I expected given the initial creep into the stack space is a couple of hundred bytes or so.

Maybe my working out is wrong but, based on the addresses of the heaps and first pointers in the for loop, v2.1.0 seems to use about 390 more bytes of heap to start up, compared to dal-integration-3.

Within MakeCode, I think the PF service is allocating 70 bytes or so that it doesn't use.

MakeCode has reduced the stack size by 768, which based on the above could be enough, but apparently isn't.

Unrelated, maybe... With both versions there's an early heap allocation that depends on the size of the onthestack array. This seems to be how adding 500 bytes to onthestack also eats 500 bytes of heap. This is allocated during the first sleep() in MicroBit::Init and turns out to be in verify_stack_size(). This is a new consideration for me - avoid using stack space in main?
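If that explanation is right (an assumption here: verify_stack_size() in the fiber scheduler ends up with a heap copy sized by main's live stack once the fiber is first descheduled), one workaround is to keep large buffers out of main's stack frame. A minimal sketch, not tested here:

```cpp
#include "MicroBit.h"

MicroBit uBit;

// Large buffer in static storage rather than on main's stack, so a deep stack
// frame isn't also mirrored into the heap when main's fiber is descheduled.
static char bigbuffer[501];

int main()
{
    uBit.init();

    bigbuffer[sizeof(bigbuffer) - 1] = 'A';

    uBit.sleep(100);     // descheduling no longer has ~500 extra bytes of stack to save

    release_fiber();
}
```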

```json
{
    "microbit-dal": {
        "bluetooth": {
            "enabled": 1,
            "pairing_mode": 1,
            "private_addressing": 0,
            "open": 0,
            "whitelist": 1,
            "advertising_timeout": 0,
            "advertising_interval": 50,
            "tx_power": 0,
            "dfu_service": 1,
            "event_service": 1,
            "device_info_service": 1,
            "security_level": "SECURITY_MODE_ENCRYPTION_NO_MITM"
        },
        "gatt_table_size": "0x700",
        "debug": 1,
        "heap_debug": 1
    }
}
```