lancaster-university / codal-microbit-v2

CODAL target for the micro:bit v2.x series of devices
MIT License
41 stars 50 forks source link

Memory leak when SerialStreamer is used #340

Open microbit-carlos opened 1 year ago

microbit-carlos commented 1 year ago

This programme runs for about 1 min 30 s and then we get a MICROBIT_HEAP_ERROR 030 error.

#include "MicroBit.h"
#include "SerialStreamer.h"

MicroBit uBit;

int main() {
    uBit.init();

    SerialStreamer *streamer = new SerialStreamer(uBit.audio.processor->output, SERIAL_STREAM_MODE_BINARY);
    uBit.audio.requestActivation();
    uBit.audio.activateMic();

    while (true) {
        uBit.sleep(1000);
    }
}
martinwork commented 9 months ago

Most often the panic is in isReadOnlyInline() from incr() when refCount == 1, implying a buffer being copied has been destroyed. I have also seen a panic from decr(). https://github.com/lancaster-university/codal-core/blob/master/source/types/RefCounted.cpp#L52

It doesn’t seem to be heap corruption, but that a RefCounted buffer has been released too many times and destroyed.

I have seen a panic in CodalHeapAllocator when a ref counted buffer was being freed. https://github.com/lancaster-university/codal-core/blob/master/source/core/CodalHeapAllocator.cpp#L378

I have seen SerialStreamer::lastBuffer crash when assigning a new pull because it’s previous ref counted object has been freed. lastbuffer

I think one problem is that the buffer operations are not interrupt safe.

I have analysed one example, built with GNU 12.3.1, involving DataStream::nextBuffer, which seems to show DataStream::pull() interrupted at the wrong instant, for a visit to DataStream::pullRequest(), which leads to a double release of DataStream::nextBuffer. https://github.com/lancaster-university/codal-core/blob/master/source/streams/DataStream.cpp#L160 https://github.com/lancaster-university/codal-core/blob/master/source/streams/DataStream.cpp#L193

A panic occurs as DataStream::pull() tries to return tmp, because its ptr has been released and destroyed. tmp

Line numbers above may be different because of added trace code: https://github.com/martinwork/microbit-v2-samples/tree/test-serialstreamer https://github.com/martinwork/codal-core/tree/test-serialstreamer https://github.com/martinwork/codal-microbit-v2/tree/test-refcounted

Without the trace code additions, it can take a lot longer to panic, but it does eventually.

See the log below. As I understand it...

Log of RefCounted operations ``` Key: iBufBuf incr in the constructor dBuf=/iBuf= decr/incr in operator = dBuf~ decr in destructor some lines linking addresses to variables have a dummy NULL ptr. 0x20006EF0,1,0x2001FD58::iBufBuf refCount == 1 ``` |sequence|ptr|count|object|fn| | ------------- | --- | ------------- | ------------- | ------------- | |439185|0x00000000|0|0x20008A18|lastBuffer SerialStreamer::pullRequest| |439186|0x00000000|0|**0x200024D4**|nextBuffer DataStream::pull| |// tmp is assigned| |439187|**0x20006EF0**|5|0x2001FD30|iBufBuf| |439188|0x00000000|0|0x2001FD30|tmp DataStream::pull| |// nextBuffer enters operator =, then decr| |439189|**0x20006EF0**|7|**0x200024D4**|dBuf=| |// note: no iBuf=| |// interrupt?| |439190|0x200092E4|3|0x2001FC74|iBufBuf| |// another variable using 6EF0 is reassigned| |// note the refCount is unchanged from above| |439191|**0x20006EF0**|7|0x200024B8|dBuf=| |439192|0x200092E4|5|0x200024B8|iBuf=| |439193|0x00000000|0|**0x200024D4**|nextBuffer DataStream::pullRequest| |439194|0x200092E4|7|0x2001FC48|iBufBuf| |// nextBuffer is assigned, calling decr on 6EF0 again| |439195|**0x20006EF0**|5|**0x200024D4**|dBuf=| |439196|0x200092E4|9|**0x200024D4**|iBuf=| |// now the 6EF0 refCount is 3| |// other irrelevant ops| |439197|0x200092E4|11|0x2001FC48|dBuf~| |439198|0x200092E4|9|0x2001FC74|dBuf~| |439199|0x20008C78|3|0x2001FC74|init| |439200|0x200092E4|7|0x200025DC|dBuf=| |439201|0x20008C78|3|0x200025DC|iBuf=| |439202|0x20008C78|5|0x2001FC74|dBuf~| |// return from interrupt?| |// complete the original decr() and destroy| |439203|**0x20006EF0**|1|**0x200024D4**|destroy|
martinwork commented 9 months ago

@JohnVidler I didn't have much luck catching it in the debugger, but I added some DMESG to the relevant functions and captured repeated traces like below.

SerialStreamer::pullRequest seems indirectly to pull the nextBuffer from the same DataStream (at 0x200024C8) that demux writes to,

NRF52ADCChannel::demux calls the ADC channel's non-blocking output DataStream::pullRequest to update its nextBuffer.

NRF52ADCChannel::demux 0x200024B0 output 0x200024C8
DataStream::pullRequest 0x200024C8 DataSink 0x20006118 DataSource 0x200024B0 blocking 0
DataStream::pullRequest 0x200024C8 setting nextBuffer
NRF52ADCChannel::demux EXIT

Next StreamSplitter::pullRequest routes down through SplitterChannel and StreamNormalizer to SerialStreamer::pullRequest.

StreamSplitter::pullRequest
SplitterChannel::pullRequest 0x2000616C output 0x2000618C
StreamNormalizer::pullRequest output 0x200061B0
DataStream::pullRequest 0x200061B0 DataSink 0x20006FC8 DataSource 0x20006190 blocking 1
SerialStreamer::pullRequest

SerialStreamer::pullRequest calls pull on StreamNormalizer::output (a blocking DataStream) which routes via StreamNormalizer, SplitterChannel, and StreamSplitter::getBuffer to the ADC channel's output DataStream:pull, which returns the nextBuffer.

DataStream::pull 0x200061B0 DataSink 0x20006FC8 DataSource 0x20006190 blocking 1
DataStream::pull 0x200061B0 calling upStream->pull()
StreamNormalizer::pull upstream 0x2000616C
SplitterChannel::pull 0x2000616C
StreamSplitter::getBuffer upstream 0x200024C8
DataStream::pull 0x200024C8 DataSink 0x20006118 DataSource 0x200024B0 blocking 0
DataStream::pull 0x200024C8 returning nextBuffer
SplitterChannel::pullRequest 0x200061D8 output 0x00000000
StreamSplitter::pullRequest EXIT