lancaster-university / microbit-dal

http://lancaster-university.github.io/microbit-docs
Other
254 stars 130 forks source link

Bluetooth pairing causes crash (OOM?) with 2.1.0-rc1 #366

Closed smartyw closed 6 years ago

smartyw commented 6 years ago

I've had reports from two different users that they're getting what looks like Out of Memory conditions arising from tiny programs involving Bluetooth. In the first case, the user was simply including the accelerometer service. On pairing or on connecting, his micro:bit would display random LEDs, probably indicating a crash and any connection established would drop.

I investigated and recreated the issue when using the latest microbit-samples which brings with it version 2.1.0-rc1 of microbit-dal. When I built the same source code with 2.0.0-rc9 everything worked fine.

It looks like something is very broken with Bluetooth in 2.1.0-rc1. Think this needs looking at urgently, please.

Recreate the issue with this code, pair and then try connecting from nRF Connect.

#include "MicroBit.h"
MicroBit uBit;

void onConnected(MicroBitEvent)
{
    uBit.display.print("C");
}

void onDisconnected(MicroBitEvent)
{
    uBit.display.print("D");
}

int main() {
    uBit.init();
    //create_fiber(reset); // create fiber and schedule it.

    uBit.display.scroll("Start");

    uBit.messageBus.listen(MICROBIT_ID_BLE, MICROBIT_BLE_EVT_CONNECTED, onConnected);
    uBit.messageBus.listen(MICROBIT_ID_BLE, MICROBIT_BLE_EVT_DISCONNECTED, onDisconnected);

    new MicroBitAccelerometerService(*uBit.ble, uBit.accelerometer);

    release_fiber(); // "release the fibers!!"
}
{
    "microbit-dal":{
        "bluetooth":{
            "enabled": 1,
            "pairing_mode": 1,
            "private_addressing": 0,
            "open": 0,
            "whitelist": 1,
            "advertising_timeout": 0,
            "tx_power": 0,
            "dfu_service": 0,
            "event_service": 0,
            "device_info_service": 1
        },
        "gatt_table_size": "0x700",
        "nested_heap_proportion": 0.50
    }
}
DavidWhaleMEF commented 6 years ago

@jaustin @finneyj ^^^

smartyw commented 6 years ago

Just tested with simpler config.json and got the same result. As soon as I connect to the device (running the CPP above), it crashes:

{
    "microbit-dal": {
        "bluetooth": {
            "enabled": 1,
            "pairing_mode": 1,
            "private_addressing": 0,
            "open": 0,
            "whitelist": 1,
            "advertising_timeout": 0,
            "tx_power": 0,
            "dfu_service": 0,
            "event_service": 0,
            "device_info_service": 1
        },
        "gatt_table_size": "0x700"
    }
}
martinwork commented 6 years ago

As soon as the BLE is connected, the accelerometer service sends 6 notifications in rapid succession then very soon micro:bit hangs. The accelerometer appears to be ignoring the period, so the accelerometer service receives update events too often (every 4ms?). I think this may be caused by the way the MMA8653 constructor initialises int1.

smartyw commented 6 years ago

It's sending notifications before they've even been enabled by the client? That's an issue too, then. I just tried to connect with nRF Connect. I didn't get far enough to enable notifications.

jaustin commented 6 years ago

Does this happen on the dal-integration branch or just 2.1.1-rc1? There are important fixes on that branch over rc1 (and the rc2 tag, also has some fixes)

J

On Sun, 5 Aug 2018, 14:13 Martin Woolley, notifications@github.com wrote:

It's sending notifications before they've even been enabled by the client? That's an issue too, then. I just tried to connect with nRF Connect. I didn't get far enough to enable notifications.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lancaster-university/microbit-dal/issues/366#issuecomment-410519478, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI-qXL95mxoThKKtOeMds_Andq-SmCLks5uNu-OgaJpZM4VvE1w .

smartyw commented 6 years ago

It happens with a fresh set up of microbit-samples which I think pulls down dal 2.1.0-rc1. As such this impacts anyone setting up an offline development environment, per the instructions at the Lancaster University web site documentation.

Not tested other versions. No more time for today.

martinwork commented 6 years ago

dal-integration branch has a fix of *1000 on the sample period in MMA8653::configure().

martinwork commented 6 years ago

Starting with microbit-samples, yt update downloads microbit rc2 and dal rc2, but the dal module.json says rc1. I see rc1 in a downloaded zip of release rc2 too.

smartyw commented 6 years ago

Just tried using the LED service and it was fine. Looks like it is just accelerometer related (which is a relief!).

@martinwork I get that there's a bug regarding the frequency with which accelerometer notifications are being generated, but is it actually sending notifications regardless of the state of the client characteristic configuration descriptor or just going mad sampling the accelerometer too fast? That would be a bug as well.

martinwork commented 6 years ago

The service does call notify whenever an accelerometer update event is sent and ble.getGapState().connected is true, which isn't new. I think soft device checks the CCCD.

smartyw commented 6 years ago

Ok, c'est bon. Thanks.

martinwork commented 6 years ago

The rc2 compass is also affected by the missing *1000 and a default 50Hz sampling rate.

teamshortcut commented 6 years ago

Worth pointing out that I have been having the exact same issues. I will test later but using 2.0.0-rc9 it apparently works, but on 2.1.0-rc1 it definitely appears to be broken.

finneyj commented 6 years ago

Thanks to @smartyw, @martinwork and @teamshortcut for raising this issues and the detailed comments! Excellent poking from @DavidWhaleMEF too (I think his fingers are getting pointier by the day).

A few points here I think:

1) 2.1.0-rc1 seems to have leaked out onto the mainline microbit-samples. This wasn't intentional, and is my bad. Sorry about that folks. microbit-samples is configured to use the latest version tagged on the microbit and microbit-dal repos... so it locked onto 2.1.0-rc1 even though that's sat on the dal-integration branch and hasn't been merged into master yet.

2) This could be related to the incorrectly configured accelerometer that we know is present in 2.1.0-rc1, as @jaustin noted. I'm surprised it manifest as an out of memory error though (although perhaps the event queue is growing on the BLE accelerometer service as a result of the increased update frequency).

3) There were also some changes to the BLE code in 2.1.0-rc1 to support partial re-flashing of the micro:bit over BLE. It could be that some change in configuration/memory layout here has caused a side effect. With Bluetooth enabled the micro:bit is very close to running out of memory (although it really shouldn't be running out in this examples)

I'll block out some time today to:

I'll report back what I find here...

finneyj commented 6 years ago

FYI, microbit-samples now referring to latest v2.0.0 repositories. I've tested this on my machine, but if anyone would like to verify that this is indeed reverted, please do confirm here.

ghost commented 6 years ago

Great, thank you. Kudos to @teamshortcut for flagging the problem in the first place. Good luck with the A'Level! We need people like you :-)

finneyj commented 6 years ago

Yes indeed!

Hope to see you at an open day here at Lancaster University sometime soon @teamshortcut ;-)

p.s. FYI - I've also reproduced the fail case here too.

finneyj commented 6 years ago

OK... So i've rebuilt the code against the HEAD revision of the dal-integration branch (that's the release candidate branch for 2.1.0), and the code seems to run ok against that. no crashes, and I get accelerometer data that looks sensible via nrf connect. Connect/Disconnect status also seem ok.

We can therefore (with a fair amount of certainty) put this down to one of the patches between the v2.1.0-rc2 tag and the HEAD of dal-integration... https://github.com/lancaster-university/microbit-dal/commits/dal-integration

There are a couple of likely candidates in there - @Taylor-Woodcock 's fix to the accelerometer period (likely), or @microbit-sam 's update to the GATT table size for his partial flashing work (also likely!).

Either way, looks like this bug is already fixed in the HEAD revision dal-integration... Again, if anyone would like to verify this, please go do and report back here...

teamshortcut commented 6 years ago

Thanks @bluetooth-mdw @finneyj ! Using 2.0.0-rc9 it does work on my machine; I've never been quite so happy to see some coordinates!

finneyj commented 6 years ago

No worries @teamshortcut. Heh. One day you'll be even more excited about getting a blinking LED. ;)