sparkfun / Arduino_Apollo3

Arduino core to support the Apollo3 microcontroller from Ambiq Micro
83 stars 37 forks source link

BLE heartbeat causes memory leaks and crash #314

Closed paulvha closed 3 years ago

paulvha commented 3 years ago

When ArduinoBLE is used it will not initialize using CordioBLE.cpp, but has it's own (overlapping) layers. That is because ArduinoBLE can either communicate directly with UART connection to the HCI-core OR use Cordio. As a result, it will use only part of the Cordio/dual-chip code to communicate to the BLE-core layer in the Apollo3. It will NOT use the other CORDIO code. The problem is that the Apollo3 HCITansport layer assumes that IT is used.

What is happening?

From the ArduinoBLE, HCICordioTransportClass::begin() which on a certain moment will call getDriver().start_reset_sequence(). Which is CordioHCIDriver::start_reset_sequence(), which in turn calls for HciResetCmd()in hci_cmd.c (part of CORDIO dual-chip). BUT hci_cmd.c is NOT initialized as CordioBLE.cpp is NOT used. It will create a reset command which is sent with hciCmdSend() (part of CORDIO dual-chip). For code simplicity, only ONE command is allowed to be pending and this reset is the first and thus sent. However, the fact that BLE-core has completed the command is NEVER recorded as the call-back calls (part of CORDIO dual-chip) to capture that are NOT included in the ArduinoBLE communication setup. So ONLY this first command (reset) is sent, using the cmdQueue, nothing else after is allowed to be sent as it is never captured that reset-command completed. The 'hciCmdCb.numCmdPkts' is not returned to 1. ArduinoBLE does not use this cmdQueue for any of its communications to the HCI-core layer, it is using the g_sWriteQueue.

Now the fun starts.. (not really) Once HciDrvHandler()in hci_drv_apollo3.c has been called with something to read it will trigger the heartbeat or when the heartbeat timer expires. This heartbeat is to check whether the HCI core layer on the APollo3 is still responsive. HOW? well it calls to read the harmless HciReadLocalVerInfoCmd() which is part of (the NOT intialized) hci_cmd.c. Here it requests 16 bytes with hciCmdAlloc(), forms the command, calls hciCmdSend() and it is added to the cmdQueue. BUT it is NEVER (EVER !!) released from this queue as it is still waiting on the previous command (in our case reset) returning. Every 10 seconds a new 16 bytes buffer is requested from the pool and added to the never serviced cmdQueue.

As every 10 seconds 16 bytes are removed (and not returned) from the central buffer space, it is only a matter of time before hciCmdAlloc() is not able to return a pointer to buffer bytes for any request. To make things worse.. ArduinoBLE it is NOT checking on every place whether a valid pointer has been returned before using and a crash (blinking led) is happening.

What to do?

The Apollo3 BLE heartbeat implementation is NOT necessary. Given the fact that heartbeat is NOW never sent (stuck in the cmdQueue) when using ArduinoBLE AND the program works well until it crashes as there is no more buffer to get.... it is NOT needed...just remove it.

BUT it requires a recompile of the Mbed-os library and in hci_drv_apollo3.c change the '#define ENABLE_BLE_HEARTBEAT 1' to a '#define ENABLE_BLE_HEARTBEAT 0' I have tried it and it works well.

regards, Paul

Wenn0101 commented 3 years ago

@paulvha Thanks for the very detail report and suggested solution!

I am looking at this now and evaluation if the BLE heartbeat could be useful or if the solution is really as simple as turning it off!

paulvha commented 3 years ago

Thanks

It is really that simple.... my sketch/application that is just sending every 60 seconds some data over notify and was crashing constant. After making the change.. it worked for 48 hours continuous without a problem. I had put traces on WsfBufAlloc() and WsfBufFree() and you could just see it happen that buffer was allocated, but not released.

BTW.. Just in case Sparkfun is interested.. I have taken the "older" Exactle (Cordio these days) and ported that to version 1.2.1 It can send and receive data from a sketch and is working stable. I have tested on ATP board, but I know that it works on OLA as well. (as 2 people in the US reached out) I have ordered an OLA board to integrate it myself.

regards, Paul


Van: Wenn0101 notifications@github.com Verzonden: donderdag 7 januari 2021 18:43 Aan: sparkfun/Arduino_Apollo3 Arduino_Apollo3@noreply.github.com CC: paulvha paulvha@outlook.com; Mention mention@noreply.github.com Onderwerp: Re: [sparkfun/Arduino_Apollo3] BLE heartbeat causes memory leaks and crash (#314)

@paulvhahttps://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpaulvha&data=04%7C01%7C%7Cd2b9fc1b2f424330ceba08d8b333d06a%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637456382404499082%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=2tHTeMuEDW1LIva7q8N99qUQLL6GgCE9e%2BAVlah1uNI%3D&reserved=0 Thanks for the very detail report and suggested solution!

I am looking at this now and evaluation if the BLE heartbeat could be useful or if the solution is really as simple as turning it off!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsparkfun%2FArduino_Apollo3%2Fissues%2F314%23issuecomment-756270499&data=04%7C01%7C%7Cd2b9fc1b2f424330ceba08d8b333d06a%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637456382404509083%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=rec%2Bgifx7t8nNPhiV%2BEIlCVmOpK1tOeO1sbxSHHVePI%3D&reserved=0, or unsubscribehttps://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAD2GBPCH4AOCXMY37F2IUFLSYXXF7ANCNFSM4VK2VLJA&data=04%7C01%7C%7Cd2b9fc1b2f424330ceba08d8b333d06a%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637456382404509083%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IFFu3wczaqrU0UY%2FSqTNabO4JMgCntV8xkbTgaPQ5bY%3D&reserved=0.

Wenn0101 commented 3 years ago

This change was made in the mbed core, and was built into the latest release.

Paul, I would be interested in BLE work that you have done on the v1, core. Send me a link to the work or feel free to submit a PR to the v1 branch on this repo, and we can determine if we want to make a patch release of the V1 core that includes this functionality.