h2zero / NimBLE-Arduino

A fork of the NimBLE library structured for compilation with Arduino, for use with ESP32, nRF5x.
https://h2zero.github.io/NimBLE-Arduino/
Apache License 2.0
670 stars 138 forks source link

Fix NimBLEDevice::init() deadlock problem #584

Open CW-B-W opened 10 months ago

CW-B-W commented 10 months ago

NimBLEDevice::host_task() has priority (configMAX_PRIORITIES - 4), which is defined at https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/nimble/porting/npl/freertos/src/nimble_port_freertos.c#L60-L61

When the priority of NimBLEDevice::init() is greater than (configMAX_PRIORITIES - 4), NimBLEDevice::init() will block NimBLEDevice::host_task() but still wait for NimBLEDevice::host_task() infinitely (waiting for m_synced to be set), resulting in the deadlock between NimBLEDevice::init() and NimBLEDevice::host_task().

This is caused by this while loop https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/NimBLEDevice.cpp#L909-L911 When the priority of NimBLEDevice::init() is greater than NimBLEDevice::host_task(), it cannot yield and let NimBLEDevice::host_task() be executed.

Using a binary semaphore can resolve this problem.

h2zero commented 10 months ago

I've never heard of or experienced this issue before, do you have any examples where this can be reproduced?

CW-B-W commented 10 months ago

Hello @h2zero,

Here is the minimum reproducible example (it tested it with https://github.com/h2zero/NimBLE-Arduino/commit/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5),
In this example, it will never print Initialized.
It got stuck inside NimBLEDevice::init(), and the NimBLEDevice was never sucessfully initialized.
And after a few seconds the system will be reset by watchdog.

#include <NimBLEDevice.h>

void InitNimbleTask(void* args)
{
    Serial.println("Initializing NimBLEDevice");
    NimBLEDevice::init("NimBLE-Arduino");
    Serial.println("Initialized");

    NimBLEDevice::startAdvertising();

    vTaskDelete(NULL);
}

void setup() {
    Serial.begin(115200);

    // host_task has priority (configMAX_PRIORITIES - 4)
    const uint32_t priority = (configMAX_PRIORITIES - 3); 

    // host_task is pinned at core 0
    xTaskCreatePinnedToCore(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL, 0); 
}

void loop() {

}

As stated in the PR, I think it's because host_task has priority (configMAX_PRIORITIES - 4)
https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/nimble/porting/npl/freertos/src/nimble_port_freertos.c#L60-L61
When NimBLEDevice::init() has priority greater than (configMAX_PRIORITIES - 4), the host_task starves.

And if we set the priority of InitNimbleTask to (configMAX_PRIORITIES - 4) or lower, it works again.


Some even trickier situation,

  1. If we add a print in the problematic loop
    https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/NimBLEDevice.cpp#L909-L911

    while(!m_synced){
        Serial.println("YIELD");
        taskYIELD();
    }

    It works again!
    This is because Serial.println() make the NimBLEDevice::init() task go to block state, which gives host_task some time to be executed.

  2. If we don't pin InitNimbleTask at core 0 If we created the task with xTaskCreate instead of xTaskCreatePinnedToCore, it magically works.

    const uint32_t priority = (configMAX_PRIORITIES - 3);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);

    But if we change the priority to (configMAX_PRIORITIES - 2) or (configMAX_PRIORITIES - 1), it gets stuck again.

    const uint32_t priority = (configMAX_PRIORITIES - 2);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);

    I don't really know the reason behind, but I guess maybe it's because ESP32 has 2 cores, so the host_task may not be starved.

h2zero commented 10 months ago

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

CW-B-W commented 10 months ago

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

Actually, I don't really need the priority of application be higher, I just accidentally found this problem when I set different task priority for my application. I was porting this project to an MCU (other than Arduino, ESP32), and it uses polling to read user command from UART, one of the UART command is to turn BLE on. To make the UART command handler be as responsive as possible, I made its priority the highest configMAX_PRIORITIES - 1, then I found when I launch NimBLEDevice::init() inside this command handler, which has the highest priority, it gets stuck.

h2zero commented 7 months ago

@CW-B-W That is a great explanation, thank you for this. I have tested your example and propose a different fix.

Instead of the semaphore, could you just replace the taskYIELD() in the while loop with ble_npl_time_delay(1) instead, that should resolve the issue as well without the extra memory requirements.