nkolban / esp32-snippets

Sample ESP32 snippets and code fragments
https://leanpub.com/kolban-ESP32
Apache License 2.0
2.36k stars 710 forks source link

BLE semaphores - a broader discussion #681

Open mws-rmain opened 6 years ago

mws-rmain commented 6 years ago

I have some issues with semaphore use in the BLE classes, and thought I'd open a discussion here (or a Neil might say flesh out 'the story'). I fully expect that many of the defined semaphores are required for reasons that I've not thought of or encountered, and if so, I'd like to expand my understanding.

Broadly speaking, it appears that calls to functions of the esp-idf Bluetooth stack create internal 'events' that are posted to handler queues to be serviced by internal tasks. I can see the need for semaphore use to synchronize 'app code' with the underlying BT stack tasks (for instance to ensure a 'service' has actually been created, before allowing the 'application code' to continue with adding characteristics to it, etc.).

However, using a semaphore in BLECharacteristic to 'stall' application code that calls 'indicate()' until an ESP_GATTS_CONF_EVT shows a connected client confirmed the change was successfully received (via 'm_semaphoreConfEvt.wait()' seems pointless to me, unless you plan on spawning a new thread to update attribute changes for each & every attribute (?). If the indicate was not yet acknowledged, and a subsequent indicate was called on the same attribute, calling 'm_semaphoreConfEvt.wait()' just before 'm_semaphoreConfEvt.take() would stall the calling thread just as easily, and this way would allow a single 'update thread' to update multiple changed attributes (possibly on a timer), with a simple loop:

while(1) {
  if (attribute_a_changed == true) {
    pAttribA->setValue( (uint8_t *)&attribA, sizeof(attribA) );
    pAttribA->indicate();  // send indicate to any clients that have enabled it
    pAttribA->notify();   // send notify to any clients that have enabled it
  }
  if (attribute_b_changed == true) {
    pAttribB->setValue( (uint8_t *)&attribB, sizeof(attribB) );
    pAttribB->indicate();  // send indicate to any clients that have enabled it
    pAttribB->notify();   // send notify to any clients that have enabled it
  }
  ...
  vTaskDelay( pdMS_TO_TICKS(200) );
}

I'm aware that Neil raised the issue of ESP_GATTS_CONF_EVT being raised on notify() #749, and the response that "it's a feature of our system". Regardless, since BLE defines notify() as equivalent to UDP (vs TCP), I'd make the case that there is certainly NO valid reason for a semaphore against it.

Note that @chegewara is currently working on multi-device connections, which will make all of this semaphore handling much more complicated, when it IS legitimate, so taking some time to do some thoughtful cleanup now would be very helpful.

chegewara commented 6 years ago

Usage of semaphores for sure needs to be reviewed, because today for example i found small bug just by accident. Her for example semaphore is taken, but when for some reason returned value is not ESP_OK then it is never released: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/BLECharacteristic.cpp#L561-L570

Another very tiny bug, maybe offtopic, because not related to semaphores is in BLEScan (also found today). In this code its very hard to see what is wrong: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/BLEScan.cpp#L104-L107

But think about situation when you do infinite scan and code never pass thru this point for some time because there is no new ble device to report? If you have other tasks running then you stall app, but its enough to add vTaskDelay(1) to switch between tasks.

My point is that we need to study code line by line, especially semaphores (thats why i opened issue about it, another one) because semaphores drives me crazy and sometimes i have no idea when, where and why they stall app.

Making multi connection feature i didnt found more issues than before with semaphores, except that sometimes (rare) app crash with similar backtrace:

0x40092e4c: invoke_abort at D:/msys32/home/imper/esp/esp-idf/components/esp32/panic.c:660

0x4009303d: abort at D:/msys32/home/imper/esp/esp-idf/components/esp32/panic.c:660

0x400d7997: __assert_func at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discrimina
tor 8)

0x400924c7: get_next_block at D:/msys32/home/imper/esp/esp-idf/components/heap/multi_heap.c:377
 (inlined by) assert_valid_block at D:/msys32/home/imper/esp/esp-idf/components/heap/multi_heap.c:164

0x400927ff: multi_heap_free_impl at D:/msys32/home/imper/esp/esp-idf/components/heap/multi_heap.c:377

0x40085012: heap_caps_free at D:/msys32/home/imper/esp/esp-idf/components/heap/heap_caps.c:269

0x400855c5: _free_r at D:/msys32/home/imper/esp/esp-idf/components/newlib/syscalls.c:42

0x400ea631: operator delete(void*) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/del_op.cc:46

0x4010d095: __gnu_cxx::new_allocator<char>::deallocate(char*, unsigned int) at d:\msys32\opt\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\5.2.0\ext/new_allocator.h:110
 (inlined by) std::allocator_traits<std::allocator<char> >::deallocate(std::allocator<char>&, char*, unsigned int) at d:\msys32\opt\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\5.2.0\bits/all
oc_traits.h:386
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_destroy(unsigned int) at d:\msys32\opt\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\5.2.0\bit
s/basic_string.h:185
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() at d:\msys32\opt\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\5.2.0\bits/basic_stri
ng.h:180
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at d:\msys32\opt\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\5.2.0\bits/basic_s
tring.h:544
 (inlined by) FreeRTOS::Semaphore::give() at D:/msys32/home/imper/ble_test/components/cpp_utils/FreeRTOS.cpp:143

0x401060a3: BLERemoteCharacteristic::gattClientEventHandler(esp_gattc_cb_event_t, unsigned char, esp_ble_gattc_cb_param_t*) at D:/msys32/home/imper/ble_test/components/cpp_utils/BLERemoteChara
cteristic.cpp:344

0x4010e289: BLERemoteService::gattClientEventHandler(esp_gattc_cb_event_t, unsigned char, esp_ble_gattc_cb_param_t*) at D:/msys32/home/imper/ble_test/components/cpp_utils/BLERemoteService.cpp:
267

0x40104a45: BLEClient::gattClientEventHandler(esp_gattc_cb_event_t, unsigned char, esp_ble_gattc_cb_param_t*) at d:\msys32\opt\xtensa-esp32-elf\xtensa-esp32-elf\include\c++\5.2.0\ext/new_alloc
ator.h:110

0x4010ff1d: BLEDevice::gattClientEventHandler(esp_gattc_cb_event_t, unsigned char, esp_ble_gattc_cb_param_t*) at D:/msys32/home/imper/ble_test/components/cpp_utils/BLEDevice.cpp:166

0x4012b6ab: btc_gattc_cb_to_app at D:/msys32/home/imper/esp/esp-idf/components/bt/bluedroid/btc/profile/std/gatt/btc_gattc.c:596
 (inlined by) btc_gattc_reg_for_notify at D:/msys32/home/imper/esp/esp-idf/components/bt/bluedroid/btc/profile/std/gatt/btc_gattc.c:678

0x4012bb03: btc_gattc_call_handler at D:/msys32/home/imper/esp/esp-idf/components/bt/bluedroid/btc/profile/std/gatt/btc_gattc.c:743

0x40128589: btc_task at D:/msys32/home/imper/esp/esp-idf/components/bt/bluedroid/btc/core/btc_task.c:188

0x40090735: vPortTaskWrapper at D:/msys32/home/imper/esp/esp-idf/components/freertos/port.c:435

But yes, i agree that we need to discuss how and where we could change code to make it work better. Until now i didnt feel like i am prepared to look at semaphores, but at the end i have to (time to learn something new), since Neil is busy and dont have time to work with library.

mws-rmain commented 6 years ago

I think the first issue is one I also encountered, and determined a fix for - that I have not yet posted. The issue must be checked for ALL places where semaphores are used. Back when I did, I found there were numerous places with various semaphores that the same issue occurs. Surprising, but I found some locations where the issue was properly addressed.

All ERROR PATHS that exit from functions that allocate a semaphore MUST invoke '.give()' on the semaphore before exiting. Otherwise, the semaphore remain blocked. This can cause all manner of problems, but there are numorous cases where an error exit path exits that does not release a held semaphore. For example, in BLECharacteristic::indicate():

    m_semaphoreConfEvt.take("indicate");

    esp_err_t errRc = ::esp_ble_gatts_send_indicate(
            getService()->getServer()->getGattsIf(),
            getService()->getServer()->getConnId(),
            getHandle(), length, (uint8_t*)m_value.getValue().data(), true); // The need_confirm = true makes this an indication.

    if (errRc != ESP_OK) {
        m_semaphoreConfEvt.give();  // RELEASE semaphore before exiting...
        ESP_LOGE(LOG_TAG, "<< esp_ble_gatts_send_indicate: rc=%d %s", errRc, GeneralUtils::errorToString(errRc));
        return;
    }
    m_semaphoreConfEvt.wait("indicate");

@chegewara the lines you pointed to are in notify(), and I don't believe the semaphore should even exist in notify().

In my case, requests to indicate() &/or notify() would sometimes fail - generally due to 'congested channel'. I believe a recent esp-idf update has addressed allot of these problems by updating their (closed source) BT Stack. The semaphore would still get cleared eventually, because of the indescriminate way ESP_GATTS_CONF_EVT cleared m_semaphoreConfEvt of ALL attributes when ANY confirmation event was triggered.

chegewara commented 6 years ago

My lines with notify code it was just example, but i agree with all you said.

I think ive done with multi connect code for some time, now it will require more tests so i could fix introduced bugs. I meantime we can coop and try to do something with semaphores. I am available here and on gitter.

chegewara commented 6 years ago

With semaphores is at least few issues. One of the biggest is this one: m_semaphoreConfEvt. It is taken in two places, when indication or notification is being send. But it is given when 3 different events occurs:

Why we are giving indication/notification semaphore in connect and disconnect event?

chegewara commented 6 years ago

@mws-rmain Could someone confirm my observations? Because we are working with bluetooth, and by default all bt tasks are pinned to core 0. This ble library is also build on tasks, but we by default are using tskNO_AFFINITY. I have strange feelings that we are having so many troubles with semaphores because of that. It can be tested in two ways:

chegewara commented 6 years ago

Hi all, i have few spare hours so i am playing now with semaphores. I think i can get rid off most of them. Only problem is that it will require extensive testing. ATM i have no more semaphores in BLEDescriptor and BLECharacteristic is only one left for indicate confirm i think (no more create or notify semaphores). It would be nice to point me where i can find more semaphores that can be removed.

mws-rmain commented 6 years ago

Thanks @chegewara. I haven't had much time to work on this recently. I had taken a different approach, grouping instances of semaphores into three classes, and ensuring appropriate handling for each class. Your approach (getting rid of them entirely) would be greatly preferred - Well done!

There are other semaphores that need a bit of cleanup (from my investigation). I'll post details when I can.

RE: the final remaining semaphore for indicate; I had raised an issue against esp-idf to include the associated attribute handle in the structure passed with ESP_GATTS_CONF_EVT. This should allow for much more efficient/intelligent handling of the event rather than the current 'release everything' shotgun. I'm unsure if this change is implemented in the latest build of esp-idf yet.

chegewara commented 6 years ago

This should allow for much more efficient/intelligent handling of the event rather than the current 'release everything' shotgun

I already did some cleanup in this matter. I had to do it because of multi connection feature, and now it looks very promising. I have some issues with BLEAdvertising and i have no idea what should i do, besides a lot changes in code that should make it run faster and without weird crashes because of uninitialized pointer or something like that. I will try to work in next few days at least few hours a day so should be better.

I received iPhone from our community member and i will try to find issues with HID library and next step will be to rewrite it to make it even easier to use.

mws-rmain commented 6 years ago

One note that might be of some interest WRT HID & Apple: I did some work on this some time back - attempting to get a product 100% compatible with the standard Apple keyboard over bluetooth with an iMac. The one feature that kept eluding us was operation of the 'special keys' ('fn' key & F1-F12) that implemented things like brightness control, media control, volume up-down, etc. I don't recall which, or if it was all of them, but some would just not work. We verified from multiple sources we were sending the correct codes, but to no avail. Finally found that Apple had something hard-coded into their stack that would ONLY accept these keycodes from a product that identified itself as an Apple product. We verified this by duplicating Apple's codes, and everything worked fine, then putting them back, and these keys were no longer accepted.

Of course, we couldn't release a product with Apple ids, so we had to settle for 98% compatibility.

mws-rmain commented 6 years ago

Verified the attribute handle was added to ESP_GATTS_CONF_EVT 21 days ago ( 44827bb ).

@chegewara have you commited your changes to your github 'esp-snippets-enhancements-test' repo, or are they still too raw?

chegewara commented 6 years ago

@mws-rmain Not yet, i have few tests to do and i can add now this handle to conf_evt too. For sure there still be a lot to do, but at least more eyes can find bugs and issues faster.

chegewara commented 6 years ago

Ok, i dont know if i didnt mess something with git push thing, but here is new PR: https://github.com/chegewara/esp32-snippets-enchancements-test/tree/multi_connect_test

chegewara commented 6 years ago

@mws-rmain i just pulled master commit and still have thise error: error: 'struct esp_ble_gatts_cb_param_t::gatts_conf_evt_param' has no member named 'handle', so your esp-idf issue is not merged yet to master. Its v3.2-beta

mws-rmain commented 6 years ago

I just pulled from your repo, and './tree/multi_connect_test' does not exist. ???

On Mon, Nov 5, 2018 at 4:08 PM chegewara notifications@github.com wrote:

Ok, i dont know if i didnt mess something with git push thing, but here is new PR:

https://github.com/chegewara/esp32-snippets-enchancements-test/tree/multi_connect_test

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nkolban/esp32-snippets/issues/681#issuecomment-436035332, or mute the thread https://github.com/notifications/unsubscribe-auth/AdHoNMscuVie-bW42tX7ZmYw0ZAgXqxzks5usKi5gaJpZM4XaBUv .

mws-rmain commented 6 years ago

very odd. I re-sync'd to master & physically verified in esp-idf\components\bt\bluedroid\api\include\api\esp_gatts_api.h, lines 117-127:

/**
 * @brief ESP_GATTS_CONF_EVT
 */
struct gatts_conf_evt_param {
    esp_gatt_status_t status;       /*!< Operation status */
    uint16_t conn_id;               /*!< Connection id */
    uint16_t handle;                /*!< attribute handle */
    uint16_t len;                   /*!< The indication or notification

value length, len is valid when send notification or indication failed / uint8_t value; /!< The indication or notification value , value is valid when send notification or indication failed / } conf; /!< Gatt server callback param of ESP_GATTS_CONF_EVT (confirm) /

maybe do a 'make clean' then rebuild?

On Mon, Nov 5, 2018 at 4:19 PM chegewara notifications@github.com wrote:

@mws-rmain https://github.com/mws-rmain i just pulled master commit and still have thise error: error: 'struct esp_ble_gatts_cb_param_t::gatts_conf_evt_param' has no member named 'handle', so your esp-idf issue is not merged yet to master.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nkolban/esp32-snippets/issues/681#issuecomment-436038808, or mute the thread https://github.com/notifications/unsubscribe-auth/AdHoNBNUpkeyiJtlm8ZjEuv1PCVicZR6ks5usKtZgaJpZM4XaBUv .

chegewara commented 6 years ago

Ok, my bad. I did git pull but it failed.

You have to clone https://github.com/chegewara/esp32-snippets-enchancements-test and then switch branch to multi_connect_test

chegewara commented 6 years ago

@mws-rmain We have handle in conf event, but it seems to be bugged. W (22205) handleGATTServerEvent: m_handle = 42, conf->handle = 0