Open Chris-StJohn-Bose opened 1 month ago
Thanks @Chris-StJohn-Bose - This is indeed a problematic pattern that is used many places in LE Audio. It is unlikely to cause actual problems, but it is something we actually hit often when implementing unit tests.
@Thalley you're welcome. I have been toying with the idea of implementing (in my spare time...) a synchronous GATT API for Zephyr so that we could eventually push all the multi-threaded stuff "under the hood".
I don't suppose you're aware of any other efforts in this direction? For Audio LE in particular the "application" thread is mainly idling whilst waiting for its callback; several example applications I've seen even implement a synchronous layer to make the example more readable e.g. https://github.com/zephyrproject-rtos/zephyr/blob/main/samples/bluetooth/bap_unicast_client/src/main.c
@Thalley you're welcome. I have been toying with the idea of implementing (in my spare time...) a synchronous GATT API for Zephyr so that we could eventually push all the multi-threaded stuff "under the hood".
I don't suppose you're aware of any other efforts in this direction? For Audio LE in particular the "application" thread is mainly idling whilst waiting for its callback; several example applications I've seen even implement a synchronous layer to make the example more readable e.g. https://github.com/zephyrproject-rtos/zephyr/blob/main/samples/bluetooth/bap_unicast_client/src/main.c
I am not aware of any plans to make a synchronous GATT API (and doubt that Zephyr will ever go in that direction). There are plans to reduce the number of threads that the stack uses for Bluetooth (we recently removed the TX thread).
Ideally in the future we will be using a event based API so that an application could do something like
bt_gatt_discover(...);
get_event(BT_GATT_DISCOVER_COMPLETED);
or something like that (effectively an Async/await pattern (https://en.wikipedia.org/wiki/Async/await)
The direction of my thinking is that currently the discovery callback function, for instance, may be called any time after bt_gatt_discover()
is invoked - even before it returns (hence this bug...)
What if there were an option to set a GATT connection to be synchronous so that it is guaranteed that all callbacks would be called before return and the params
pointer never retained? It would require minimal code changes to upper layer stacks such as BAP etc (and might expose a few bugs like this one) but an application setting this option would have much better control over callbacks, memory lifetime and error handling.
Anyway, just a thought. Pretty similar to your get_event()
but implicit with an option rather than explicit after each API call.
That is a fair point. It would be similar to bt_hci_cmd_send_sync
(https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/bluetooth/hci.h#L62-L105).
From a simplicity perspective (in the application) it could make sense. There is, however, nothing stopping applications from doing this themselves, and is in fact how many of our babblesim tests are designed.
Whether it belongs in the stack is debatable. It adds complexity (if both sync and async can be used at the same time), it worsens performance as you wouldn't do multiple outstanding requests on multiple connections, adds memory and computational overhead, etc.
To me it seems like a patch for this callback-oriented architecture Bluetooth in Zephyr has, rather than a full proper solution (which I would argue that an event-based implementation would be).
Describe the bug This snippet of code assumes that
unicast_client_pac_discover_cb()
isn't called beforebt_bap_unicast_client_discover()
returns.https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/audio/bap_unicast_client.c#L4194C1-L4210C22
but since
unicast_client_pac_discover_cb()
is called from a different thread a pre-emptive context switch before settingclient->dir
orclient->busy
will cause the discovery state to get very confused.Potentially, for instance, the discovery might complete before
client->busy = true
is reached in which case there's no pending callback to set it false again.I suggest instead that there is some "unwinding" of the state on error:
I raise this as a bug rather than a PR since there may be other places where this anti-pattern has been used?
To Reproduce Insert an intentional context switch before
client->dir = dir
such ask_sleep(K_SECONDS(5))
Expected behavior The implementation shouldn't expect to execute logic out-of-order on the assumption that "it's fast enough".
Impact Minor. Realistically this probably won't happen - until it does.
Logs and console output N/A - found whilst reviewing code
Environment (please complete the following information): N/A - found whilst reviewing code
Additional context None