hbldh / bleak

A cross platform Bluetooth Low Energy Client for Python using asyncio
MIT License
1.79k stars 296 forks source link

Crash on mac after timeout connecting to device #538

Closed jackjansen closed 3 years ago

jackjansen commented 3 years ago

Description

Sometimes (not always) when I get a timeout connecting to a device I get an asyncio.exceptions.TimeoutError exception. Subsequently the program crashes.

Even if I catch the exception the Python program will still crash on exit.

Here are the relevant bits of the crash report:

Crashed Thread:        5  Dispatch queue: bleak.corebluetooth

Exception Type:        EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:       0x0000000000000001, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Illegal instruction: 4
Termination Reason:    Namespace SIGNAL, Code 0x4
Terminating Process:   exc handler [20790]

Application Specific Information:
BUG IN CLIENT OF LIBPTHREAD: pthread_exit() called from a thread not created by pthread_create()

Thread 0:: Dispatch queue: com.apple.main-thread
0   org.python.python               0x00000001031f7a64 PyTuple_ClearFreeList + 84
1   org.python.python               0x00000001032dc4ce collect + 3454
2   org.python.python               0x00000001032db737 _PyGC_CollectNoFail + 55
3   org.python.python               0x00000001032a2db1 PyImport_Cleanup + 1521
4   org.python.python               0x00000001032b8f98 Py_FinalizeEx + 152
5   org.python.python               0x00000001032da8db Py_RunMain + 1659
6   org.python.python               0x00000001032dae3f pymain_main + 223
7   org.python.python               0x00000001032db03b Py_BytesMain + 43
8   libdyld.dylib                   0x00007fff20619f3d start + 1

Thread 5 Crashed:: Dispatch queue: bleak.corebluetooth
0   libsystem_pthread.dylib         0x00007fff205fc7cd pthread_exit + 56
1   org.python.python               0x00000001032ca905 PyThread_exit_thread + 21
2   org.python.python               0x000000010326dd90 PyEval_RestoreThread + 128
3   org.python.python               0x00000001032bc68a PyGILState_Ensure + 122
4   _objc.cpython-38-darwin.so      0x0000000103a5ff98 method_stub + 40
5   _objc.cpython-38-darwin.so      0x0000000103a83294 ffi_closure_unix64_inner + 708
6   _objc.cpython-38-darwin.so      0x0000000103a827b6 ffi_closure_unix64 + 70
7   com.apple.CoreBluetooth         0x00007fff30b31822 -[CBCentralManager handlePeripheralDisconnectionCompleted:] + 223
8   com.apple.CoreBluetooth         0x00007fff30b31822 -[CBCentralManager handlePeripheralDisconnectionCompleted:] + 223
9   com.apple.CoreBluetooth         0x00007fff30b31822 -[CBCentralManager handlePeripheralDisconnectionCompleted:] + 223

that last one is repeated for the 512 stack frames dumped in the crash report.

What may be relevant is that I'm connecting to devices that are powered for only 200ms after sending an advertisement and then power down for 2 seconds (unless contacted by a client within those 200ms). If I understand the BLE specs correctly this is supported behaviour.

dlech commented 3 years ago

I wonder if we have an unhandled exception in the device disconnected callback (centralManager_didDisconnectPeripheral_error_).

dlech commented 3 years ago

We could try something like this:

    def centralManager_didDisconnectPeripheral_error_(
        self, central: CBCentralManager, peripheral: CBPeripheral, error: NSError
    ):
        try:
            logger.debug("centralManager_didDisconnectPeripheral_error_")
            self.event_loop.call_soon_threadsafe(
                self.did_disconnect_peripheral,
                central,
                peripheral,
                error,
            )
        except:
            logger.exception("")
jackjansen commented 3 years ago

I can give this a try, but it will take a week or more before I'm able to do so (I won't have access to the devices the coming week at least).

Dallin-Lloyd commented 3 years ago

I tried this to see if it would change the state of my issue i have been having with disconnecting callbacks. When i read this description i thought that they might be correlated. No change on my end.

dlech commented 3 years ago

I missed the part before that this crash happens at exit. So my suggestion probably doesn't help.

dlech commented 3 years ago

I was able to reproduce the issue by setting timeout=0.15 in BleakClient().

The problem is that when the Python runtime has exited we can still get delegate callbacks for the central manager or a peripheral. These are running on a thread created by libdispatch.dispatch_queue_create. The man page has this to say:

The dispatch framework manages the relationship between dispatch queues and threads of execution. As a result, applications MUST NOT delete or mutate objects that they did not create. The following interfaces MUST NOT be called by blocks submitted to a dispatch queue:

  • pthread_cancel()
  • pthread_detach()
  • pthread_join()
  • pthread_kill()
  • pthread_exit()

However the callback from pyobjc is calling PyGILState_Ensure() which eventually calls take_gil() which is the source of the PyThread_exit_thread() calls that call the forbidden pthread_exit(). The code references a couple of bugs explaining why it does this.

We might be able to work around this in Bleak by disconnecting all delegates before the runtime exits. But it would be worthwhile reporting the bug to PyObjC and or CPython.

The docs also mention a dispatch_suspend function that could be called, perhaps with atexit to avoid the callback that causes the crash.

dlech commented 3 years ago

I tried this:

diff --git a/bleak/backends/corebluetooth/CentralManagerDelegate.py b/bleak/backends/corebluetooth/CentralManagerDelegate.py
index 4004a27..0640b5b 100644
--- a/bleak/backends/corebluetooth/CentralManagerDelegate.py
+++ b/bleak/backends/corebluetooth/CentralManagerDelegate.py
@@ -6,6 +6,7 @@ Created on June, 25 2019 by kevincar <kevincarrolldavis@gmail.com>

 """

+import atexit
 import asyncio
 import itertools
 import logging
@@ -33,7 +34,7 @@ from Foundation import (
     NSNumber,
     NSError,
 )
-from libdispatch import dispatch_queue_create, DISPATCH_QUEUE_SERIAL
+from libdispatch import dispatch_queue_create, dispatch_suspend, DISPATCH_QUEUE_SERIAL

 from bleak.backends.corebluetooth.PeripheralDelegate import PeripheralDelegate
 from bleak.backends.corebluetooth.device import BLEDeviceCoreBluetooth
@@ -80,9 +81,14 @@ class CentralManagerDelegate(NSObject):
         self._connection_state_changed = asyncio.Event()

         self._did_update_state_event = threading.Event()
+
+        dispatch_queue = dispatch_queue_create(
+            b"bleak.corebluetooth", DISPATCH_QUEUE_SERIAL
+        )
         self.central_manager = CBCentralManager.alloc().initWithDelegate_queue_(
-            self, dispatch_queue_create(b"bleak.corebluetooth", DISPATCH_QUEUE_SERIAL)
+            self, dispatch_queue
         )
+        atexit.register(dispatch_suspend, dispatch_queue)

         # according to CoreBluetooth docs, it is not valid to call CBCentral
         # methods until the centralManagerDidUpdateState_() delegate method

But it caused this:

Traceback (most recent call last):
  File "test.py", line 53, in <module>
    asyncio.run(test())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 612, in run_until_complete
    return future.result()
  File "test.py", line 35, in test
2021-06-02 21:50:31.394 Python[43622:5876748] PyObjC: Converting exception to Objective-C:
Traceback (most recent call last):
  File "/Users/david/Documents/GitHub/pybricks/bleak/bleak/backends/corebluetooth/CentralManagerDelegate.py", line 288, in centralManager_didConnectPeripheral_
    self.event_loop.call_soon_threadsafe(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 760, in call_soon_threadsafe
    self._check_closed()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
    async with BleakClient(
  File "/Users/david/Documents/GitHub/pybricks/bleak/bleak/backends/client.py", line 61, in __aenter__
    await self.connect()
  File "/Users/david/Documents/GitHub/pybricks/bleak/bleak/backends/corebluetooth/client.py", line 90, in connect
    await manager.connect_(self._device_info, timeout=timeout)
  File "/Users/david/Documents/GitHub/pybricks/bleak/bleak/backends/corebluetooth/CentralManagerDelegate.py", line 162, in connect_
    await asyncio.wait_for(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
2021-06-02 21:50:31.420 Python[43622:5876748] *** Terminating app due to uncaught exception 'OC_PythonException', reason: '<class 'RuntimeError'>: Event loop is closed'
*** First throw call stack:
(
        0   CoreFoundation                      0x00007fff206bb87b __exceptionPreprocess + 242
        1   libobjc.A.dylib                     0x00007fff203f3d92 objc_exception_throw + 48
        2   CoreFoundation                      0x00007fff206e3f76 -[NSException raise] + 9
        3   _objc.cpython-38-darwin.so          0x000000010f76066e PyObjCErr_ToObjCWithGILState + 46
        4   _objc.cpython-38-darwin.so          0x000000010f759b0a method_stub + 4986
        5   libffi.dylib                        0x00007fff2db024d4 ffi_closure_unix64_inner + 502
        6   libffi.dylib                        0x00007fff2db02a8c ffi_closure_unix64 + 72
        7   CoreBluetooth                       0x00007fff30a9c078 -[CBCentralManager handleMsg:args:] + 811
        8   CoreBluetooth                       0x00007fff30a8e067 -[CBManager xpcConnectionDidReceiveMsg:args:] + 177
        9   CoreBluetooth                       0x00007fff30a8dfa3 __30-[CBXpcConnection _handleMsg:]_block_invoke + 66
        10  libdispatch.dylib                   0x00007fff2039d623 _dispatch_call_block_and_release + 12
        11  libdispatch.dylib                   0x00007fff2039e806 _dispatch_client_callout + 8
        12  libdispatch.dylib                   0x00007fff203a45ea _dispatch_lane_serial_drain + 606
        13  libdispatch.dylib                   0x00007fff203a50e0 _dispatch_lane_invoke + 417
        14  libdispatch.dylib                   0x00007fff203a4493 _dispatch_lane_serial_drain + 263
        15  libdispatch.dylib                   0x00007fff203a50ad _dispatch_lane_invoke + 366
        16  libdispatch.dylib                   0x00007fff203aec0d _dispatch_workloop_worker_thread + 811
        17  libsystem_pthread.dylib             0x00007fff2054545d _pthread_wqthread + 314
        18  libsystem_pthread.dylib             0x00007fff2054442f start_wqthread + 15
)
libc++abi: terminating with uncaught exception of type NSException
zsh: abort      python test.py
jackjansen commented 3 years ago

I was able to reproduce the issue by setting timeout=0.15 in BleakClient().

Ah, wonderful! I have tried to re-create the issue but failed.

The problem is that when the Python runtime has exited we can still get delegate callbacks for the central manager or a peripheral. These are running on a thread created by libdispatch.dispatch_queue_create.

Hmm. That sounds like bleak doesn't shut down CoreBluetooth things correctly when exiting. At least: I would be surprised if CoreBluetooth didn't clean up any libdispatch stuff it had outstanding when you shut it down...

Or actually: simply unregistering the delegate should be good enough. I've tried to follow the code, but I can't find where the CentralManagerDelegate is passed to CoreBluetooth. (the code isn't structured like standard ObjC/CoreXXX code, obviously, but it isn't structured along standard Python lines either... And the naming convention seems to mix the concepts of manager and manager-delegate. Ah, found it: the manager is an instance variable of the CentralManagerDelegate that is its delegate. Interesting...

But it should be simple to clean things up: add a method CentralManagerDelegate.shutdown() that sets central_manager.delegate to None and then deletes central_manager.

jackjansen commented 3 years ago

I was able to reproduce the issue by setting timeout=0.15 in BleakClient().

Can you provide sample code to show how you repeated the issue? Because I was just trying this (with the intention to see if I could provide a patch along the lines what I suggested in the previous post) but for me adding the timeout doesn't make the crash happen. I get the bleak exception (device was not found) but I don't get the crash on exit anymore...

dlech commented 3 years ago

The crash doesn't happen every time. You might be able to try different values for the timeout to find one that works for the combination of your computer and your BLE device. The program I used is similar to https://github.com/hbldh/bleak/blob/develop/examples/uart_service.py where BleakScanner gets a BLEDevice object that is passed to BleakClient rather than passing a UUID to BleakClient.

dlech commented 3 years ago

the code isn't structured like standard ObjC/CoreXXX code, obviously, but it isn't structured along standard Python lines either... And the naming convention seems to mix the concepts of manager and manager-delegate

I agree the code here is a bit confusing. If you have any suggestions on how to organize it better, let us know. How would standard ObjC/CoreXYZ code be structured?

jackjansen commented 3 years ago

Sorry, my original remark sounded like criticism, but it definitely wasn't intended as such! Bleak tries to unify 3 completely different APIs into a single Pythonic one, so obviously you shouldn't follow the ObjC/Apple/NextStep way of structuring things.

But, in case it helps, here's my understanding of the ObjC way to structure things. The global structure of your app data model (classes that govern the global structure of the app) is all as provided by Apple, without subclassing by your app. Then you attach your own functionality through the delegate objects. But your delegate objects have no "structure", they're bits of functionality hanging off Apple's classes. So for navigating around the structure the delegate would go to the object it is a delegate of, then navigate the structure there, then find the delegate it wants to say something to.

I think that keeping the bleak structure as-is, but ensuring that the object/delegate difference is clear in the code (at least by keeping a reference to the CoreBluetooth object and to the delegate (if the delegate isn't the bleak object itself) would help. Possibly with ample assert self._whateverthecorebluetoothobjectiscalled.delegate is self even if only for documentation purposes.

The other issue with readability is that ObjC is already a mess with parallelism with callbacks/completions used in some some places and with libdispatch in others, and Python coroutines are now a third paradigm, and each of these three is a nice paradigm by itself but they bite one another.

I have no easy solution for this...

dlech commented 3 years ago

Sorry, my original remark sounded like criticism

No, not at all. 😄 I was just curious to hear a new/different perspective.