Open bjandre opened 6 months ago
Is the device resetting at all when this happens or is it happy as far as you can tell?
You can try increasing the timeout arguments to the upload command. I'll look into why a task is getting cancelled, I thought that it would raise a timeout error instead.
The logs do not show any evidence of a MCU reset/reboot/fault during the upload. I enabled the zephyr mcumgr_smp debug logging and there is no indication of a problem on that end.
mcumgr-web is able to upload the large image, although it does occasionally hang and have to be restarted. I am also able to consistently upload a small dummy image with smpclient.
I'll look into the upload timeout arguments next.
What board / MCU are you using? I think that I can borrow an Arm Mac to see if I can reproduce.
Please consider forking and opening a PR if you find that the example script or anything else needs some changes.
@bjandre It appears to be an asyncio bug causing the nasty traceback, though I suspect the root cause is that the smpclient is not receiving the notify within the 2.5 second timeout.
The issue is still open at Python: https://bugs.python.org/issue39032
But the SO poster also included a workaround that I'll look into.
We're based on an nrf52840 dk, zephyr board nrf52840dk_nrf52840, with upstream zephyr 3.6, not the nordic sdk.
I experimented with the timeout parameter and it didn't help. It seems like a notify is being dropped, but I'm not sure if the issue is on the zephyr or smpclient side. I'm still investigating the root cause.
If we can get a cleaner exception when a notify timeout occurs, that would make it easier to trap at the application level and allow retries.
@bjandre If you don't mind wiping the flash on your DK, can I ask that you run the FW upgrade HW integration test for the nrf52840dk:
python -m examples.ble.upgrade nrf52840dk_nrf52840
And paste the output.
Expected:
Using DUT folder: C:\Users\jp\repos\smpclient\examples\duts\nrf52840dk_nrf52840\ble
Using merged.hex: C:\Users\jp\repos\smpclient\examples\duts\nrf52840dk_nrf52840\ble\a_smp_dut.merged.hex
Flashing the merged.hex...
[ #################### ] 1.150s | Erase file - Done erasing
[ #################### ] 1.689s | Program file - Done programming
[ #################### ] 1.733s | Verify file - Done verifying
Applying system reset.
Run.
A SMP DUT hash: SHA256=215144b99127acb3c66d9ec7540ee454703c3e15db7e12a713ad0f672d63321c
B SMP DUT hash: SHA256=62a8e0864e02761c8498d4380b8104370b028f7d63870833420442753f4558d0
Searching for A SMP DUT...OK
Connecting to A SMP DUT...2024-05-31 11:46:16.965 INFO smp_characteristic.max_write_without_response_size=495
2024-05-31 11:46:16.965 INFO self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0
Uploaded 225,130 / 225,130 Bytes | 26.78 KB/s
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=1, sequence=94, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
Confirmed the upload
Marking B SMP DUT for test...
Sending request...OK
Received response: header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=1, sequence=95, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=True, confirmed=False, active=False, permanent=False)] splitStatus=0
Resetting for swap...
Sending request...OK
Received response: header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=2, group_id=0, sequence=96, command_id=5) sequence=0
2024-05-31 11:46:26.196 WARNING Disconnected from C4:35:E3:1E:09:DB
Searching for B SMP DUT...OK
Connecting to B SMP DUT...2024-05-31 11:46:50.060 INFO smp_characteristic.max_write_without_response_size=495
2024-05-31 11:46:50.060 INFO self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=1, sequence=98, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=False, permanent=False)] splitStatus=0
Confirmed the swap
2024-05-31 11:46:53.318 WARNING Disconnected from C4:35:E3:1E:09:DB
I don't have a nrf52840 devkit with me, but our custom board is compatible.
When I run the upgrade script, it will detect the device by name "A SMP DUT", but it fails to connect by device address:
smpclient % python3 -m examples.ble.upgrade nrf52840dk_nrf52840
Using DUT folder: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble
Using merged.hex: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut.merged.hex
Flashing the merged.hex...
Recovering device. This operation might take 30s.
[ ###### ] 0.000s | Program file - Checking image
[ ### ] 0.000s | Check image validity - Initialize device inf
[ ###### ] 0.000s | Check image validity - Check region 0 settin
[ ########## ] 0.000s | Check image validity - block 1 of 4
[ ############# ] 0.004s | Check image validity - block 2 of 4
[ #################### ] 0.015s | Check image validity - Finished
[ ############# ] 0.000s | Program file - Programming
[ ##### ] 0.000s | Programming image - block 1 of 3
[ ########## ] 0.200s | Programming image - block 2 of 3
[ #################### ] 0.000s | Programming image - Write successful
[ #################### ] 1.584s | Program file - Done programming
[ ###### ] 0.000s | Verify file - Check image
[ ### ] 0.000s | Check image validity - Initialize device inf
[ ###### ] 0.000s | Check image validity - Check region 0 settin
[ ########## ] 1.713s | Check image validity - block 1 of 4
[ ############# ] 0.882s | Check image validity - block 2 of 4
[ #################### ] 0.000s | Check image validity - Finished
[ ############# ] 0.000s | Verify file - Verifying
[ ##### ] 0.000s | Verifying image - block 1 of 3
[ ########## ] 0.204s | Verifying image - block 2 of 3
[ #################### ] 0.000s | Verifying image - Verify successful
[ #################### ] 1.646s | Verify file - Done verifying
Applying system reset.
Run.
A SMP DUT hash: SHA256=215144b99127acb3c66d9ec7540ee454703c3e15db7e12a713ad0f672d63321c
B SMP DUT hash: SHA256=62a8e0864e02761c8498d4380b8104370b028f7d63870833420442753f4558d0
Searching for A SMP DUT...OK
Connecting to A SMP DUT...Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 146, in <module>
asyncio.run(main())
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 67, in main
async with SMPClient(SMPBLETransport(), a_smp_dut.address) as client:
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 153, in __aenter__
await self.connect()
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 37, in connect
await self._transport.connect(self._address, timeout_s)
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 84, in connect
raise SMPBLETransportDeviceNotFound(f"Device '{address}' not found")
smpclient.transport.ble.SMPBLETransportDeviceNotFound: Device '3D10EF7C-3B95-7EAD-315A-84D74250219B' not found
Increasing the connection timeout doesn't help, but modifying upgrade.py:67 to use the device name instead of address allows it to connect.
smpclient % python3 -m examples.ble.upgrade nrf52840dk_nrf52840
Using DUT folder: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble
Using merged.hex: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut.merged.hex
Flashing the merged.hex...
Recovering device. This operation might take 30s.
[ ###### ] 0.000s | Program file - Checking image
[ ### ] 0.000s | Check image validity - Initialize device inf
[ ###### ] 0.000s | Check image validity - Check region 0 settin
[ ########## ] 0.000s | Check image validity - block 1 of 4
[ ############# ] 0.003s | Check image validity - block 2 of 4
[ #################### ] 0.014s | Check image validity - Finished
[ ############# ] 0.000s | Program file - Programming
[ ##### ] 0.000s | Programming image - block 1 of 3
[ ########## ] 0.201s | Programming image - block 2 of 3
[ #################### ] 0.000s | Programming image - Write successful
[ #################### ] 1.577s | Program file - Done programming
[ ###### ] 0.000s | Verify file - Check image
[ ### ] 0.000s | Check image validity - Initialize device inf
[ ###### ] 0.000s | Check image validity - Check region 0 settin
[ ########## ] 1.710s | Check image validity - block 1 of 4
[ ############# ] 0.880s | Check image validity - block 2 of 4
[ #################### ] 0.000s | Check image validity - Finished
[ ############# ] 0.000s | Verify file - Verifying
[ ##### ] 0.000s | Verifying image - block 1 of 3
[ ########## ] 0.205s | Verifying image - block 2 of 3
[ #################### ] 0.000s | Verifying image - Verify successful
[ #################### ] 1.651s | Verify file - Done verifying
Applying system reset.
Run.
A SMP DUT hash: SHA256=215144b99127acb3c66d9ec7540ee454703c3e15db7e12a713ad0f672d63321c
B SMP DUT hash: SHA256=62a8e0864e02761c8498d4380b8104370b028f7d63870833420442753f4558d0
Searching for A SMP DUT...OK
Connecting to A SMP DUT...2024-05-31 15:02:59.483 INFO smp_characteristic.max_write_without_response_size=495
2024-05-31 15:02:59.484 INFO self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0
Uploaded 41,606 / 225,130 Bytes | 10.34 KB/s 2024-05-31 15:03:06.221 ERROR Exception in SMPClient: exc_type=<class 'RuntimeError'>, exc_value=RuntimeError('Lock is not acquired.'), traceback=<traceback object at 0x102305a00>
2024-05-31 15:03:06.226 WARNING Disconnected from 3D10EF7C-3B95-7EAD-315A-84D74250219B
Traceback (most recent call last):
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 146, in receive
await self._notify_or_disconnect()
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 215, in _notify_or_disconnect
done, pending = await asyncio.wait(
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 464, in wait
return await _wait(fs, timeout, return_when, loop)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 550, in _wait
await waiter
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 147, in <module>
asyncio.run(main())
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 90, in main
async for offset in client.upload(b_smp_dut_bin):
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 135, in upload
response = await self.request(
^^^^^^^^^^^^^^^^^^^
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 51, in request
frame = await self._transport.send_and_receive(request.BYTES)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 180, in send_and_receive
return await self.receive()
^^^^^^^^^^^^^^^^^^^^
File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 144, in receive
async with self._notify_condition: # wait for the header
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 20, in __aexit__
self.release()
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 139, in release
raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
That makes sense, MacOS does not allow the use of BLE MAC addresses: https://github.com/hbldh/bleak/issues/140
This is good that it's so reproducible! I will borrow a Mac and see what I see. I'm assuming this is on Arm.
I'm suspecting that the FW has not actually received all 5 of the SMP packet fragments (2475 / 5 = 495), so it doesn't notify, and then the request times out. I wonder if it is a synchronization issue wherein the loop here:
Is not actually waiting for the previous transfers to complete. It would slow things way down, but I wonder if adding an asyncio.sleep(0.050)
after each fragment would make it more stable?
Yes, ARM based Mac.
I tried adding the asyncio.sleep and the results were inconclusive. In general it seemed to make it the same few K into the upload, but on a few runs it did make it noticeably further before hanging.
https://developer.apple.com/documentation/corebluetooth/cbperipheral/writevalue(_:for:type:)
https://stackoverflow.com/questions/65586100/rate-limiting-a-corebluetooth-write-without-response
I'm starting to suspect that Core Bluetooth writeValue returns without guaranteeing completion of transmission.
Here's where Bleak uses the Core Bluetooth API: https://github.com/hbldh/bleak/blob/d45ec90d3cca4113da7c118cac753f441c46cca2/bleak/backends/corebluetooth/PeripheralDelegate.py#L179-L202
Generally, I've noticed BLE being better on MacOS/iOS. I wonder if it's so good that it's "too good for the peripheral" - it's sending the fragmented packets too quickly.
The manual fragmentation that's done is required on Windows - at least with the BLE adapter that I'm using. I wonder if removing the fragmentation and telling the CB API to send the entire 2475B packet would fix it?
May be possible to make Bleak properly async with this: https://developer.apple.com/documentation/corebluetooth/cbperipheraldelegate/peripheralisready(tosendwritewithoutresponse:)/
Or use write with response and/or send all 2475 bytes.
Currently testing from a MacBook Pro 14 Nov 2023 M3 Pro Sonoma 14.5
Using this branch: https://github.com/intercreate/smpclient/tree/fix-%2331/image-upload-match
Strange thing is that I was initially able to reproduce the error. I started fiddling with the timeouts and making some changes to .venv/lib/python3.9/site-packages/bleak/backends/corebluetooth
and now I can't reproduce the timeout error!
The test setup is to have an nrf52840dk with the examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut_image_check.merged.hex
image on it. Then test uploads like
python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut.bin
You have to keep switching between a/b because an upload of an image that is already uploaded will simply complete immediately:
python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
Scanning for SMP servers...OK
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:37:08.344 ble:108 INFO smp_characteristic.max_write_without_response_size=495
2024-06-02 14:37:08.344 ble:132 INFO self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0
Uploaded 225,130 / 225,130 Bytes | 12.96 KB/s 2024-06-02 14:37:25.895 __init__:150 INFO Upload complete
2024-06-02 14:37:25.895 __init__:153 INFO Server reports response.match=True
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=244, group_id=1, sequence=94, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
2024-06-02 14:37:25.958 ble:228 WARNING Disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6
python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
Scanning for SMP servers...OK
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:40:49.059 ble:108 INFO smp_characteristic.max_write_without_response_size=495
2024-06-02 14:40:49.059 ble:132 INFO self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=244, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
Uploaded 225,130 / 225,130 Bytes | 471.33 KB/s 2024-06-02 14:40:49.750 __init__:150 INFO Upload complete
2024-06-02 14:40:49.751 __init__:153 INFO Server reports response.match=True
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=244, group_id=1, sequence=3, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
2024-06-02 14:40:49.844 ble:228 WARNING Disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6
Uploads are slower than I expect. Typically about 15KB/s where Windows gets 25KB/s. MTU is negotiating to 495.
Write with response does not work, I guess it's not setup on the peripheral:
bleak.exc.BleakError: Failed to write characteristic 17: Error Domain=CBATTErrorDomain Code=3 "Writing is not permitted." UserInfo={NSLocalizedDescription=Writing is not permitted.}
Sending the entire packet also does not work (hangs and time out).
By using the upload example script, I have discovered something strange. After a successful upload, the next upload will fail with IMG_MGMT_ERR.FLASH_ERASE_FAILED
. If I retry it will succeed. This implies that the failed request is changing the device state in some way that causes the subsequent upload to succeed. Probably not intended behavior. Also likely that repeated uploads doesn't get a lot of coverage!
2024-06-02 14:35:11.822 __init__:174 ERROR Exception in SMPClient: exc_type=<class 'smpclient.exceptions.SMPUploadError'>, exc_value=SMPUploadError(ImageManagementErrorV1(header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=19, group_id=1, sequence=2, command_id=1), sequence=0, err=Err[IMG_MGMT_ERR](group=<GroupId.IMAGE_MANAGEMENT: 1>, rc=<IMG_MGMT_ERR.FLASH_ERASE_FAILED: 13>))), traceback=<traceback object at 0x10262d7c0>
In Bleak, I've added this to .venv/lib/python3.9/site-packages/bleak/backends/corebluetooth/PeripheralDelegate.py
def peripheralIsReady_toSendWriteWithoutResponse_(self, peripheral: CBPeripheral, characteristic: CBCharacteristic) -> None:
logger.debug(type(peripheral), type(characteristic))
logger.error("Called!")
It registers but it does not get called.
In .venv/lib/python3.9/site-packages/bleak/backends/corebluetooth/client.py
I've modified the write_gatt_char()
function with a hack to force it to async sleep until it's ready to send another message. The actual fix should be to use the callback mentioned above to set a Future, but I think that it's all a non-issue. If that callback is not called then there is no error!
async def write_gatt_char(
self,
characteristic: BleakGATTCharacteristic,
data: Buffer,
response: bool,
) -> None:
value = NSData.alloc().initWithBytes_length_(data, len(data))
await self._delegate.write_characteristic(
characteristic.obj,
value,
(
CBCharacteristicWriteWithResponse
if response
else CBCharacteristicWriteWithoutResponse
),
)
while self._peripheral.canSendWriteWithoutResponse() != True:
logger.error("Not ready to send!")
await asyncio.sleep(0.010)
logger.debug(f"Write Characteristic {characteristic.uuid} : {data}")
This results in
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:47:27.360 ble:108 INFO smp_characteristic.max_write_without_response_size=495
2024-06-02 14:47:27.361 ble:132 INFO self._max_write_without_response_size=495
2024-06-02 14:47:27.420 client:336 ERROR Not ready to send!
OK
Sending request...2024-06-02 14:47:27.481 client:336 ERROR Not ready to send!
OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0
2024-06-02 14:47:27.542 client:336 ERROR Not ready to send!
2024-06-02 14:47:27.553 client:336 ERROR Not ready to send!
2024-06-02 14:47:27.565 client:336 ERROR Not ready to send!
2024-06-02 14:47:27.576 client:336 ERROR Not ready to send!
2024-06-02 14:47:27.588 client:336 ERROR Not ready to send!
Uploaded 2,390 / 225,131 Bytes | 4.19 KB/s
2024-06-02 14:47:28.112 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.124 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.136 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.147 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.158 client:336 ERROR Not ready to send!
Uploaded 4,841 / 225,131 Bytes | 6.43 KB/s
2024-06-02 14:47:28.294 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.306 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.317 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.329 client:336 ERROR Not ready to send!
2024-06-02 14:47:28.340 client:336 ERROR Not ready to send!
etc.
So, the canSendWriteWithoutResponse flag does seem to be False for ~50ms after avery message. But even without this delay, calling writeWithoutResponse is not raising any error callbacks. https://developer.apple.com/documentation/corebluetooth/cbperipheral/cansendwritewithoutresponse
Gonna try hard coding the MTU to 244 which seems to be what your Mac negotiated. Still worked OK.
I'm trying moving the DK further away from the Mac, past several walls. This makes the transfer fail, but in a sorta reasonable way:
python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
Scanning for SMP servers...OK
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:56:33.552 ble:108 INFO smp_characteristic.max_write_without_response_size=495
2024-06-02 14:56:33.552 ble:132 INFO self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0
Uploaded 21,998 / 225,130 Bytes | 4.24 KB/s 2024-06-02 14:56:39.814 ble:228 WARNING Disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6
2024-06-02 14:56:39.815 __init__:174 ERROR Exception in SMPClient: exc_type=<class 'smpclient.transport.SMPTransportDisconnected'>, exc_value=SMPTransportDisconnected('SMPBLETransport disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6'), traceback=<traceback object at 0x102d44b40>
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/JP/repos/smpclient/examples/ble/upload.py", line 73, in <module>
asyncio.run(main())
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/Users/JP/repos/smpclient/examples/ble/upload.py", line 51, in main
async for offset in client.upload(fw_file, 2):
File "/Users/JP/repos/smpclient/smpclient/__init__.py", line 137, in upload
response = await self.request(
File "/Users/JP/repos/smpclient/smpclient/__init__.py", line 53, in request
frame = await self._transport.send_and_receive(request.BYTES)
File "/Users/JP/repos/smpclient/smpclient/transport/ble.py", line 195, in send_and_receive
return await self.receive()
File "/Users/JP/repos/smpclient/smpclient/transport/ble.py", line 161, in receive
await self._notify_or_disconnect()
File "/Users/JP/repos/smpclient/smpclient/transport/ble.py", line 244, in _notify_or_disconnect
raise SMPTransportDisconnected(
smpclient.transport.SMPTransportDisconnected: SMPBLETransport disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6
Discussing here: https://github.com/hbldh/bleak/discussions/1589
@bjandre Can you try increasing the delay between the packets? And LMK the Mac model and OS version. Thanks!
I'm using a MacBook Pro, with Apple M1 Pro chip. MacOS Sonoma.
I'm glad you were able to reproduce the issue, at least initially.
I tried changing the asyncio.sleep command we discussed previously with a few values up to 1 sec, but it didn't change anything. If there is a different delay value please let me know where to set it and I will try it.
Can you try changing this line to return self.mtu?
https://github.com/intercreate/smpclient/blob/main/smpclient/transport/__init__.py#L54C32-L54C42
This should prevent SMP packet fragmentation. If it helps, then it broadly confirms the source of the issue.
JP
Update: I can confirm that mcumgr-web does not fragment the packets and instead uploads packets sized to the MTU.
https://github.com/boogie/mcumgr-web/blob/main/js/mcumgr.js#L224
I'm trying to build a small smpclient based application using BLE transport. I am running into an asyncio error during DFU image upload that I'm not sure how to debug. I have been able to reproduce it using the code in examples/ble/upload.py. The upload is large and smpclient is able to get through a few 2475 byte buffers with mtu=244 bytes before it crashes.
Any debugging help would be appreciated.
smpclient version: 3.2.0
examples/ble/upload.py output:
Uploaded 44,057 / 398,704 Bytes | 23.64 KB/s Exception in SMPClient: exc_type=<class 'RuntimeError'>, exc_value=RuntimeError('Lock is not acquired.'), traceback=<traceback object at 0x104f2a140> Disconnected from 3D10EF7C-3B95-7EAD-315A-84D74250219B Traceback (most recent call last): File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 146, in receive
await self._notify_or_disconnect()
File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 215, in _notify_or_disconnect
done, pending = await asyncio.wait(
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 464, in wait
return await _wait(fs, timeout, return_when, loop)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 550, in _wait
await waiter
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/Users//projects/research/smpclient-examples/smpclient/examples/ble/upload.py", line 64, in
asyncio.run(main())
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users//projects/research/smpclient-examples/smpclient/examples/ble/upload.py", line 42, in main
async for offset in client.upload(fw_file, 2):
File "/Users//projects/research/smpclient-examples/smpclient/smpclient/init.py", line 135, in upload
response = await self.request(
^^^^^^^^^^^^^^^^^^^
File "/Users//projects/research/smpclient-examples/smpclient/smpclient/init.py", line 51, in request
frame = await self._transport.send_and_receive(request.BYTES)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 180, in send_and_receive
return await self.receive()
^^^^^^^^^^^^^^^^^^^^
File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 144, in receive
async with self._notify_condition: # wait for the header
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 20, in aexit
self.release()
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 139, in release
raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.