intercreate / smpclient

Simple Management Protocol (SMP) Client for remotely managing MCU firmware
Apache License 2.0
9 stars 6 forks source link

asyncio error during large image upload #30

Open bjandre opened 6 months ago

bjandre commented 6 months ago

I'm trying to build a small smpclient based application using BLE transport. I am running into an asyncio error during DFU image upload that I'm not sure how to debug. I have been able to reproduce it using the code in examples/ble/upload.py. The upload is large and smpclient is able to get through a few 2475 byte buffers with mtu=244 bytes before it crashes.

Any debugging help would be appreciated.

Uploaded 44,057 / 398,704 Bytes | 23.64 KB/s Exception in SMPClient: exc_type=<class 'RuntimeError'>, exc_value=RuntimeError('Lock is not acquired.'), traceback=<traceback object at 0x104f2a140> Disconnected from 3D10EF7C-3B95-7EAD-315A-84D74250219B Traceback (most recent call last): File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 146, in receive await self._notify_or_disconnect() File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 215, in _notify_or_disconnect done, pending = await asyncio.wait( ^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 464, in wait return await _wait(fs, timeout, return_when, loop) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 550, in _wait await waiter asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "/Users//projects/research/smpclient-examples/smpclient/examples/ble/upload.py", line 64, in asyncio.run(main()) File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/Users//projects/research/smpclient-examples/smpclient/examples/ble/upload.py", line 42, in main async for offset in client.upload(fw_file, 2): File "/Users//projects/research/smpclient-examples/smpclient/smpclient/init.py", line 135, in upload response = await self.request( ^^^^^^^^^^^^^^^^^^^ File "/Users//projects/research/smpclient-examples/smpclient/smpclient/init.py", line 51, in request frame = await self._transport.send_and_receive(request.BYTES) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 180, in send_and_receive return await self.receive() ^^^^^^^^^^^^^^^^^^^^ File "/Users//projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 144, in receive async with self._notify_condition: # wait for the header File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 20, in aexit self.release() File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 139, in release raise RuntimeError('Lock is not acquired.') RuntimeError: Lock is not acquired.


* Tail end of the logging output:
```log
DEBUG:smpclient.transport.ble:Sent 2475 bytes
DEBUG:smpclient.transport.ble:Waiting for notify on SMP_CHARACTERISTIC_UUID=UUID('da2e7828-fbce-4e01-ae9e-261174997c48')
DEBUG:bleak.backends.corebluetooth.PeripheralDelegate:peripheral_didUpdateValueForCharacteristic_error_
DEBUG:smpclient.transport.ble:Received 17 bytes from SMP_CHARACTERISTIC_UUID=UUID('da2e7828-fbce-4e01-ae9e-261174997c48')
DEBUG:smpclient.transport.ble:Received header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=9, group_id=1, sequence=8, command_id=1)
DEBUG:smpclient.transport.ble:Waiting for the rest of the 17 byte response
DEBUG:smpclient.transport.ble:Finished receiving 17 byte response
DEBUG:smpclient.transport.ble:Sending 2475 bytes, self.mtu=244
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\x02\x00\t\xa3\x00\x01\t\x01\xa2coff\x19\x85\xcdddataY\t\x93\xd1H\xb1oJ F6\xf02\xfbo\xf0\x07\x05(F%\xb0\xbd\xe8\xf0\x8f-\xb3[\xf8\x04\x0c\x01\xf0\x15\xf8\x98\xb1g\xb1\xa5h\x0b\x97\x085\n\xae\x0f\xce\x0f\xc53h+` F\xff\xf7^\xff\x01%\xe7\xe7aJ\x01! F6\xf0\x13\xfb\xdf\xe7\x02\x9b\xc3\xb1\x08\xf1\xff8\xfe\xb9\x04\x9b\xb8\xf1\x00\x0f\x01\xd1\x02+\xb3\xd1\x015\xb1\xe7\x02\x9b[\xb1\x08\xf1\xff8\x96\xb9\xa3h\xdbi\x00+;\xd0\xdbh\x00+8\xd0\x00&3\xe0[\xf8\x04\x1c\x05\xaaPF6\xf0\x99\xf9\x08\xf1\xff8\x06F8\xb3\xf3h\xab\xb1\xa3h\xb6F\x03\xf1\x08\x0c\xbe\xe8\x0f\x00\xac\xe8\x0f\x00\xde\xf8\x000\xcc\xf8\x000\xb3h3\xb9s|\xfe+\x04\xbf\x96\xf8\x10\x80\x08\xf1\xff8\xa9F\xd6\xf8\x04\xc0\xbc\xf1\x00\x0f\x08\xd0\xb6F\xbe\xe8\x0f\x00\n\xaf\x0f\xc7\xde\xf8\x00'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'0;`gF\xb2F\xb7\xe7\x00-\xc2\xd0\x01#\x02\x93\xb2\xe76K\x00\x936J\x0f\x9b\x01! F6\xf0\xb8\xfa\xbd\xe7\xb8\xf1\x00\x0f\x7f\xf4n\xaf\xba\xf1\x00\x0f\n\xd0\x9a\xf8\x110\xfe+\x06\xd1$\xab\x03\xeb\x85\x03S\xf8T<\x03\xb1\x015\xa0hCiS\xb9\x00/?\xf4m\xaf\xc3h\xbbB?\xf4~\xaf\x0b\x97\x00\xf1\x08\x05t\xe7\x03~\xa5\xeb\t\x05\xb3\xb1A~O\xf6\xffr\xff)\x18\xbf\nF\x9dB\x02\xd3\x13D\x9dB\x0b\xd9\x83h\x1cJ\x01! F6\xf0\x82\xfa F\xff\xf7\xc3\xfeo\xf0\x15\x05K\xe7\x01!5\xf0\xae\xff\xa0h\x00\xf5?p6\xf0\x16\xf8\xa3h\x0f\xaa)F\x02\xeb\x89\x02[i F\x98G\x05F\xa0hO\xf0\xff2O\xf0\xff3\x00\xf5?p5\xf0e\xff\xa0h\x00!5\xf0\x93\xff,\xe7\x00\xbf\rH\x06\x00kK\x06\x00F\x7f\x06\x00%K\x06\x009K\x06\x00WK\x06\x00\x88'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\xb7\x06\x00\x8fK\x06\x00p\xb5\x04F\x80h\x86\xb0\x01!\x00\xf5\x1ap5\xf0\xff\xfe\xa3h\x1by\x02+ \xd1\x00#\x01\x93\xa3h`h\x9ek\x03h\x1di\x01\xab\x0e\xb3\x10"\x02\xa9\xa8G\x01\x9b\x93\xb1\xa0h\x01!5\xf0b\xff\x01\x9a\x02\xa9 F\xb0G\xa0h\x00!5\xf0Z\xff\xa3h\x9bk\x00+\xe3\xd1 F\xff\xf7r\xfe\xa0ho\xf0\x01\x01\x00\xf5\x1ap\x06\xb0\xbd\xe8p@5\xf0\xdf\xbe\x01"\r\xf1\x03\x01\xa8G\x01\x9b\x00+\xee\xd0\x9d\xf9\x03\x10\x9d\xf8\x030\xa2h\x00)\xc8\xdbQy\x03)\x00\xf2u\x81\xdf\xe8\x11\xf0\x04\x00\xf2\x00\n\x01s\x01\r+W\xd0\n+2\xd0\xb2\xf8h\x12o\xf3\xca\x01\xa2\xf8h\x12\t+\xa0ha\xd0\x14\xd8\x00+\xae\xd0\x08+_\xd0Z\x1c\xafI\xa0h\xd2\xb2\x8a\\\x12\xf0\x97\x0fc\xd0\x01!5\xf0\x1f\xff\x9d\xf8\x03\x10 F\x00\xf0n\xfd\x9b\xe7\x1b+F\xd0\x7f+\xea'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\xd1\xd0\xf8d2\x9b\x07\x93\xd5\x01!5\xf0\x0e\xff\xa3h\xd3\xf8d2\x1e\x07F\xd4 F6\xf0\x0e\xfa\x87\xe7\xd2\xf8h\x12\xc1\xf3\xc7\x01)\xb1\xd2\xf8h\x12\xc1\xf3\xc7\x01\n)\xcf\xd1\xb2\xf8h\x12c\xf3\xca\x01\xa2\xf8h\x12\xa5h\xb5\xf8@\x10\xc9\xb9(F5\xf0\xed\xfe\xe0h6\xf0\x81\xfb\x90I`i5\xf0\x9c\xfe F\xff\xf7\xf7\xfdd\xe7\xd2\xf8h\x12\xc1\xf3\xc7\x01\x00)\xe1\xd0\xd2\xf8h\x12\xc1\xf3\xc7\x01\r)\xda\xe7 F\xff\xf7\r\xfe\xc5\xf8\x14\x03\xe9\xe7\x01#CqO\xe7\xd0\xf8d2L\xe7\xd0\xf8d2\x9a\x07\x7f\xf5H\xaf\x01!5\xf0\xc2\xfe F6\xf0\xe6\xf9@\xe7\xd0\xf8d"\x95\x07\x7f\xf5<\xaf\x01;\x16+?\xf68\xaf\x01\xa2R\xf8#\xf0\xf1J\x01\x00\xb5I\x01\x00\xbdI\x01\x00\xabH\x01\x00\xe5J\x01\x00\xddI\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xe5I\x01\x00\xed'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'I\x01\x00\xc3G\x01\x00\xf5I\x01\x00\xc3G\x01\x00\xffI\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\x03J\x01\x00\xc3G\x01\x00\x19J\x01\x00 F6\xf0\xcd\xf8\x02\xe7 F6\xf0\xc0\xf8 F6\xf0\\\xf8\x18\xb9XI`i5\xf0+\xfe\xa0h\x01!5\xf0q\xfe\x89\xe7 F6\xf0\xc2\xf8\xee\xe6 F6\xf0\xc9\xf8\xea\xe6 F\x00\xf0\xf1\xfc\xe6\xe6\x00! F5\xf0j\xfe\xe1\xe6\x01!\xf9\xe7 F6\xf0\x96\xf8 F5\xf0\xc8\xfd\xa0h\x01!5\xf0S\xfe\xd4\xe6 F\x00\xf0\x1b\xfc\xf6\xe7[+\x02\xd1\x02#Sq\xcb\xe6\xd2\xf8d"\x90\x07\x06\xd5b+\x08\xd1O\xf0\xff1 F\x00\xf0\x90\xfb\xa3h\x00"Zq\xbc\xe6f+\xf9\xd1\x01!\xf4\xe7\x00!Qq\xa2h\xd2\xf8d\x12\x89\x07\x7f\xf5\xb1\xae1;\x1b+?\xf6\xad\xae\x01\xa1Q\xf8#\xf0\x00\xbf\xedJ\x01\x00\xf9J\x01\x00\x11'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'K\x01\x00\xe1J\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xffI\x01\x00\xf5I\x01\x00\xddI\x01\x00\xb5I\x01\x00\xc3G\x01\x00\xe5J\x01\x00\xc3G\x01\x00\xf1J\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xc3G\x01\x00\xfdJ\x01\x00\x03#Sq F6\xf0,\xf8j\xe6\x03#Sq F6\xf0\x1f\xf8d\xe6\x03#Sq\xa0h\xd0\xf8d\x12\x01\xf0\x01\x01\x81\xf0\x01\x015\xf0*\xfeX\xe6\x03#Sq\xa3h\xd3\xf8d2\x9b\x07\x7f\xf5Q\xae\xc3\xe6\x00#\x7f\xe7\x00\xbf\x8e\xb7\x06\x00\rH\x06\x00-\xe9\xf0G\x04F\x80h\x88F\xd0\xf8\x10\x13\x88\xb0\x15F\x1fF\x00)e\xd1\x92FO\xf4FrG\xf0\x84\xfa\xa3h"h\x1a`\xe0h6\xf0\xe8\xfa\xa0h\x00\xf5?p?\xf0\x8b\xf8O\xf4+yO'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\xf4\x1bv\xa0h0D?\xf0H\xfc\xa0h\x00"\x83\x19\x01!HD\x106?\xf0\x16\xfc\xb6\xf5+\x7f\t\xf1\x14\t\xee\xd1\xa0h\x02!\x00\xf5\x1ap5\xf0\x0b\xfd\xa3hP"Z\x85\xa3h\x18"\x1a\x85\xa6h0h5\xf05\xfd\x86\xf8,\x00\xa3h\x18J%\xf04\x05e\xf3\x07\n\xc3\xf8d\xa2`h\x03hAF\x1dh#F\xa8G\x05F\xd0\xb9\xa3h\x01"\x1aq\xd4\xe9\n\x01\x00"\x00#\xcd\xe9\x06#\x0e#\x03\x93\x10\x9b\x04\x95\xcd\xe9\x01s\x00\x94\nKO\xf4\x00b#\xf0\x98\xfa\xa3h\xc3\xf8\x10\x03aj#\xf0\xc8\xf9(F\x08\xb0\xbd\xe8\xf0\x87o\xf0w\x05\xf8\xe7\x00\xbfG\xa6\x04\x00\x8dL\x01\x00\x10\xb5\x83h\x1ay\x01*\x04F\'\xd1\xd3\xf8h"Q\x04\x08\xd5\xd3\xf8d"R\x07\x04\xd4\x01F\xdak\xc0i6\xf0>\xf9\xa0hO\xf0\xff2\x00\xf5?pO\xf0\xff35\xf0\xf3\xfc h5\xf0\xdd'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\xfc\x18\xb1\x08I`i5\xf0\xde\xfc F\xff\xf79\xfc\xa0h\x00\xf5?p5\xf0\x82\xfd\x00 \x10\xbdo\xf0\x85\x00\xfb\xe7\x00\xbf\x0cH\x06\x00-\xe9\xf0A\x04F\x80h\x15F\x00\xf5\x1ap\x00)7\xd0O\xf4\x80A5\xf0\x87\xfc\xa3h\xddc`h\x03h\x00!\x9bh\x98G\x00(K\xd1 F\xff\xf7\xb3\xff\x00(F\xd1$N$O\xdf\xf8\x98\x80\xa0h\x00\xf5+pO\xf0\xff2O\xf0\xff3\x03!$\xf0\x19\xfb\x05F\xa0hO\xf0\xff2O\xf0\xff3\x00\xf5?p\x9d\xb15\xf0\xa7\xfc F+F\x18J\x01!5\xf0\x9e\xff\xa0h\xbd\xe8\xf0A\x00\xf5?p5\xf09\xbdo\xf4\x80A5\xf0^\xfc\xc6\xe75\xf0\x93\xfc2F\x02! F5\xf0e\xfc:F)F F5\xf0`\xfc FBF\x01!5\xf0[\xfc`h\x03h[i\x03\xb1\x98G\xa0h\x00\xf5?p5\xf0\x19\xfd\xbc\xe7\xbd\xe8\xf0\x81['
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\xa6\x04\x00\xa5G\x01\x00\xaaK\x06\x00\x85\xa7\x04\x00\x10\xb5\x08J\x04F\x08!5\xf0t\xfd F\x06J\x08!5\xf0o\xfd F\x04J\x00!\x00\xf0\xb8\xfb\x00 \x10\xbd\xc1K\x06\x00\x15L\x06\x00\xafL\x06\x00\xf8\xb5\x1eF\x17F\x0bF/J\x08!\x04F5\xf0Y\xfdn\xb3;x-J\x08! F5\xf0R\xfd\x01%j\x07\x04\xd1*J\x08! F5\xf0J\xfd\xaeB#\xd9{]%J\x08! F5\xf0B\xfd\x015\x10-\xed\xd1#J\x08! F5\xf0:\xfd\xd6\xb9\x1fJ\x08! F5\xf04\xfd\x01%\xaeB+F\x13\xd8\x1aJ\x08! F5\xf0+\xfd\x19\xe0\x19J\x08! F5\xf0%\xfd\x01%\x16J\x08! F5\xf0\x1f\xfd\xdb\xe7\x00%+F\xfb\\\x13J\xd2\\\x12\xf0\x97\x0f\x08\xbf.#\x11J\x08! F5\xf0\x10\xfd\x015\x10-\x07\xd0k\x07\xd7\xd1\x08J\x08! F5'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'\xf0\x06\xfd\xd1\xe7 F\tJ\xbd\xe8\xf8@\x08!5\xf0\xfe\xbc\xc5L\x06\x00\x0c=\x06\x00\x14=\x06\x00\x16=\x06\x00\x12=\x06\x00\x8f\xb7\x06\x00\x18=\x06\x00\xccL\x06\x00\x13\xb5\x13F\x04F\x00"\x00\x92\nF\x01F\x07H\xf8\xf7x\xf9#i\x1by+\xb1 F\x02\xb0\xbd\xe8\x10@5\xf0\x15\xbd\x02\xb0\x10\xbd\x00\xbf\xe5\xa8\x04\x00-\xe9\xf0O\x85\xb0\x00&\xdf\xf8\xa8\xa1\x01\x91\x07F\x14F\x99F\x06`\xb3F#x\x00+\x00\xf0\xc6\x80\x1a\xf8\x030\x13\xf0\x08\x03\x04\xd0\x04\xf8\x01\xbb\xf3\xe7\x00&\xf1\xe7:hQ\x1c9`\x01\x99A\xf8"@:hJE\x00\xf0\xb2\x80\x1eF!x\x00)\xe3\xd0\xa6\xb9\x0eF#F\x1cF\'.\x04\xd0\\. \xd0".@\xf0\x97\x80 F5\xf0\x04\xfda\x1c\x02F FG\xf0\x82\xf8!x\x00)\xcd\xd0\xb1B\xa0F\x04\xf1\x01\x04\x1b\xd1@F5\xf0\xf4\xfc!F\x02F@'
DEBUG:bleak.backends.corebluetooth.client:Write Characteristic da2e7828-fbce-4e01-ae9e-261174997c48 : b'FG\xf0r\xf8\x98\xf8\x00`DF\x00.\xd9\xd1\xba\xe7 F5\xf0\xe6\xfce\x1c\x02F)F FG\xf0c\xf8'
DEBUG:smpclient.transport.ble:Sent 2475 bytes
DEBUG:smpclient.transport.ble:Waiting for notify on SMP_CHARACTERISTIC_UUID=UUID('da2e7828-fbce-4e01-ae9e-261174997c48')
ERROR:smpclient:Exception in SMPClient: exc_type=<class 'RuntimeError'>, exc_value=RuntimeError('Lock is not acquired.'), traceback=<traceback object at 0x107f6ee80>
DEBUG:smpclient.transport.ble:Disonnecting from 3D10EF7C-3B95-7EAD-315A-84D74250219B
DEBUG:bleak.backends.corebluetooth.CentralManagerDelegate:centralManager_didDisconnectPeripheral_error_
DEBUG:bleak.backends.corebluetooth.CentralManagerDelegate:Peripheral Device disconnected!
WARNING:smpclient.transport.ble:Disconnected from 3D10EF7C-3B95-7EAD-315A-84D74250219B
DEBUG:smpclient.transport.ble:Disconnected from 3D10EF7C-3B95-7EAD-315A-84D74250219B
JPHutchins commented 6 months ago

Is the device resetting at all when this happens or is it happy as far as you can tell?

You can try increasing the timeout arguments to the upload command. I'll look into why a task is getting cancelled, I thought that it would raise a timeout error instead.

bjandre commented 6 months ago

The logs do not show any evidence of a MCU reset/reboot/fault during the upload. I enabled the zephyr mcumgr_smp debug logging and there is no indication of a problem on that end.

mcumgr-web is able to upload the large image, although it does occasionally hang and have to be restarted. I am also able to consistently upload a small dummy image with smpclient.

I'll look into the upload timeout arguments next.

JPHutchins commented 6 months ago

What board / MCU are you using? I think that I can borrow an Arm Mac to see if I can reproduce.

Please consider forking and opening a PR if you find that the example script or anything else needs some changes.

JPHutchins commented 6 months ago

@bjandre It appears to be an asyncio bug causing the nasty traceback, though I suspect the root cause is that the smpclient is not receiving the notify within the 2.5 second timeout.

https://stackoverflow.com/questions/59314875/waiting-on-condition-variable-with-timeout-lock-not-reacquired-in-time

The issue is still open at Python: https://bugs.python.org/issue39032

But the SO poster also included a workaround that I'll look into.

bjandre commented 6 months ago

We're based on an nrf52840 dk, zephyr board nrf52840dk_nrf52840, with upstream zephyr 3.6, not the nordic sdk.

I experimented with the timeout parameter and it didn't help. It seems like a notify is being dropped, but I'm not sure if the issue is on the zephyr or smpclient side. I'm still investigating the root cause.

If we can get a cleaner exception when a notify timeout occurs, that would make it easier to trap at the application level and allow retries.

JPHutchins commented 6 months ago

@bjandre If you don't mind wiping the flash on your DK, can I ask that you run the FW upgrade HW integration test for the nrf52840dk:

python -m examples.ble.upgrade nrf52840dk_nrf52840

And paste the output.

Expected:

Using DUT folder: C:\Users\jp\repos\smpclient\examples\duts\nrf52840dk_nrf52840\ble
Using merged.hex: C:\Users\jp\repos\smpclient\examples\duts\nrf52840dk_nrf52840\ble\a_smp_dut.merged.hex
Flashing the merged.hex...
[ #################### ]   1.150s | Erase file - Done erasing
[ #################### ]   1.689s | Program file - Done programming
[ #################### ]   1.733s | Verify file - Done verifying
Applying system reset.
Run.
A SMP DUT hash: SHA256=215144b99127acb3c66d9ec7540ee454703c3e15db7e12a713ad0f672d63321c
B SMP DUT hash: SHA256=62a8e0864e02761c8498d4380b8104370b028f7d63870833420442753f4558d0
Searching for A SMP DUT...OK
Connecting to A SMP DUT...2024-05-31 11:46:16.965 INFO     smp_characteristic.max_write_without_response_size=495
2024-05-31 11:46:16.965 INFO     self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0

Uploaded 225,130 / 225,130 Bytes | 26.78 KB/s
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=1, sequence=94, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
Confirmed the upload

Marking B SMP DUT for test...
Sending request...OK
Received response: header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=1, sequence=95, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=True, confirmed=False, active=False, permanent=False)] splitStatus=0

Resetting for swap...
Sending request...OK
Received response: header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=2, group_id=0, sequence=96, command_id=5) sequence=0
2024-05-31 11:46:26.196 WARNING  Disconnected from C4:35:E3:1E:09:DB

Searching for B SMP DUT...OK
Connecting to B SMP DUT...2024-05-31 11:46:50.060 INFO     smp_characteristic.max_write_without_response_size=495
2024-05-31 11:46:50.060 INFO     self._max_write_without_response_size=495
OK

Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=1, sequence=98, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=False, permanent=False)] splitStatus=0

Confirmed the swap
2024-05-31 11:46:53.318 WARNING  Disconnected from C4:35:E3:1E:09:DB
bjandre commented 6 months ago

I don't have a nrf52840 devkit with me, but our custom board is compatible.

When I run the upgrade script, it will detect the device by name "A SMP DUT", but it fails to connect by device address:

smpclient % python3 -m examples.ble.upgrade nrf52840dk_nrf52840
Using DUT folder: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble
Using merged.hex: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut.merged.hex
Flashing the merged.hex...
Recovering device. This operation might take 30s.
[ ######               ]   0.000s | Program file - Checking image
[ ###                  ]   0.000s | Check image validity - Initialize device inf
[ ######               ]   0.000s | Check image validity - Check region 0 settin
[ ##########           ]   0.000s | Check image validity - block 1 of 4
[ #############        ]   0.004s | Check image validity - block 2 of 4
[ #################### ]   0.015s | Check image validity - Finished
[ #############        ]   0.000s | Program file - Programming
[ #####                ]   0.000s | Programming image - block 1 of 3
[ ##########           ]   0.200s | Programming image - block 2 of 3
[ #################### ]   0.000s | Programming image - Write successful
[ #################### ]   1.584s | Program file - Done programming                                                    
[ ######               ]   0.000s | Verify file - Check image
[ ###                  ]   0.000s | Check image validity - Initialize device inf
[ ######               ]   0.000s | Check image validity - Check region 0 settin
[ ##########           ]   1.713s | Check image validity - block 1 of 4
[ #############        ]   0.882s | Check image validity - block 2 of 4
[ #################### ]   0.000s | Check image validity - Finished
[ #############        ]   0.000s | Verify file - Verifying
[ #####                ]   0.000s | Verifying image - block 1 of 3
[ ##########           ]   0.204s | Verifying image - block 2 of 3
[ #################### ]   0.000s | Verifying image - Verify successful
[ #################### ]   1.646s | Verify file - Done verifying                                                       
Applying system reset.
Run.
A SMP DUT hash: SHA256=215144b99127acb3c66d9ec7540ee454703c3e15db7e12a713ad0f672d63321c
B SMP DUT hash: SHA256=62a8e0864e02761c8498d4380b8104370b028f7d63870833420442753f4558d0
Searching for A SMP DUT...OK
Connecting to A SMP DUT...Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 146, in <module>
    asyncio.run(main())
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 67, in main
    async with SMPClient(SMPBLETransport(), a_smp_dut.address) as client:
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 153, in __aenter__
    await self.connect()
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 37, in connect
    await self._transport.connect(self._address, timeout_s)
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 84, in connect
    raise SMPBLETransportDeviceNotFound(f"Device '{address}' not found")
smpclient.transport.ble.SMPBLETransportDeviceNotFound: Device '3D10EF7C-3B95-7EAD-315A-84D74250219B' not found

Increasing the connection timeout doesn't help, but modifying upgrade.py:67 to use the device name instead of address allows it to connect.

smpclient % python3 -m examples.ble.upgrade nrf52840dk_nrf52840
Using DUT folder: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble
Using merged.hex: /Users/<user>/projects/research/smpclient-examples/smpclient/examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut.merged.hex
Flashing the merged.hex...
Recovering device. This operation might take 30s.
[ ######               ]   0.000s | Program file - Checking image
[ ###                  ]   0.000s | Check image validity - Initialize device inf
[ ######               ]   0.000s | Check image validity - Check region 0 settin
[ ##########           ]   0.000s | Check image validity - block 1 of 4
[ #############        ]   0.003s | Check image validity - block 2 of 4
[ #################### ]   0.014s | Check image validity - Finished
[ #############        ]   0.000s | Program file - Programming
[ #####                ]   0.000s | Programming image - block 1 of 3
[ ##########           ]   0.201s | Programming image - block 2 of 3
[ #################### ]   0.000s | Programming image - Write successful
[ #################### ]   1.577s | Program file - Done programming                                                    
[ ######               ]   0.000s | Verify file - Check image
[ ###                  ]   0.000s | Check image validity - Initialize device inf
[ ######               ]   0.000s | Check image validity - Check region 0 settin
[ ##########           ]   1.710s | Check image validity - block 1 of 4
[ #############        ]   0.880s | Check image validity - block 2 of 4
[ #################### ]   0.000s | Check image validity - Finished
[ #############        ]   0.000s | Verify file - Verifying
[ #####                ]   0.000s | Verifying image - block 1 of 3
[ ##########           ]   0.205s | Verifying image - block 2 of 3
[ #################### ]   0.000s | Verifying image - Verify successful
[ #################### ]   1.651s | Verify file - Done verifying                                                       
Applying system reset.
Run.
A SMP DUT hash: SHA256=215144b99127acb3c66d9ec7540ee454703c3e15db7e12a713ad0f672d63321c
B SMP DUT hash: SHA256=62a8e0864e02761c8498d4380b8104370b028f7d63870833420442753f4558d0
Searching for A SMP DUT...OK
Connecting to A SMP DUT...2024-05-31 15:02:59.483 INFO     smp_characteristic.max_write_without_response_size=495
2024-05-31 15:02:59.484 INFO     self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b"!QD\xb9\x91'\xac\xb3\xc6m\x9e\xc7T\x0e\xe4Tp<>\x15\xdb~\x12\xa7\x13\xad\x0fg-c2\x1c", bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0

Uploaded 41,606 / 225,130 Bytes | 10.34 KB/s           2024-05-31 15:03:06.221 ERROR    Exception in SMPClient: exc_type=<class 'RuntimeError'>, exc_value=RuntimeError('Lock is not acquired.'), traceback=<traceback object at 0x102305a00>
2024-05-31 15:03:06.226 WARNING  Disconnected from 3D10EF7C-3B95-7EAD-315A-84D74250219B
Traceback (most recent call last):
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 146, in receive
    await self._notify_or_disconnect()
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 215, in _notify_or_disconnect
    done, pending = await asyncio.wait(
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 464, in wait
    return await _wait(fs, timeout, return_when, loop)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 550, in _wait
    await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 147, in <module>
    asyncio.run(main())
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/examples/ble/upgrade.py", line 90, in main
    async for offset in client.upload(b_smp_dut_bin):
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 135, in upload
    response = await self.request(
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/__init__.py", line 51, in request
    frame = await self._transport.send_and_receive(request.BYTES)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 180, in send_and_receive
    return await self.receive()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/<user>/projects/research/smpclient-examples/smpclient/smpclient/transport/ble.py", line 144, in receive
    async with self._notify_condition:  # wait for the header
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 20, in __aexit__
    self.release()
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/locks.py", line 139, in release
    raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
JPHutchins commented 6 months ago

That makes sense, MacOS does not allow the use of BLE MAC addresses: https://github.com/hbldh/bleak/issues/140

This is good that it's so reproducible! I will borrow a Mac and see what I see. I'm assuming this is on Arm.

I'm suspecting that the FW has not actually received all 5 of the SMP packet fragments (2475 / 5 = 495), so it doesn't notify, and then the request times out. I wonder if it is a synchronization issue wherein the loop here:

https://github.com/intercreate/smpclient/blob/26a3a2d266439dc5260ed7b0b53d68f88463a207/smpclient/transport/ble.py#L131-L137

Is not actually waiting for the previous transfers to complete. It would slow things way down, but I wonder if adding an asyncio.sleep(0.050) after each fragment would make it more stable?

bjandre commented 6 months ago

Yes, ARM based Mac.

I tried adding the asyncio.sleep and the results were inconclusive. In general it seemed to make it the same few K into the upload, but on a few runs it did make it noticeably further before hanging.

JPHutchins commented 6 months ago

https://developer.apple.com/documentation/corebluetooth/cbperipheral/writevalue(_:for:type:)

https://stackoverflow.com/questions/65586100/rate-limiting-a-corebluetooth-write-without-response

I'm starting to suspect that Core Bluetooth writeValue returns without guaranteeing completion of transmission.

Here's where Bleak uses the Core Bluetooth API: https://github.com/hbldh/bleak/blob/d45ec90d3cca4113da7c118cac753f441c46cca2/bleak/backends/corebluetooth/PeripheralDelegate.py#L179-L202

Generally, I've noticed BLE being better on MacOS/iOS. I wonder if it's so good that it's "too good for the peripheral" - it's sending the fragmented packets too quickly.

The manual fragmentation that's done is required on Windows - at least with the BLE adapter that I'm using. I wonder if removing the fragmentation and telling the CB API to send the entire 2475B packet would fix it?

JPHutchins commented 6 months ago

May be possible to make Bleak properly async with this: https://developer.apple.com/documentation/corebluetooth/cbperipheraldelegate/peripheralisready(tosendwritewithoutresponse:)/

Or use write with response and/or send all 2475 bytes.

JPHutchins commented 5 months ago

Currently testing from a MacBook Pro 14 Nov 2023 M3 Pro Sonoma 14.5

Using this branch: https://github.com/intercreate/smpclient/tree/fix-%2331/image-upload-match

Strange thing is that I was initially able to reproduce the error. I started fiddling with the timeouts and making some changes to .venv/lib/python3.9/site-packages/bleak/backends/corebluetooth and now I can't reproduce the timeout error!

The test setup is to have an nrf52840dk with the examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut_image_check.merged.hex image on it. Then test uploads like

python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/a_smp_dut.bin

You have to keep switching between a/b because an upload of an image that is already uploaded will simply complete immediately:

python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
Scanning for SMP servers...OK
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:37:08.344 ble:108 INFO     smp_characteristic.max_write_without_response_size=495
2024-06-02 14:37:08.344 ble:132 INFO     self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0

Uploaded 225,130 / 225,130 Bytes | 12.96 KB/s           2024-06-02 14:37:25.895 __init__:150 INFO     Upload complete
2024-06-02 14:37:25.895 __init__:153 INFO     Server reports response.match=True

Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=244, group_id=1, sequence=94, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
2024-06-02 14:37:25.958 ble:228 WARNING  Disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6

python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
Scanning for SMP servers...OK
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:40:49.059 ble:108 INFO     smp_characteristic.max_write_without_response_size=495
2024-06-02 14:40:49.059 ble:132 INFO     self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=244, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0

Uploaded 225,130 / 225,130 Bytes | 471.33 KB/s           2024-06-02 14:40:49.750 __init__:150 INFO     Upload complete
2024-06-02 14:40:49.751 __init__:153 INFO     Server reports response.match=True

Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=244, group_id=1, sequence=3, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'b\xa8\xe0\x86N\x02v\x1c\x84\x98\xd48\x0b\x81\x047\x0b\x02\x8f}c\x87\x083B\x04Bu?EX\xd0', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0
2024-06-02 14:40:49.844 ble:228 WARNING  Disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6

Uploads are slower than I expect. Typically about 15KB/s where Windows gets 25KB/s. MTU is negotiating to 495.

Write with response does not work, I guess it's not setup on the peripheral:

bleak.exc.BleakError: Failed to write characteristic 17: Error Domain=CBATTErrorDomain Code=3 "Writing is not permitted." UserInfo={NSLocalizedDescription=Writing is not permitted.}

Sending the entire packet also does not work (hangs and time out).

By using the upload example script, I have discovered something strange. After a successful upload, the next upload will fail with IMG_MGMT_ERR.FLASH_ERASE_FAILED. If I retry it will succeed. This implies that the failed request is changing the device state in some way that causes the subsequent upload to succeed. Probably not intended behavior. Also likely that repeated uploads doesn't get a lot of coverage!

2024-06-02 14:35:11.822 __init__:174 ERROR    Exception in SMPClient: exc_type=<class 'smpclient.exceptions.SMPUploadError'>, exc_value=SMPUploadError(ImageManagementErrorV1(header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=19, group_id=1, sequence=2, command_id=1), sequence=0, err=Err[IMG_MGMT_ERR](group=<GroupId.IMAGE_MANAGEMENT: 1>, rc=<IMG_MGMT_ERR.FLASH_ERASE_FAILED: 13>))), traceback=<traceback object at 0x10262d7c0>

In Bleak, I've added this to .venv/lib/python3.9/site-packages/bleak/backends/corebluetooth/PeripheralDelegate.py

    def peripheralIsReady_toSendWriteWithoutResponse_(self, peripheral: CBPeripheral, characteristic: CBCharacteristic) -> None:
        logger.debug(type(peripheral), type(characteristic))
        logger.error("Called!")

It registers but it does not get called.

In .venv/lib/python3.9/site-packages/bleak/backends/corebluetooth/client.py I've modified the write_gatt_char() function with a hack to force it to async sleep until it's ready to send another message. The actual fix should be to use the callback mentioned above to set a Future, but I think that it's all a non-issue. If that callback is not called then there is no error!

    async def write_gatt_char(
        self,
        characteristic: BleakGATTCharacteristic,
        data: Buffer,
        response: bool,
    ) -> None:
        value = NSData.alloc().initWithBytes_length_(data, len(data))
        await self._delegate.write_characteristic(
            characteristic.obj,
            value,
            (
                CBCharacteristicWriteWithResponse
                if response
                else CBCharacteristicWriteWithoutResponse
            ),
        )
        while self._peripheral.canSendWriteWithoutResponse() != True:
            logger.error("Not ready to send!")
            await asyncio.sleep(0.010)
        logger.debug(f"Write Characteristic {characteristic.uuid} : {data}")

This results in

Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:47:27.360 ble:108 INFO     smp_characteristic.max_write_without_response_size=495
2024-06-02 14:47:27.361 ble:132 INFO     self._max_write_without_response_size=495
2024-06-02 14:47:27.420 client:336 ERROR    Not ready to send!
OK
Sending request...2024-06-02 14:47:27.481 client:336 ERROR    Not ready to send!
OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0

2024-06-02 14:47:27.542 client:336 ERROR    Not ready to send!
2024-06-02 14:47:27.553 client:336 ERROR    Not ready to send!
2024-06-02 14:47:27.565 client:336 ERROR    Not ready to send!
2024-06-02 14:47:27.576 client:336 ERROR    Not ready to send!
2024-06-02 14:47:27.588 client:336 ERROR    Not ready to send!
Uploaded 2,390 / 225,131 Bytes | 4.19 KB/s           
2024-06-02 14:47:28.112 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.124 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.136 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.147 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.158 client:336 ERROR    Not ready to send!
Uploaded 4,841 / 225,131 Bytes | 6.43 KB/s           
2024-06-02 14:47:28.294 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.306 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.317 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.329 client:336 ERROR    Not ready to send!
2024-06-02 14:47:28.340 client:336 ERROR    Not ready to send!

etc.

So, the canSendWriteWithoutResponse flag does seem to be False for ~50ms after avery message. But even without this delay, calling writeWithoutResponse is not raising any error callbacks. https://developer.apple.com/documentation/corebluetooth/cbperipheral/cansendwritewithoutresponse

JPHutchins commented 5 months ago

Gonna try hard coding the MTU to 244 which seems to be what your Mac negotiated. Still worked OK.

JPHutchins commented 5 months ago

I'm trying moving the DK further away from the Mac, past several walls. This makes the transfer fail, but in a sorta reasonable way:

python -m examples.ble.upload examples/duts/nrf52840dk_nrf52840/ble/b_smp_dut.bin
Scanning for SMP servers...OK
Found 1 SMP servers: [BLEDevice(14551EE2-C7AF-9698-F4FB-7BABAA222CB6, A SMP DUT)]
Connecting to the first SMP server...2024-06-02 14:56:33.552 ble:108 INFO     smp_characteristic.max_write_without_response_size=495
2024-06-02 14:56:33.552 ble:132 INFO     self._max_write_without_response_size=495
OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V1: 1>, flags=<Flag.0: 0>, length=134, group_id=1, sequence=1, command_id=0) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\x10\xd5%\x14\xd6\xa2\x94\x07U\xcfT\x92\xa2\x9a\xb5\xed\x04\xbdwmqQO\x8c\x8b\xfb\xa7\x08J\xf9\xdds', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0

Uploaded 21,998 / 225,130 Bytes | 4.24 KB/s           2024-06-02 14:56:39.814 ble:228 WARNING  Disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6
2024-06-02 14:56:39.815 __init__:174 ERROR    Exception in SMPClient: exc_type=<class 'smpclient.transport.SMPTransportDisconnected'>, exc_value=SMPTransportDisconnected('SMPBLETransport disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6'), traceback=<traceback object at 0x102d44b40>
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/JP/repos/smpclient/examples/ble/upload.py", line 73, in <module>
    asyncio.run(main())
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/Users/JP/repos/smpclient/examples/ble/upload.py", line 51, in main
    async for offset in client.upload(fw_file, 2):
  File "/Users/JP/repos/smpclient/smpclient/__init__.py", line 137, in upload
    response = await self.request(
  File "/Users/JP/repos/smpclient/smpclient/__init__.py", line 53, in request
    frame = await self._transport.send_and_receive(request.BYTES)
  File "/Users/JP/repos/smpclient/smpclient/transport/ble.py", line 195, in send_and_receive
    return await self.receive()
  File "/Users/JP/repos/smpclient/smpclient/transport/ble.py", line 161, in receive
    await self._notify_or_disconnect()
  File "/Users/JP/repos/smpclient/smpclient/transport/ble.py", line 244, in _notify_or_disconnect
    raise SMPTransportDisconnected(
smpclient.transport.SMPTransportDisconnected: SMPBLETransport disconnected from 14551EE2-C7AF-9698-F4FB-7BABAA222CB6
JPHutchins commented 5 months ago

Discussing here: https://github.com/hbldh/bleak/discussions/1589

JPHutchins commented 5 months ago

@bjandre Can you try increasing the delay between the packets? And LMK the Mac model and OS version. Thanks!

bjandre commented 5 months ago

I'm using a MacBook Pro, with Apple M1 Pro chip. MacOS Sonoma.

I'm glad you were able to reproduce the issue, at least initially.

I tried changing the asyncio.sleep command we discussed previously with a few values up to 1 sec, but it didn't change anything. If there is a different delay value please let me know where to set it and I will try it.

JPHutchins commented 5 months ago

Can you try changing this line to return self.mtu?

https://github.com/intercreate/smpclient/blob/main/smpclient/transport/__init__.py#L54C32-L54C42

This should prevent SMP packet fragmentation. If it helps, then it broadly confirms the source of the issue.

JP

Update: I can confirm that mcumgr-web does not fragment the packets and instead uploads packets sized to the MTU.

https://github.com/boogie/mcumgr-web/blob/main/js/mcumgr.js#L224