intercreate / smpclient

Simple Management Protocol (SMP) Client for remotely managing MCU firmware
Apache License 2.0

Example/ble dfu #13

Closed JPHutchins closed 6 months ago

JPHutchins commented 7 months ago

@tomaszduda23 I have received the nRF52DK (accidentally got nRF52832, nRF52840 is on the way), the Adafruit Feather nRF52840 and the Pro Micro nRF52840.

This PR does a bit of housekeeping around mypy. Ignore that 😭.

But it also adds an examples folder and some DUT routines for the nRF52DK and Feather! I hope that you'll be able to test these with your hardware so that we can get to the bottom of any BLE OTA DFU issues.

Here's an example output of the Feather upgrade routine. For the sake of simplicity I've tested from Windows, but I'd hope that BLE from Linux would work mostly the same. I included your workaround for BLE MTU and have mentioned it on another open ticket over at Bleak.

python -m examples.ble.upgrade adafruit_feather_nrf52840                                                                                                           
Using DUT folder: C:\Users\jp\repos\smpclient\examples\duts\adafruit_feather_nrf52840
Flashing the merged.hex...
[ #################### ]   1.140s | Erase file - Done erasing
[ #################### ]   1.581s | Program file - Done programming
[ #################### ]   1.629s | Verify file - Done verifying
Applying system reset.
Run.
A SMP DUT hash: SHA256=a3837de56f67f91d8e07cf87f3dba194ca9284c46cd8946c51bc64f58a26a4f9
B SMP DUT hash: SHA256=f9faba4f03ae60601f5df33d106f3714a7c074f57ad8cb700c011591e97d57f2
Searching for A SMP DUT...OK
Connecting to A SMP DUT...OK
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=134, group_id=<GroupId.IMAGE_MANAGEMENT: 1>, sequence=0, command_id=<ImageManagement.STATE: 0>) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\xa3\x83}\xe5og\xf9\x1d\x8e\x07\xcf\x87\xf3\xdb\xa1\x94\xca\x92\x84\xc4l\xd8\x94lQ\xbcd\xf5\x8a&\xa4\xf9', bootable=True, pending=False, confirmed=True, active=True, permanent=False)] splitStatus=0

Uploaded 223,630 / 223,630 Bytes
Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=<GroupId.IMAGE_MANAGEMENT: 1>, sequence=223, command_id=<ImageManagement.STATE: 0>) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\xa3\x83}\xe5og\xf9\x1d\x8e\x07\xcf\x87\xf3\xdb\xa1\x94\xca\x92\x84\xc4l\xd8\x94lQ\xbcd\xf5\x8a&\xa4\xf9', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'\xf9\xfa\xbaO\x03\xae``\x1f]\xf3=\x10o7\x14\xa7\xc0t\xf5z\xd8\xcbp\x0c\x01\x15\x91\xe9}W\xf2', bootable=True, pending=False, confirmed=False, active=False, permanent=False)] splitStatus=0       
Confirmed the upload

Marking B SMP DUT for test...Sending request...OK
Received response: header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=<GroupId.IMAGE_MANAGEMENT: 1>, sequence=224, command_id=<ImageManagement.STATE: 0>) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\xa3\x83}\xe5og\xf9\x1d\x8e\x07\xcf\x87\xf3\xdb\xa1\x94\xca\x92\x84\xc4l\xd8\x94lQ\xbcd\xf5\x8a&\xa4\xf9', bootable=True, pending=False, confirmed=True, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'\xf9\xfa\xbaO\x03\xae``\x1f]\xf3=\x10o7\x14\xa7\xc0t\xf5z\xd8\xcbp\x0c\x01\x15\x91\xe9}W\xf2', bootable=True, pending=True, confirmed=False, active=False, permanent=False)] splitStatus=0       

Resetting for swap...Sending request...OK
Received response: header=Header(op=<OP.WRITE_RSP: 3>, version=<Version.V0: 0>, flags=<Flag: 0>, length=2, group_id=<GroupId.OS_MANAGEMENT: 0>, sequence=225, command_id=<OSManagement.RESET: 5>) sequence=0

Searching for B SMP DUT...OK
Connecting to B SMP DUT...OK

Sending request...OK
Received response: header=Header(op=<OP.READ_RSP: 1>, version=<Version.V0: 0>, flags=<Flag: 0>, length=244, group_id=<GroupId.IMAGE_MANAGEMENT: 1>, sequence=226, command_id=<ImageManagement.STATE: 0>) sequence=0 images=[ImageState(slot=0, version='0.0.0', image=None, hash=b'\xf9\xfa\xbaO\x03\xae``\x1f]\xf3=\x10o7\x14\xa7\xc0t\xf5z\xd8\xcbp\x0c\x01\x15\x91\xe9}W\xf2', bootable=True, pending=False, confirmed=False, active=True, permanent=False), ImageState(slot=1, version='0.0.0', image=None, hash=b'\xa3\x83}\xe5og\xf9\x1d\x8e\x07\xcf\x87\xf3\xdb\xa1\x94\xca\x92\x84\xc4l\xd8\x94lQ\xbcd\xf5\x8a&\xa4\xf9', bootable=True, pending=False, confirmed=True, active=False, permanent=False)] splitStatus=0       
Confirmed the swap
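
The final "Confirmed the swap" step can be sketched roughly as follows. This is only an illustration, not the code in the examples folder: `SlotState` and `swap_confirmed` are hypothetical names, and the dataclass mirrors just a few of the `ImageState` fields printed in the responses above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SlotState:
    """Hypothetical stand-in for a subset of the ImageState fields in the log."""

    slot: int
    hash: bytes
    active: bool
    confirmed: bool


def swap_confirmed(images: Sequence[SlotState], expected_hash: bytes) -> bool:
    """Return True if slot 0 is active and holds the image with expected_hash."""
    return any(
        image.slot == 0 and image.active and image.hash == expected_hash
        for image in images
    )


# Hashes copied from the "A SMP DUT hash" / "B SMP DUT hash" lines of the log.
A_HASH = bytes.fromhex(
    "a3837de56f67f91d8e07cf87f3dba194ca9284c46cd8946c51bc64f58a26a4f9"
)
B_HASH = bytes.fromhex(
    "f9faba4f03ae60601f5df33d106f3714a7c074f57ad8cb700c011591e97d57f2"
)

# Image state as reported after the reset: B now runs from slot 0.
after_reset = [
    SlotState(slot=0, hash=B_HASH, active=True, confirmed=False),
    SlotState(slot=1, hash=A_HASH, active=False, confirmed=True),
]

print(swap_confirmed(after_reset, B_HASH))
```

Comparing hashes, in addition to the advertised device name, guards against a device that reboots into the old image but is still discoverable.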
JPHutchins commented 7 months ago

@tomaszduda23 These are the "super minis" that I got: https://www.tindie.com/products/adz1122/supermini-nrf52840-development-board-for-nicenano/

Problem is that I don't see how to attach a programmer (JLink).

tomaszduda23 commented 7 months ago

You need to solder to the SWD and CLK pins on the bottom. You also need VCC, GND, and reset. I'm using an ST-Link clone.

tomaszduda23 commented 6 months ago

What am I missing here?

$ python -m venv venv
$ . ./venv/bin/activate
$ pip install git+https://github.com/intercreate/smpclient/@27a2f681803d4331634e8317bb6382fcd4140e9
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      error: Multiple top-level packages discovered in a flat-layout: ['smpclient', 'dutfirmware'].

      To avoid accidental inclusion of unwanted files or directories,
      setuptools will not proceed with this build.

      If you are trying to create a single distribution with multiple packages
      on purpose, you should not rely on automatic discovery.
      Instead, consider the following options:

      1. set up custom discovery (`find` directive with `include` or `exclude`)
      2. use a `src-layout`
      3. explicitly set `py_modules` or `packages` with a list of names

      To find more information, look for "package discovery" on setuptools docs.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
JPHutchins commented 6 months ago

What am I missing here?

IDK, I've never tried to install a branch with pip - wild stuff!

You should update your fork to get all the updates from the origin and then checkout this branch on your fork.

JPHutchins commented 6 months ago

Re: the specific error, the pyproject.toml will deal with that https://github.com/intercreate/smpclient/blob/9279cd7a6c5e27579552df88f8ff4dbe78b0e5ec/pyproject.toml#L11-L13

tomaszduda23 commented 6 months ago

This fixes the error: https://github.com/intercreate/smpclient/pull/18.

  1. The upload gets stuck forever.
    ImageState(slot=0, version='0.0.0', image=None, hash=HashBytes('8E2A492C614EA37998B105C93293DC234C341CDB2803CB242E7FC1D775DF75A7'), bootable=True, pending=False, confirmed=True, active=True, permanent=False)
    Uploading: [=========================                                   ] 41%
  2. The upload is slower than in https://github.com/intercreate/smpclient/pull/4. It could be faster if MCUMgrParametersRead is added 😃
JPHutchins commented 6 months ago

This fixes the error: #18.

  1. The upload gets stuck forever.
ImageState(slot=0, version='0.0.0', image=None, hash=HashBytes('8E2A492C614EA37998B105C93293DC234C341CDB2803CB242E7FC1D775DF75A7'), bootable=True, pending=False, confirmed=True, active=True, permanent=False)
Uploading: [=========================                                   ] 41%
  2. The upload is slower than in Bug fixing for zephyr #4. It could be faster if MCUMgrParametersRead is added 😃

For me, MCUMgrParametersRead is returning buf_size=2475 buf_count=4. When I tried to send packets of size 2475, Windows complained - maybe the Windows driver doesn't support that kind of fragmentation? Context here: https://github.com/intercreate/smpclient/pull/4#discussion_r1545520075

My upload speed on Windows was about 10KBps (MTU of 512). Using Nordic's Device Manager app I've seen up to 16KBps. Are you setting the MTU to the 2475 value and then letting the transport fragment? What kinds of speeds are you seeing?

Either way it looks like I need to test from Linux!

JPHutchins commented 6 months ago

@tomaszduda23 Another relevant note is that Zephyr would like for us to add Bumble as an alternate backend to Bleak: https://github.com/zephyrproject-rtos/zephyr/issues/70871#issuecomment-2044300001

Could be interesting, though perhaps not particularly user friendly.

tomaszduda23 commented 6 months ago

For me, MCUMgrParametersRead is returning buf_size=2475 buf_count=4. When I tried to send packets of size 2475, Windows complained - maybe the Windows driver doesn't support that kind of fragmentation? Context here: #4 (comment)

2475 is the correct value. You still need to obey the MTU, though. You create a 2475-byte packet and slice it during sending. The header is sent only once.
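
A minimal sketch of that slicing, under stated assumptions: `slice_for_mtu` and `max_write_size` are hypothetical names, the 512 value is the MTU mentioned earlier in this thread, and on a real BLE link the usable write size would be derived from the negotiated MTU rather than hard-coded. The SMP header is 8 bytes and sits at the start of the packet, so it ends up in the first fragment only.

```python
from typing import Iterator


def slice_for_mtu(packet: bytes, max_write_size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of packet, each at most max_write_size bytes."""
    if max_write_size <= 0:
        raise ValueError("max_write_size must be positive")
    for offset in range(0, len(packet), max_write_size):
        yield packet[offset : offset + max_write_size]


# Placeholder packet: an 8-byte header marker followed by a zeroed payload,
# 2475 bytes in total (the buf_size reported in this thread).
header = b"H" * 8
packet = header + bytes(2475 - len(header))

fragments = list(slice_for_mtu(packet, max_write_size=512))

print([len(fragment) for fragment in fragments])
print(fragments[0].startswith(header))
```

Each fragment would then be written to the SMP characteristic in order; the receiver reassembles them using the length field in the header.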

You could try to pull my version of esphome (not sure if it works on Windows). 315fbb0f643fb7123a807a9926e71e1f9c597397 should work with BLE and CDC (you need to disable logging for CDC though).

JPHutchins commented 6 months ago

2475 is the correct value. You still need to obey the MTU, though. You create a 2475-byte packet and slice it during sending. The header is sent only once.

Got it, I can reimplement in this way.

  1. The upload gets stuck forever.

Any ideas on this? What board are you testing with and which script was running?

tomaszduda23 commented 6 months ago

Any ideas on this? What board are you testing with and which script was running?

I'm using it like this: https://github.com/tomaszduda23/esphome/blob/315fbb0f643fb7123a807a9926e71e1f9c597397/esphome/zephyr_tools.py#L152

with the following config

---
nrf52:
  board: adafruit_feather_nrf52840

esphome:
  name: nrf52-test-nrf

switch:
  - platform: gpio
    pin:
      number: 15
      inverted: true
      mode:
        output: true
    id: gpio_15
    restore_mode: RESTORE_DEFAULT_OFF

interval:
  - interval: 500ms
    then:
      - switch.toggle: gpio_15

output:
  - platform: gpio
    pin:
      number: 14
      inverted: true
      mode:
        output: true
    id: rest_gpio

dfu:
  reset_output: rest_gpio

ota:
  - platform: zephyr_mcumgr
    usb_cdc: True
    on_begin:
      then:
        - logger.log: "OTA start"
    on_progress:
      then:
        - logger.log:
            format: "OTA progress %0.1f%%"
            args: ["x"]
    on_end:
      then:
        - logger.log: "OTA end"
    on_error:
      then:
        - logger.log:
            format: "OTA update error %d"
            args: ["x"]
    on_state_change:
      then:
        - if:
            condition:
              lambda: return state == ota::OTA_STARTED;
            then:
              - logger.log: "OTA start"

zephyr_ble_server:

zephyr_ble_nus:
  log: true

zephyr_debug:
JPHutchins commented 6 months ago

I'm using it like this: https://github.com/tomaszduda23/esphome/blob/315fbb0f643fb7123a807a9926e71e1f9c597397/esphome/zephyr_tools.py#L152

with the following config

Well that's simple enough! Do you suspect that it's caused by the lack of timeouts on the BLE requests? Specifically, if the smpclient sends a message and does not receive a notify from the SMP server, it could wait forever here https://github.com/intercreate/smpclient/blob/27a2f681803d4331634e8317bb6382fcd4140e92/smpclient/transport/ble.py#L101 or on any other notify.wait().
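
To illustrate the idea, here is a minimal sketch of bounding such a wait with a timeout. It is not the smpclient implementation: `wait_for_notify` and `timeout_s` are hypothetical names, and a plain `asyncio.Event` stands in for whatever the BLE transport uses to signal a notification.

```python
import asyncio
from typing import Tuple


async def wait_for_notify(notified: asyncio.Event, timeout_s: float) -> bool:
    """Wait for notified to be set; return False on timeout instead of hanging."""
    try:
        await asyncio.wait_for(notified.wait(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return False
    return True


async def demo() -> Tuple[bool, bool]:
    silent = asyncio.Event()  # simulates an SMP server that never notifies
    responsive = asyncio.Event()  # simulates an SMP server that has notified
    responsive.set()
    return (
        await wait_for_notify(silent, timeout_s=0.05),
        await wait_for_notify(responsive, timeout_s=0.05),
    )


print(asyncio.run(demo()))
```

A caller could treat a `False` result as a cue to retry the request or to raise a transport error, rather than blocking the upload indefinitely.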

I'd like for you to try running the samples described here: https://github.com/intercreate/smpclient/tree/example/ble-dfu/examples/ble

These are simple HW integration tests that can help to prevent regressions. If you don't have a Feather or nRF52DK then LMK which board you do have and I can try to add a build for it!

The tests work by having two mostly identical FWs and I'd be happy to add a test for the ESPHome FW, but let's start with "regular Zephyr" as a simpler example.

tomaszduda23 commented 6 months ago

Do you suspect that it's caused by the lack of timeouts on the BLE requests?

It could be one of the problems. In my old PR I also had to add code to handle disconnects. Perhaps that was one of the reasons.

I'd like for you to try running the samples described here: https://github.com/intercreate/smpclient/tree/example/ble-dfu/examples/ble

We already know that it does not work in some conditions 😃 I would solve those issues first.

tomaszduda23 commented 6 months ago

BTW, I could prepare 2x images for the nRF52840, then you can test the upload on your side as well.

JPHutchins commented 6 months ago

BTW, I could prepare 2x images for the nRF52840, then you can test the upload on your side as well.

Yes, this would be perfect! If possible, have the device names be A SMP DUT and B SMP DUT. If that's not possible then I can add some args to the test script. Regardless, having unique device names is a really easy way to test for successful upload and swap.

tomaszduda23 commented 6 months ago

I'm getting the following error

/venv/lib/python3.10/site-packages/smpclient/transport/serial.py", line 8, in <module>
    from typing import Final, override
ImportError: cannot import name 'override' from 'typing' (/usr/lib/python3.10/typing.py)
JPHutchins commented 6 months ago

@tomaszduda23 I finally tested the McuMgr parameters buf_size optimization and it's working nicely, thanks! Getting up to 20KBps. I'd like to merge this now as a starting point. Then update and merge the Python3.8/3.9 compatibility. Then add a new branch with the tests using the ESPHome FW images instead of vanilla Zephyr images.

I'll fix the 3.10 compat real quick!
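
For context, `typing.override` was added in Python 3.12, which is why the import in the traceback above fails on 3.10. A minimal compatibility sketch follows; it is not necessarily how smpclient resolved this. It assumes `typing_extensions` may or may not be installed and falls back to a no-op decorator, and the `Base`/`Child` classes are purely illustrative.

```python
import sys
from typing import Callable, TypeVar

if sys.version_info >= (3, 12):
    from typing import override
else:
    try:
        # typing_extensions backports override to older interpreters.
        from typing_extensions import override
    except ImportError:
        F = TypeVar("F", bound=Callable[..., object])

        def override(method: F) -> F:
            """Fallback: a no-op decorator when no real override is available."""
            return method


class Base:
    def name(self) -> str:
        return "base"


class Child(Base):
    @override
    def name(self) -> str:
        return "child"


print(Child().name())
```

The decorator has no runtime effect on the method either way; its value is for static type checkers, which flag a decorated method that does not actually override anything.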

JPHutchins commented 6 months ago

Changed my mind and merged the 3.8/3.9 compatibility first. Should be easy to rebase and fix this PR.

tomaszduda23 commented 6 months ago

I've just tested the BLE update 3 times and it works perfectly 👍

JPHutchins commented 6 months ago

Cool! I should be able to have this on PyPI in a few hours. Still a lot of work to be done for overall reliability, but nice to make some progress!