zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License

FW update on Aeotec Multisensor 6 #5646

Open bostjane opened 1 year ago

bostjane commented 1 year ago

Hi,

I've tried to update the FW on an Aeotec Multisensor 6. Aeotec support gave me instructions on how to update it via Z-Wave JS UI, but the update fails after a few % of progress. The Z-Wave JS UI version is 8.13.0, and the controller is 3 m away from the sensor. The update was also very slow. Any ideas what the problem could be?

Log:

2023-04-08T10:46:02.461Z - firmware update progress
  Arg 0: 88
  Arg 1: 6554
  Arg 2:
    └─currentFile: 1
    └─totalFiles: 1
    └─sentFragments: 88
    └─totalFragments: 6554
    └─progress: 1.34

2023-04-08T10:46:32.472Z - firmware update finished
  Arg 0: -1
  Arg 1:
  Arg 2:
    └─success: false
    └─status: -1
    └─reInterview: false
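The two log entries can be cross-checked with a little arithmetic: the reported `progress` matches sentFragments / totalFragments as a percentage, and the failure was reported about 30 s after the last progress event. The sketch below only recomputes these numbers from the log; `progressPercent` and `gapMs` are hypothetical helpers, not how Z-Wave JS UI computes them.

```typescript
// Hypothetical helper: sentFragments / totalFragments as a percentage,
// rounded to two decimals like the value in the log.
function progressPercent(sent: number, total: number): number {
  return Math.round((sent / total) * 10000) / 100;
}

// Milliseconds between two ISO 8601 timestamps copied from the log.
function gapMs(from: string, to: string): number {
  return Date.parse(to) - Date.parse(from);
}

const progress = progressPercent(88, 6554);
const silence = gapMs("2023-04-08T10:46:02.461Z", "2023-04-08T10:46:32.472Z");
console.log(`progress ${progress}%, ${silence} ms until "finished"`);
```

So the update stopped after fragment 88 of 6554, and nothing else was logged for roughly 30 s before the failure was reported.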

robertsLando commented 1 year ago

Please make a driver log with loglevel debug and attach it here as a file (drag & drop into the text field).

robertsLando commented 1 year ago

BTW, status -1 means Error_Timeout, i.e. no response was received from the device. @AlCalzone, any clue what could cause that?

bostjane commented 1 year ago

I tried many things, and here are my findings so far.

It turns out that the first problem was the device's signal strength. Even at only 3 m distance with no obstacles, and with no problems in practice (motion was reported to the controller quickly), it was impossible to update the device. Health showed 4/10. When I moved it really close to the controller and health showed 10/10, the update went through successfully for one device. Then I tried to update the other sensors the same way. They all had 10/10 health, but the update never succeeded again: every attempt was interrupted, and after many tries they stopped at different percentages. The log is from one of these attempts.

zwavejs_2023-04-09.zip

When successful, the update took 90 minutes. Is this normal? I would say it's rather long. I also noticed that when the update failed, I could not retry, because it said an update was already in progress, even though it had failed a long time ago. Did the FW update process not clean up correctly?
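To put the 90 minutes into perspective, it can be converted into time per fragment. This assumes the successful update used the same image as the failed attempt in the log above (6554 fragments); that is an assumption, since the successful run was not logged here.

```typescript
// Assumption: same firmware image as in the failed attempt, i.e. 6554 fragments.
const totalFragments = 6554;
const durationMs = 90 * 60 * 1000; // reported duration: 90 minutes

// Average wall-clock time spent per fragment, and the resulting throughput.
const msPerFragment = durationMs / totalFragments;
const fragmentsPerSecond = totalFragments / (durationMs / 1000);

console.log(msPerFragment.toFixed(1)); // 823.9
console.log(fragmentsPerSecond.toFixed(2)); // 1.21
```

That is a bit over 0.8 s for each complete fragment round trip, including all of the messages needed to request and deliver it.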

robertsLando commented 1 year ago

I see a ton of errors like:

2023-04-09T14:12:07.404Z DRIVER « [Node 021] [REQ] [ApplicationCommand]
                                  └─[SecurityCCCommandEncapsulation] [INVALID]
                                      error: Nonce 0x21 expired, cannot decode security encapsulated command.

Dunno if that could be the problem. Also, which stick are you using? If you are using a USB stick, please try an extension cord; it improves connectivity a lot!

For the other problems, we need @AlCalzone to look at those logs.

AlCalzone commented 1 year ago

I can't make out anything specific, but a few observations: a couple of nodes have a signal really close to the background noise level. Others have a very good signal, and still others are communicating at the absolute lowest speed possible. If you haven't already, put your stick on a USB extension.

Node 21 is using Security S0. This means every message requires a nonce request from the sender and a nonce response from the destination before the actual message can be sent. The update fails because the device does not respond to the nonce request when the driver wants to send a packet. That response probably got lost, but the device then retransmits its request for the firmware fragment using the same encryption state (nonce) as before, which the driver won't accept.
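To make the S0 rule concrete: a nonce is handed out on request, may be used for exactly one message, and stops being valid after a while. The class below is a minimal, hypothetical model of the receiving side's bookkeeping, not the Z-Wave JS implementation; the names `NonceStore`, `issue` and `consume` and the TTL value are assumptions. It relies on S0 nonces being 8 bytes with the first byte acting as the nonce ID, presumably the kind of ID the log line above reports as `Nonce 0x21`.

```typescript
import { randomBytes } from "node:crypto";

interface IssuedNonce {
  nonce: Buffer;
  expiresAt: number; // ms timestamp after which the nonce is stale
}

// Hypothetical model of S0 nonce bookkeeping on the receiving side.
class NonceStore {
  private readonly active = new Map<number, IssuedNonce>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Answer a NonceGet: create a fresh 8-byte nonce whose first byte (its ID)
  // is not currently in use.
  issue(now: number): Buffer {
    let nonce = randomBytes(8);
    while (this.active.has(nonce.readUInt8(0))) nonce = randomBytes(8);
    this.active.set(nonce.readUInt8(0), { nonce, expiresAt: now + this.ttlMs });
    return nonce;
  }

  // Look up the nonce an incoming encrypted message refers to. The entry is
  // removed on first use, so a second message with the same ID is rejected
  // just like an expired one.
  consume(id: number, now: number): Buffer | undefined {
    const entry = this.active.get(id);
    this.active.delete(id);
    if (entry === undefined || now > entry.expiresAt) return undefined;
    return entry.nonce;
  }
}

const store = new NonceStore(10000);
const n = store.issue(0);
console.log(store.consume(n.readUInt8(0), 100) !== undefined); // true: first use
console.log(store.consume(n.readUInt8(0), 200) !== undefined); // false: reuse
```

A retransmitted message that still carries the first nonce's ID ends up in the `undefined` branch, which is the "cannot decode" situation in the log.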

We generally don't recommend using S0 for devices that don't need it, especially not for chatty ones like multisensors. If the USB extension doesn't help, I guess your best bet is to exclude the device and re-include it without encryption.

bostjane commented 1 year ago

I am already using a USB extension cord, keeping the controller away from the computer and metal objects. To minimize radio interference, I also tried shutting down all devices in the house that use the same frequency band. The controller is a ZWave.me UZB stick (FW: v5.25, SDK: v6.71.1).

If the nonce got lost, I wonder whether the code could be changed to be more robust against such conditions. I'm not a Z-Wave expert, but what about more retries of the whole firmware fragment exchange? That is, not just retransmitting the actual message, but repeating the whole nonce request, nonce response, message cycle?

AlCalzone commented 1 year ago

I guess we could retry. But ultimately, the firmware update process is device-driven: the device asks the driver for a fragment and the driver sends it. I don't think the device just re-transmitting its last request (which is what it does now) is correct, since the specs forbid accepting a command with a previously used nonce.
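Because the process is device-driven, the driver's part of each round is essentially a lookup: find the fragment the device asked for and send it back. A minimal sketch of that lookup follows. It assumes report numbers start at 1 and that the final fragment is flagged as the last one; `fragmentFor` and the fragment size are hypothetical, and the real Firmware Update Meta Data CC frame details (headers, checksums) are left out.

```typescript
interface Fragment {
  reportNumber: number;
  data: Uint8Array;
  isLast: boolean;
}

// Hypothetical: answer the device's request for report `reportNumber`
// (assumed 1-based) by slicing the image into `fragmentSize`-byte pieces.
function fragmentFor(firmware: Uint8Array, fragmentSize: number, reportNumber: number): Fragment {
  const total = Math.ceil(firmware.length / fragmentSize);
  if (!Number.isInteger(reportNumber) || reportNumber < 1 || reportNumber > total) {
    throw new RangeError(`report ${reportNumber} out of range 1..${total}`);
  }
  const start = (reportNumber - 1) * fragmentSize;
  return {
    reportNumber,
    data: firmware.subarray(start, start + fragmentSize),
    isLast: reportNumber === total,
  };
}

// The device decides the order; a repeated request simply yields the same bytes.
const image = new Uint8Array(10).map((_, i) => i);
console.log(fragmentFor(image, 4, 3)); // last fragment: 2 bytes, isLast true
```

Serving a repeated request is therefore harmless for the driver; the difficulty in this issue is only that the repeated request cannot be decrypted.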

bostjane commented 1 year ago

If the firmware update process is device-driven, then there is probably not much that can be done. So the procedure goes something like this: the driver initiates the firmware update, then the device starts requesting fragments by number, and the driver sends the fragment data in response to those requests. Each of those messages (the fragment request from device to driver and the fragment data from driver to device) is actually a 3-packet process: nonce request, nonce response, and the actual message/data/command.
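The three-packet pattern described above can be turned into a rough frame count. This is a back-of-the-envelope sketch that ignores link-layer ACKs, routing and retransmissions; `framesPerMessage` and `framesPerFragment` are hypothetical helpers, and the 6554-fragment figure comes from the progress log earlier in the thread.

```typescript
// One application message is 1 frame without security; with S0 it becomes
// NonceGet + NonceReport + encrypted message = 3 frames.
function framesPerMessage(s0: boolean): number {
  return s0 ? 3 : 1;
}

// Per fragment, two messages are exchanged: the device's request for
// fragment X and the driver's reply carrying fragment X.
function framesPerFragment(s0: boolean): number {
  return 2 * framesPerMessage(s0);
}

const fragments = 6554; // from the progress log above
console.log("S0:  ", fragments * framesPerFragment(true)); // 39324
console.log("none:", fragments * framesPerFragment(false)); // 13108
```

With S0 the same image needs three times as many radio frames, and each extra frame is one more chance for a loss like the one discussed here.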

If I understood correctly, the problem is that the driver could not decrypt the re-transmitted fragment response because its nonce had already expired, since that message had already been decrypted the first time. As the specs forbid accepting a message with a previous nonce, the problem is probably on the device side: it re-transmitted the fragment request without first sending a new nonce request. Could the driver detect that and re-transmit the fragment data together with a new nonce request/response?

My suggestion is based on very little knowledge of the Z-Wave specs; I don't even know whether message decryption is done by the controller itself (behind the Serial API) or in software (Z-Wave JS).

AlCalzone commented 1 year ago

Almost correct. This is how it normally goes:

sequenceDiagram
    participant Z as Z-Wave JS
    participant D as Device

    Z-->>D: NonceGet
    D-->>Z: NonceReport
    Z->>D: Initiate firmware update [encrypted]

    D-->>Z: NonceGet
    Z-->>D: NonceReport
    D->>Z: OK [encrypted]

    loop For each fragment
        D-->>Z: NonceGet
        Z-->>D: NonceReport
        D->>Z: Request fragment no. X [encrypted]

        Z-->>D: NonceGet
        D-->>Z: NonceReport
        Z->>D: Fragment no. X [encrypted]
    end

This is what happens here:

sequenceDiagram
    participant Z as Z-Wave JS
    participant D as Device

    Z-->>D: NonceGet
    D-->>Z: NonceReport
    Z->>D: Initiate firmware update [encrypted]

    D-->>Z: NonceGet
    Z-->>D: NonceReport
    D->>Z: OK [encrypted]

    note over Z,D: ... other fragments

    D-->>Z: NonceGet
    Z-->>D: NonceReport "ABC"
    D->>Z: Request fragment no. X [encrypted with nonce "ABC"]

    Z-->>D: NonceGet
    D--xZ: NonceReport (missing)

    note over Z,D: The device retries:

    D--xZ: Request fragment no. X [encrypted with nonce "ABC"]

    note over Z,D: this message is discarded (nonce reuse)

Encryption and decryption are handled by Z-Wave JS, but the re-transmitted encrypted request is discarded before it is decrypted, because its nonce has expired. The driver never knows what's in there, so we can't just send the fragment in response.
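The failure in the second diagram can be replayed with a toy model in which every nonce is accepted at most once. Everything here (the `Receiver` class, string nonces like "ABC") is a hypothetical illustration of the rule, not driver code; real nonces are random bytes.

```typescript
type Outcome = "accepted" | "discarded (nonce reuse)" | "discarded (unknown nonce)";

// Toy receiver: a nonce it handed out may be used for exactly one message.
class Receiver {
  private readonly outstanding = new Set<string>();
  private readonly used = new Set<string>();

  // NonceGet arrives: hand out `nonce` (passed in so the trace stays readable).
  handleNonceGet(nonce: string): string {
    this.outstanding.add(nonce);
    return nonce;
  }

  // Encrypted message arrives: accept it only if its nonce is still outstanding.
  handleEncrypted(nonce: string): Outcome {
    if (this.outstanding.delete(nonce)) {
      this.used.add(nonce);
      return "accepted";
    }
    return this.used.has(nonce) ? "discarded (nonce reuse)" : "discarded (unknown nonce)";
  }
}

const zwaveJs = new Receiver();
const log: string[] = [];

// Device: NonceGet, Z-Wave JS: NonceReport "ABC"
const abc = zwaveJs.handleNonceGet("ABC");
// Device: Request fragment no. X [encrypted with "ABC"]
log.push(`request #1: ${zwaveJs.handleEncrypted(abc)}`);
// Z-Wave JS sends NonceGet, but the device's NonceReport is lost, so
// fragment X is never sent. The device retries with the old nonce:
log.push(`request #2: ${zwaveJs.handleEncrypted(abc)}`);

console.log(log.join("\n"));
```

The second request is rejected by the nonce check alone, so its content (which fragment it asks for) is never seen.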

We can work around this by retrying the nonce request. This will probably make things more resilient, but there's no guarantee that it will actually fix the problem.
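A minimal sketch of that workaround: when no NonceReport arrives in time, send NonceGet again instead of giving up after the first loss. `sendNonceGet`, `requestNonceWithRetry`, the attempt count and the timeout are hypothetical stand-ins, not Z-Wave JS APIs or settings.

```typescript
// Resolve with the promise's value, or with undefined once `ms` have passed.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | undefined> {
  return new Promise<T | undefined>((resolve, reject) => {
    const timer = setTimeout(() => resolve(undefined), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error: unknown) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Hypothetical: `sendNonceGet` sends a NonceGet and resolves with the nonce
// from the matching NonceReport.
async function requestNonceWithRetry(
  sendNonceGet: () => Promise<Uint8Array>,
  maxAttempts: number,
  timeoutMs: number,
): Promise<Uint8Array | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const nonce = await withTimeout(sendNonceGet(), timeoutMs);
    if (nonce !== undefined) return nonce; // NonceReport arrived
    // Otherwise the report was lost: ask again instead of failing the update.
  }
  return undefined; // every attempt timed out
}

// Fake device whose first NonceReport gets lost (that promise never settles).
let nonceGets = 0;
function fakeSendNonceGet(): Promise<Uint8Array> {
  nonceGets++;
  if (nonceGets === 1) return new Promise<Uint8Array>(() => {});
  return Promise.resolve(new Uint8Array(8));
}

requestNonceWithRetry(fakeSendNonceGet, 3, 50).then((nonce) => {
  console.log(nonce !== undefined, nonceGets); // true 2
});
```

The retry only helps if the device is still waiting for the fragment when the new NonceReport arrives; whether it is depends on the device's own timeouts.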

bostjane commented 1 year ago

Thanks for the detailed explanation of what is going on. I think your suggestion might work; of course there are no guarantees, and it probably also depends on how the device implements its retry mechanism. You said the specs forbid decrypting a message with an expired nonce, but as I understand it, that is not what this proposal is about. You'll still discard the re-transmitted request for fragment X from the device, but since the Z-Wave JS state machine is still in the step of sending fragment X, I don't see any spec violation. ZWave-JS should simply re-transmit NonceGet in case NonceReport is not received.

If there is a dilemma about whether the discarded message was actually a request for fragment X or X+1, and the device doesn't know which fragment it just got, there must be a mechanism to prevent sending the same fragment twice. Or, if that might break something else, what about keeping track of the previous (encrypted) messages for each device, together with their nonce (which you obviously already do)? When Z-Wave JS gets an encrypted message with an expired nonce, it could compare it with the previous one. If the previous encrypted message is identical to the current one, that could serve only as a signal for the state machine to initiate a retry. Since you don't actually decrypt and/or "use" the re-transmitted message with the expired nonce, that might not be a spec violation... I know, it's thin ice... Again, these are just my suggestions and could be quite wrong, as I don't know how Z-Wave JS works in detail.
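The comparison idea can be sketched like this: keep the raw bytes of the last encrypted frame accepted from each node, and if a frame with an expired nonce is byte-for-byte identical, treat it purely as a signal to restart the nonce exchange, without decrypting it. This only illustrates the suggestion in this comment; `classifyFrame` and the per-node cache are hypothetical, and whether such a rule is acceptable under the spec is exactly the open question raised here.

```typescript
type Verdict = "fresh" | "retransmission" | "drop";

// Hypothetical per-node cache of the last accepted encrypted frame (raw bytes).
const lastAccepted = new Map<number, Uint8Array>();

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// `nonceValid` is the result of the usual nonce check. Frames with an expired
// nonce are never decrypted; an exact copy of the previous frame only tells
// the state machine to start a new nonce exchange.
function classifyFrame(nodeId: number, raw: Uint8Array, nonceValid: boolean): Verdict {
  if (nonceValid) {
    lastAccepted.set(nodeId, raw);
    return "fresh";
  }
  const previous = lastAccepted.get(nodeId);
  return previous !== undefined && sameBytes(previous, raw) ? "retransmission" : "drop";
}

const frame = Uint8Array.of(1, 2, 3, 4);
console.log(classifyFrame(21, frame, true)); // fresh
console.log(classifyFrame(21, frame.slice(), false)); // retransmission
console.log(classifyFrame(21, Uint8Array.of(9, 9), false)); // drop
```

The design keeps the "no decryption with a used nonce" rule intact: the repeated frame is only compared as opaque bytes, never interpreted.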

AlCalzone commented 1 year ago

ZWave-JS should simply re-transmit NonceGet in case NonceReport is not received.

That's what I meant.