zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License
755 stars 613 forks source link

S2: checkLifelineHealth -> ZWaveError: The node failed to decode the message. (ZW1405) #5120

Open marcus-j-davies opened 2 years ago

marcus-j-davies commented 2 years ago

Is your problem within Home Assistant (Core or Z-Wave JS Integration)?

NO, my problem is NOT within Home Assistant or the ZWave JS integration

Is your problem within Z-Wave JS UI (formerly ZwaveJS2MQTT)?

NO, my problem is NOT within Z-Wave JS UI

Checklist

Describe the bug

Im not entirely sure where this lies:

Device information

My Test device: Ring Keypad Gen 1 (Node 32) https://devices.zwave-js.io/?jumpTo=0x0346:0x0101:0x0201:0.0

How are you using node-zwave-js?

Which branches or versions?

Screenshot 2022-09-26 at 17 02 51

Did you change anything?

no

If yes, what did you change?

No response

Did this work before?

Yes (please describe)

If yes, where did it work?

Previously I did get results - they were shockingly bad! given its a FLiRS device 😅 but I did at least get results instead of an exception.

Attach Driver Logfile

Check Begins here (1 round) 2022-09-26T15:51:50.326Z NDERED « [API: DriverAPI] [Method: checkLifelineHealth]

Fails here 2022-09-26T15:51:53.410Z NDERED [ERROR] [INPUT] The node failed to decode the message. (ZW1405)

zwave.log

zwave-js-bot commented 2 years ago

👋 Hey @marcus-j-davies!

It looks like you attached a logfile, but its filename doesn't look like it a driver log that came from Z-Wave JS. Please make sure you upload the correct one.

AlCalzone commented 2 years ago

So, this looks like the SPAN (shared S2 encryption state) suddenly goes out of sync and the device really doesn't want to re-sync:

Z-Wave JS requests status of the lifeline check, node responds (although out of sequence):

2022-09-26T15:51:53.219Z DRIVER » [Node 032] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      15
                                  └─[Security2CCMessageEncapsulation]
                                    │ sequence number: 231
                                    └─[PowerlevelCCTestNodeGet]
2022-09-26T15:51:53.220Z SERIAL « [ACK]                                                                   (0x06)
2022-09-26T15:51:53.227Z SERIAL « 0x0104011301e8                                                       (6 bytes)
2022-09-26T15:51:53.227Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.228Z DRIVER « [RES] [SendData]
                                    was sent: true
2022-09-26T15:51:53.245Z SERIAL « 0x011a00040020129f03490069f9b6b90f4252f6fdaff2f5ff277e0083          (28 bytes)
2022-09-26T15:51:53.246Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.248Z DRIVER « [Node 032] [REQ] [ApplicationCommand]
                                  └─[Security2CCMessageEncapsulation]
                                    │ sequence number: 73
                                    └─[PowerlevelCCTestNodeReport]
                                        test node id:        1
                                        status:              Success
                                        acknowledged frames: 10
2022-09-26T15:51:53.249Z CNTRLR « [Node 032] received expected response prematurely, remembering it...
2022-09-26T15:51:53.272Z SERIAL « 0x011800130f000005007e7f7f7f7f010103000000000201000080              (26 bytes)
2022-09-26T15:51:53.273Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.274Z DRIVER « [REQ] [SendData]
                                    callback id:            15
                                    transmit status:        OK, took 50 ms
                                    routing attempts:       1
                                    protocol & route speed: Z-Wave, 40 kbit/s
                                    ACK RSSI:               Receiver saturated
                                    ACK channel no.:        1
                                    TX channel no.:         1
2022-09-26T15:51:53.277Z CNTRLR   [Node 032] At Normal Power, 10/10 pings were acknowledged...

Next command, node doesn't understand, sends Nonce Report instead to re-sync:

2022-09-26T15:51:53.286Z DRIVER » [Node 032] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      16
                                  └─[Security2CCMessageEncapsulation]
                                    │ sequence number: 232
                                    └─[SupervisionCCGet]
                                      │ session id:      50
                                      │ request updates: true
                                      └─[PowerlevelCCTestNodeSet]
                                          test node id:     1
                                          power level:      -1 dBm
                                          test frame count: 10
2022-09-26T15:51:53.288Z SERIAL « [ACK]                                                                   (0x06)
2022-09-26T15:51:53.295Z SERIAL « 0x0104011301e8                                                       (6 bytes)
2022-09-26T15:51:53.295Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.296Z DRIVER « [RES] [SendData]
                                    was sent: true
2022-09-26T15:51:53.310Z SERIAL « 0x011c00040020149f024a0126e910cf0b04ef97d3594090e1ee128e7e00d5      (30 bytes)
2022-09-26T15:51:53.311Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.312Z DRIVER « [Node 032] [REQ] [ApplicationCommand]
                                  └─[Security2CCNonceReport]
                                      sequence number:  74
                                      SOS:              true
                                      MOS:              false
                                      receiver entropy: 0x26e910cf0b04ef97d3594090e1ee128e
2022-09-26T15:51:53.313Z CNTRLR « [Node 032] received expected response prematurely, remembering it...
2022-09-26T15:51:53.338Z SERIAL « 0x0118001310000005007e7f7f7f7f01010300000000020100009f              (26 bytes)
2022-09-26T15:51:53.339Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.340Z DRIVER « [REQ] [SendData]
                                    callback id:            16
                                    transmit status:        OK, took 50 ms
                                    routing attempts:       1
                                    protocol & route speed: Z-Wave, 40 kbit/s
                                    ACK RSSI:               Receiver saturated
                                    ACK channel no.:        1
                                    TX channel no.:         1

Z-Wave JS generates a new shared state using the nonce and its own, then repeats the message while telling the node its nonce, so it can do the same

2022-09-26T15:51:53.342Z CNTRLR   [Node 032] failed to decode the message, retrying with SPAN extension...
2022-09-26T15:51:53.346Z SERIAL » 0x012f001320289f03e901124166548b2afe3fdba3f7947b1f4b5685e303a2510ad (49 bytes)
                                  416ca8c275009a73225376c2aaa2511e5
2022-09-26T15:51:53.347Z DRIVER » [Node 032] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      17
                                  └─[Security2CCMessageEncapsulation]
                                    │ sequence number: 233
                                    │ extensions:
                                    │ · type: SPAN
                                    │   sender EI: 0x66548b2afe3fdba3f7947b1f4b5685e3
                                    └─[SupervisionCCGet]
                                      │ session id:      50
                                      │ request updates: true
                                      └─[PowerlevelCCTestNodeSet]
                                          test node id:     1
                                          power level:      -1 dBm
                                          test frame count: 10
2022-09-26T15:51:53.350Z SERIAL « [ACK]                                                                   (0x06)
2022-09-26T15:51:53.358Z SERIAL « 0x0104011301e8                                                       (6 bytes)
2022-09-26T15:51:53.358Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.359Z DRIVER « [RES] [SendData]
                                    was sent: true

Node immediately responds again that it cannot decode

2022-09-26T15:51:53.375Z DRIVER « [Node 032] [REQ] [ApplicationCommand]
                                  └─[Security2CCNonceReport]
                                      sequence number:  75
                                      SOS:              true
                                      MOS:              false
                                      receiver entropy: 0xaceb18c55921bb4950f898b816fd1a94
2022-09-26T15:51:53.376Z CNTRLR « [Node 032] received expected response prematurely, remembering it...
2022-09-26T15:51:53.405Z SERIAL « 0x0118001311000005007e7f7f7f7f01010300000000020100009e              (26 bytes)
2022-09-26T15:51:53.405Z SERIAL » [ACK]                                                                   (0x06)
2022-09-26T15:51:53.406Z DRIVER « [REQ] [SendData]
                                    callback id:            17
                                    transmit status:        OK, took 50 ms
                                    routing attempts:       1
                                    protocol & route speed: Z-Wave, 40 kbit/s
                                    ACK RSSI:               Receiver saturated
                                    ACK channel no.:        1
                                    TX channel no.:         1
2022-09-26T15:51:53.408Z CNTRLR   [Node 032] failed to decode the message after re-transmission with SPAN extens
                                  ion, dropping the message.

... although it should now have all the information to decrypt. Also, it sends another Nonce right afterwards.

Maybe this is just a connectivity issue and it responds to delayed/re-transmitted messages, which have a different SPAN? Notice how all the responses from the node come before the ACK (the lines with DRIVER « [REQ] [SendData]). That is definitely not normal.

marcus-j-davies commented 2 years ago

Cool.

I Wasn't sure if my report was a device specific problem, if @fabnavigator doesn't deliver any logs - I see no reason to keep this open, if my issue is isolated to the device 😅

fabnavigator commented 2 years ago

Here is the log for the failure I'm seeing.

zwave-js.log

AlCalzone commented 2 years ago

DRIVER version 10.0.0-beta.4

@fabnavigator you should to update to latest before doing any further tests involving this

fabnavigator commented 2 years ago

@marcus-j-davies Shouldn't I have the latest if I'm running your latest debug version?

marcus-j-davies commented 2 years ago

@fabnavigator Then you Haven't updated to the release version 😅

v8 was released the other day (with Driver version 10.2.0) https://github.com/zwave-js/node-red-contrib-zwave-js/releases/tag/v8.0.0

EDIT It should be in the palette as an update

fabnavigator commented 2 years ago

@marcus-j-davies Got it. I'm at v8 now. I kicked off my normal health check. If you don't hear back from me, assume all is well.

fabnavigator commented 2 years ago

Same problem as before. Log file attached. zwave-js.log

AlCalzone commented 2 years ago

Ok this one is a little bit different. 1 second after starting the healthcheck, the driver queries the node to update the status. At the same time the health check status is received, causing a collision, which causes the query to be re-transmitted, but now with the wrong/outdated SPAN. I've raised https://github.com/zwave-js/node-zwave-js/issues/5126 to track this. We can actually go a step further in this case, take a shortcut, and not re-transmit the query at all, since the received report is the expected response to it (accidentally): https://github.com/zwave-js/node-zwave-js/issues/5127