zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License
739 stars 588 forks source link

[Bug]: Node-red crashes on a restart with an error that "z-wave driver" is not ready (see paste of logs from NOde-red below) #5757

Closed ReziioAdmin closed 1 year ago

ReziioAdmin commented 1 year ago

What happened?

I have noticed this happening on a few different Hubs in the past few weeks. the Zwave interface becomes unresponsive and then Node-Red dives into a restart. Most times it eventually recovers but I have one Hub right now that will just not get past the "uncaught error".

I am using a Zooz 700 Z-wave dongle. One possibility is that the firmware is 7.17 and the latest version is 7.18.

How to reproduce?

Please list steps to reproduce. If the problem is hardware-specific, please list which devices are affected.

Version

8.0

Node-RED Version

3.0.2

What hardware are you seeing the problem on?

Other Linux

Relevant Z-Wave-JS log as a text file

May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]: 2 May 20:51:12 - [error] ZWaveError: The controller is not yet ready! (ZW0103)
May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]:     at Driver.get controller [as controller] (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:708:10)
May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]:     at Driver.shouldDiscardCC (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:3607:21)
May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]:     at Driver.serialport_onData (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:2895:14)
May 02 20:51:12 rosedale-apt-multi-N200 systemd[1]: nodered.service: Main process exited, code=exited, status=1/FAILURE
May 02 20:51:12 rosedale-apt-multi-N200 systemd[1]: nodered.service: Failed with result 'exit-code'.
marcus-j-davies commented 1 year ago

Thanks @ReziioAdmin,

Can you provide a ZWave JS log? you can do this on the controller node, set it to Debug on the file output. The logfile will be in /home/pi/.node-red

This will provide me with a full list of events, from the driver starting up, to the Node RED crash you have detailed from the system log.

8.1.0 was released yesterday, that came with a new driver version (a few versions ahead of what 8.0 come with), so not sure if its worth trying your luck doing that first? else attach that log file as is, it has some critical debug information inside of it.

ReziioAdmin commented 1 year ago

Marcus

I upgraded to 8.1 and upgraded firmware on the Zooz stick to latest version. I though this solved the issue as the hub booted back up fine with no errors. I just logged in now to set up logging and when I deployed the error returned.

Can I DM you the log file pls - there is probably a few things in the log file that I wouldnt want to be publicly available

ReziioAdmin commented 1 year ago

NVM previous comment. I was looking at the wrong log file. Here is the zwave-js log.
zwave-js.log

marcus-j-davies commented 1 year ago

@AlCalzone, Do you have any idea what could be causing an unhandled exception? is this 700 specific? The log doesn't show anything of interest (to me 😅 ), but the below does from the system log.

May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]: 2 May 20:51:12 - [error] ZWaveError: The controller is not yet ready! (ZW0103)
May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]:     at Driver.get controller [as controller] (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:708:10)
May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]:     at Driver.shouldDiscardCC (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:3607:21)
May 02 20:51:12 rosedale-apt-multi-N200 Node-RED[2746]:     at Driver.serialport_onData (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:2895:14)
May 02 20:51:12 rosedale-apt-multi-N200 systemd[1]: nodered.service: Main process exited, code=exited, status=1/FAILURE
May 02 20:51:12 rosedale-apt-multi-N200 systemd[1]: nodered.service: Failed with result 'exit-code'.

Context of issue above of course.

EDIT it seems to be from (at least that is the pattern I see)

2023-05-03T23:35:44.689Z DRIVER   loading configuration...
2023-05-03T23:35:44.699Z SERIAL « 0x010b00a800011202984000a633                                        (13 bytes)
2023-05-03T23:35:44.717Z SERIAL » [ACK]   
AlCalzone commented 1 year ago

This should already be fixed in the latest driver release.

AlCalzone commented 1 year ago

10.16.0 to be specific: https://github.com/zwave-js/node-zwave-js/blob/master/CHANGELOG.md#10160-2023-04-25 -> https://github.com/zwave-js/node-zwave-js/pull/5698

marcus-j-davies commented 1 year ago

Mmmm..

This is running 10.16

2023-05-03T23:34:13.714Z DRIVER version 10.16.0

AlCalzone commented 1 year ago

Weird, the logs do show 10.16.0, but the line numbers in the error log line up with where those are in 10.15.0 🤔

Just checked the npm package for 10.16 to be sure - its contents are what I'd expect.

marcus-j-davies commented 1 year ago

@ReziioAdmin

Lets update the driver again.

marcus-j-davies commented 1 year ago

Oh hang on @AlCalzone .

The system error log was prior to updating to 8.1 so that would explain it 🤣.

System Error Log : <10.16 ZWave Log : =10.16

Will see what the above brings anyhow.

@ReziioAdmin if you could enable Logging after the update, and include any Node RED crash log output also - if applicable

ReziioAdmin commented 1 year ago

ok - I ran npm update zwave-js "npm list zwave-js output"->

node-red-project@0.0.1 /home/pi/.node-red
└─┬ node-red-contrib-zwave-js@8.1.0
  └── zwave-js@10.17.1
ReziioAdmin commented 1 year ago

Still crashes. NR system log->

May 04 11:39:16 xxx-xxxx Node-RED[7825]: 4 May 11:39:16 - [error] ZWaveError: The controller is not yet ready! (ZW0103)
May 04 11:39:16 xxx-xxxxNode-RED[7825]:     at Driver.get controller [as controller] (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:709:10)
May 04 11:39:16 xxx-xxx Node-RED[7825]:     at Driver.get ownNodeId [as ownNodeId] (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:753:15)
May 04 11:39:16 rosedale-apt-multi-N200 Node-RED[7825]:     at BridgeApplicationCommandRequest.toLogEntry (/home/pi/.node-red/node_modules/zwave-js/src/lib/serialapi/application/BridgeApplicationCommandRequest.ts:114:39)
May 04 11:39:16 xxx-xxx Node-RED[7825]:     at DriverLogger.logMessage (/home/pi/.node-red/node_modules/zwave-js/src/lib/log/Driver.ts:127:28)
May 04 11:39:16 xxx-xxx Node-RED[7825]:     at Driver.serialport_onData (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:2913:22)
May 04 11:39:16 xxx-xxx systemd[1]: nodered.service: Main process exited, code=exited, status=1/FAILURE
May 04 11:39:16 xxx-xxx systemd[1]: nodered.service: Failed with result 'exit-code'.
May 04 11:39:16 xxx-xxx systemd[1]: nodered.service: Consumed 23.579s CPU time.
May 04 11:39:36 xxx-xxx systemd[1]: nodered.service: Scheduled restart job, restart counter is at 13.
ReziioAdmin commented 1 year ago

zwave-js-log->

2023-05-04T15:39:12.652Z DRIVER   version 10.17.1
2023-05-04T15:39:12.653Z DRIVER   
2023-05-04T15:39:12.653Z DRIVER   starting driver...
2023-05-04T15:39:12.700Z DRIVER   opening serial port /dev/ttyUSB0
2023-05-04T15:39:14.232Z DRIVER   serial port opened
2023-05-04T15:39:14.235Z SERIAL » 0x15                                                                 (1 bytes)
2023-05-04T15:39:16.211Z SERIAL « 0x010b00a800011202984000c055                                        (13 bytes)
2023-05-04T15:39:16.227Z SERIAL » [ACK]                                                                   (0x06)
2023-05-04T15:39:51.823Z NDERED   [options] [interview.queryAllUserCodes] Disabled
2023-05-04T15:39:51.835Z NDERED   [options] [disableOptimisticValueUpdate] Disabled
2023-05-04T15:39:51.837Z NDERED   [options] [enableSoftReset] Disabled
2023-05-04T15:39:51.838Z NDERED   [options] [storage.cacheDir] /home/pi/.node-red/zwave-js-cache
2023-05-04T15:39:51.840Z NDERED   [options] [storage.throttle] normal
2023-05-04T15:39:51.842Z NDERED   [options] [securityKeys.S0_Legacy] Encryption key provided          [16 bytes]
2023-05-04T15:39:51.845Z NDERED   [options] [securityKeys.S2_Unauthenticated] Encryption key provided [16 bytes]
2023-05-04T15:39:51.848Z NDERED   [options] [securityKeys.S2_Authenticated] Encryption key provided   [16 bytes]
2023-05-04T15:39:51.849Z NDERED   [options] [securityKeys.S2_AccessControl] Encryption key provided   [16 bytes]
2023-05-04T15:39:51.852Z NDERED   [options] [preferences.scales.temperature] 0x00
2023-05-04T15:39:51.853Z NDERED   [options] [preferences.scales.humidity] 0x00
2023-05-04T15:39:51.855Z NDERED   [FWUS] No key provided - Service may fail!
2023-05-04T15:39:51.857Z NDERED   Initializing driver...
2023-05-04T15:39:51.889Z NDERED   [TELEMETRY] Disabling...
2023-05-04T15:39:51.891Z NDERED   Starting driver...
2023-05-04T15:39:51.895Z DRIVER   ███████╗ ██╗    ██╗  █████╗  ██╗   ██╗ ███████╗             ██╗ ███████╗
                                  ╚══███╔╝ ██║    ██║ ██╔══██╗ ██║   ██║ ██╔════╝             ██║ ██╔════╝
                                    ███╔╝  ██║ █╗ ██║ ███████║ ██║   ██║ █████╗   █████╗      ██║ ███████╗
                                   ███╔╝   ██║███╗██║ ██╔══██║ ╚██╗ ██╔╝ ██╔══╝   ╚════╝ ██   ██║ ╚════██║
                                  ███████╗ ╚███╔███╔╝ ██║  ██║  ╚████╔╝  ███████╗        ╚█████╔╝ ███████║
                                  ╚══════╝  ╚══╝╚══╝  ╚═╝  ╚═╝   ╚═══╝   ╚══════╝         ╚════╝  ╚══════╝
2023-05-04T15:39:51.897Z DRIVER   version 10.17.1
2023-05-04T15:39:51.898Z DRIVER   
2023-05-04T15:39:51.898Z DRIVER   starting driver...
2023-05-04T15:39:51.946Z DRIVER   opening serial port /dev/ttyUSB0
2023-05-04T15:39:53.482Z DRIVER   serial port opened
2023-05-04T15:39:53.484Z SERIAL » 0x15                                                                 (1 bytes)
2023-05-04T15:39:54.159Z SERIAL « 0x011b00a800010d129f03ca0097799d450c64405071dd02620a3b00c077        (29 bytes)
2023-05-04T15:39:54.174Z DRIVER   Dropping message because the driver is not ready to handle it yet.
2023-05-04T15:39:54.177Z SERIAL » [ACK]                                                                   (0x06)
2023-05-04T15:39:55.607Z NDERED   [SHUTDOWN] [RESTART] Cleaning up...
2023-05-04T15:39:55.612Z DRIVER   destroying driver instance...
2023-05-04T15:39:55.696Z DRIVER   loading configuration...
2023-05-04T15:39:55.702Z DRIVER   driver instance destroyed
marcus-j-davies commented 1 year ago

@AlCalzone

Unless there is some really odd thing going on with versioning/npm strangeness, there still seems to be some unhandled exception 🤷

2023-05-04T15:39:12.652Z DRIVER   version 10.17.1
2023-05-04T15:39:12.653Z DRIVER   
2023-05-04T15:39:12.653Z DRIVER   starting driver...
2023-05-04T15:39:12.700Z DRIVER   opening serial port /dev/ttyUSB0
2023-05-04T15:39:14.232Z DRIVER   serial port opened
2023-05-04T15:39:14.235Z SERIAL » 0x15                                                                 (1 bytes)
2023-05-04T15:39:16.211Z SERIAL « 0x010b00a800011202984000c055                                        (13 bytes)
2023-05-04T15:39:16.227Z SERIAL » [ACK]                                                                   (0x06)
------------> Unhandled Exeption Here (Process Crash Here - log below) <------------
2023-05-04T15:39:51.823Z NDERED   [options] [interview.queryAllUserCodes] Disabled
2023-05-04T15:39:51.835Z NDERED   [options] [disableOptimisticValueUpdate] Disabled
2023-05-04T15:39:51.837Z NDERED   [options] [enableSoftReset] Disabled
2023-05-04T15:39:51.838Z NDERED   [options] [storage.cacheDir] /home/pi/.node-red/zwave-js-cache
2023-05-04T15:39:51.840Z NDERED   [options] [storage.throttle] normal
2023-05-04T15:39:51.842Z NDERED   [options] [securityKeys.S0_Legacy] Encryption key provided          [16 bytes]
2023-05-04T15:39:51.845Z NDERED   [options] [securityKeys.S2_Unauthenticated] Encryption key provided [16 bytes]
2023-05-04T15:39:51.848Z NDERED   [options] [securityKeys.S2_Authenticated] Encryption key provided   [16 bytes]
2023-05-04T15:39:51.849Z NDERED   [options] [securityKeys.S2_AccessControl] Encryption key provided   [16 bytes]
2023-05-04T15:39:51.852Z NDERED   [options] [preferences.scales.temperature] 0x00
2023-05-04T15:39:51.853Z NDERED   [options] [preferences.scales.humidity] 0x00
2023-05-04T15:39:51.855Z NDERED   [FWUS] No key provided - Service may fail!

Crash Log

May 04 11:39:16 xxx-xxxx Node-RED[7825]: 4 May 11:39:16 - [error] ZWaveError: The controller is not yet ready! (ZW0103)
May 04 11:39:16 xxx-xxxxNode-RED[7825]:     at Driver.get controller [as controller] (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:709:10)
May 04 11:39:16 xxx-xxx Node-RED[7825]:     at Driver.get ownNodeId [as ownNodeId] (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:753:15)
May 04 11:39:16 rosedale-apt-multi-N200 Node-RED[7825]:     at BridgeApplicationCommandRequest.toLogEntry (/home/pi/.node-red/node_modules/zwave-js/src/lib/serialapi/application/BridgeApplicationCommandRequest.ts:114:39)
May 04 11:39:16 xxx-xxx Node-RED[7825]:     at DriverLogger.logMessage (/home/pi/.node-red/node_modules/zwave-js/src/lib/log/Driver.ts:127:28)
May 04 11:39:16 xxx-xxx Node-RED[7825]:     at Driver.serialport_onData (/home/pi/.node-red/node_modules/zwave-js/src/lib/driver/Driver.ts:2913:22)
May 04 11:39:16 xxx-xxx systemd[1]: nodered.service: Main process exited, code=exited, status=1/FAILURE
May 04 11:39:16 xxx-xxx systemd[1]: nodered.service: Failed with result 'exit-code'.
May 04 11:39:16 xxx-xxx systemd[1]: nodered.service: Consumed 23.579s CPU time.
May 04 11:39:36 xxx-xxx systemd[1]: nodered.service: Scheduled restart job, restart counter is at 13.

@ReziioAdmin

If providing logs in-line (not as an attached file), can you put them between code fences - just makes it easier to read 😇

Log Here

AlCalzone commented 1 year ago

Ok now it looks like a different error.

Side note: You seem to have some nodes reporting pretty frequently if this happens reproducibly in the few seconds it takes to initialize the driver. Might want to look into that and tone it down. This is probably not good for mesh reliability.

ReziioAdmin commented 1 year ago

Thank you for theses inputs. I will use the code fences going forward.

A few other pieces of information which may help.
I have over 50 hubs running the same hardware and identical NR flows and have not had this before when NR just won't restart. There is an ongoing issue where Zwave-JS seems to force a NR restart on a larger network whenever a new device is joined.

I have a suspicion that the error on the particular device is hardware related. Possibly the USB port or the dongle have a some issue. I am going to reload the installation onto a new Hub and see if this resolves anything

marcus-j-davies commented 1 year ago

@AlCalzone,

Have marked this as upstream - if its not, do let me know.

AlCalzone commented 1 year ago

Will try to get this fixed later tonight

ReziioAdmin commented 1 year ago

@AlCalzone Thank you for this fix. All is working again !

AlCalzone commented 1 year ago

Huh, I didn't release it yet

marcus-j-davies commented 1 year ago

Don't count your chickens yet @ReziioAdmin - its more than likely perfect timing with the controller start up and a quite episode in your network - the crash can still occur.

The fix will come in the form of 10.17.2 of the driver (or 10.18.0 if more features are added).

Watch here. https://github.com/zwave-js/node-zwave-js/releases

After which point you can run npm update zwave-js again in the .node-red dir, stopping Node RED in-between.

If updating other hosts to 8.1.0 of the Node, they will automatically get what ever version of the driver is current at that time, so if updating other hosts to 8.1.0 wait until 10.17.2 drops

ReziioAdmin commented 1 year ago

Thanks Marcus. I temporarily resolved the problem by removing the nodes which were calling the zwave driver immediately after a reboot or re-deploy which is probably a preferable design in any event but look forward to the release where the fix is released.

marcus-j-davies commented 1 year ago

Makes perfect sense. But you could wait for the "All nodes ready" event? Before doing any significant network operation

The controller node emits this event, when it's relatively safe to start doing your thing.

Anyhow, the latest driver once released, should address the issue you have seen.

AlCalzone commented 1 year ago

The problem was caused by the driver, not user code.