zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License

Do not consider controller as jammed when `Fail` status is returned with positive transmit ticks #6199

Closed ridizy closed 1 year ago

ridizy commented 1 year ago

Is your problem within Home Assistant (Core or Z-Wave JS Integration)?

NO, my problem is NOT within Home Assistant or the ZWave JS integration

Is your problem within Z-Wave JS UI (formerly ZwaveJS2MQTT)?

NO, my problem is NOT within Z-Wave JS UI

Describe the bug

What causes the bug? I upgraded my Docker container for zwavejsui to 8.23.0 today. Afterwards, many of my devices eventually went into an unknown state. I tried rolling back to various versions, as far back as 8.22.0.

What do you observe? Many of my devices eventually went into an unknown state. I tried rolling back to various versions, as far back as 8.22.0, and on some versions the nodes are marked dead instead of unknown.

What did you expect to happen? I expected the update to complete and my Z-Wave network to continue to function as it had before.

Device information

Manufacturer:
Model name:
Node ID in your network:

How are you using node-zwave-js?

Which branches or versions?

zwave-js-ui: 8.23.0.73606d4
zwave-js: 11.12.0

Did you change anything?

yes (please describe)

If yes, what did you change?

A little while after updating zwavejsui, I did an OTW update of an Inovelli light switch. That succeeded, and I was about to do the next one when I noticed the issue.

Did this work before?

Yes (please describe)

If yes, where did it work?

This z-wave network has been running without issues for a few months. I usually update my containers every few weeks.

Attach Driver Logfile

zwavejs_2023-08-21.log

gsemet commented 1 year ago

I'm seeing something quite similar: all my mains-powered devices seem to have lost their configuration. They still appear, but with no configuration (vundefined).

AlCalzone commented 1 year ago

I'll need to see the driverlog on loglevel "debug" to know for sure, but this looks like a combination of the stick acting up/becoming unresponsive and some node flooding the network.

As a quick troubleshoot, try unplugging and re-plugging the stick.

gsemet commented 1 year ago

I had to cold restart the server 3 times until it went back to normal.

ridizy commented 1 year ago

Thanks for the tips. I did try unplugging and re-plugging the stick and also a cold restart. Neither seemed to help.

Here is the driver log, correctly set to "debug" this time:
zwavejs_2023-08-22.log

As a troubleshooting step, I tried to re-interview node 4 yesterday, and it still seems to be having an issue.

AlCalzone commented 1 year ago

@ridizy Looks like there is some problem trying to reach node 4. Pinging it fails after 27 seconds (!) with a status that normally indicates that the controller is jammed by too much traffic (which isn't the case here).

The difference in the newest version is that Z-Wave JS thinks this is a temporary issue with the controller and keeps retrying, as opposed to marking the node as dead.
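The distinction named in the issue title can be sketched as follows. This is a minimal illustration, not the actual Z-Wave JS code: `classify`, `TransmitResult`, and `Verdict` are hypothetical names, the status values follow the usual Z-Wave transmit status numbering, and the 10 ms tick length is an assumption. The idea is that a `Fail` status only points at a jammed controller when nothing was transmitted; if the controller reports positive transmit ticks, it did get on the air and the problem lies with reaching the node.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the Z-Wave JS API.
const TransmitStatus = { OK: 0x00, NoAck: 0x01, Fail: 0x02 } as const;
type TransmitStatus = (typeof TransmitStatus)[keyof typeof TransmitStatus];

interface TransmitResult {
  status: TransmitStatus;
  // Time the controller reports having spent transmitting.
  // Assumption: one tick is 10 ms.
  txTicks: number;
}

type Verdict = "ok" | "controller-jammed" | "node-unreachable";

function classify(result: TransmitResult): Verdict {
  if (result.status === TransmitStatus.OK) return "ok";
  // Fail with zero transmit time: the controller never got on the air.
  if (result.status === TransmitStatus.Fail && result.txTicks === 0) {
    return "controller-jammed";
  }
  // Fail with positive transmit time, or NoAck: the frame was sent,
  // but the node did not respond.
  return "node-unreachable";
}

// A ping that fails after 27 seconds (2700 ticks under the 10 ms assumption)
console.log(classify({ status: TransmitStatus.Fail, txTicks: 2700 })); // "node-unreachable"
console.log(classify({ status: TransmitStatus.Fail, txTicks: 0 })); // "controller-jammed"
```

Under this rule, the failed ping to node 4 would count against the node rather than the controller, so the driver would not keep retrying on the assumption of a temporary jam.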

Any clue what's going on with node 4 that could cause this? Maybe that one needs to be power cycled.

ridizy commented 1 year ago

Thanks @AlCalzone. It was one of the first ones marked as "unknown" yesterday, so I tried to re-interview it (not sure if that was smart).

It's a light switch and I've tried pulling the air gap on it a couple times. Can I mark it "dead" somehow to get past the issue and then come back to troubleshoot it later?

AlCalzone commented 1 year ago

I'm afraid you'll have to go back to v8.22.3 until I've fixed this.

ridizy commented 1 year ago

Thank you @AlCalzone. I've rolled back to v8.22.3. It no longer hangs on node 4, and I was able to remove it. However, it is now marking most, but not all, of the remaining nodes as dead.

I also tried a cold reboot of the host, unplugging and re-plugging the stick and a network heal.

Here is an updated driver log file: zwavejs_2023-08-22.log

AlCalzone commented 1 year ago

The ping to node 2 goes fine, but all the others fail with the same problem as node 4 now.

Just to exclude a software issue, you could go back all the way to v8.21.2, which is the version before I switched node IDs used in commands from 8 to 16 bit. Although if your controller had a problem with that, the ping to node 2 shouldn't work either.
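To illustrate why a working ping to node 2 argues against the node ID width being the culprit, here is a rough sketch of the two encodings. `encodeNodeId` is a hypothetical helper, and the big-endian byte order for the 16-bit form is an assumption. Every node ID goes through the same encoding, so a controller that misread 16-bit IDs would be expected to fail for node 2 as well.

```typescript
// Hypothetical helper; not part of the Z-Wave JS API.
// Assumption: 16-bit node IDs are sent big-endian (high byte first).
function encodeNodeId(nodeId: number, width: 8 | 16): Uint8Array {
  if (!Number.isInteger(nodeId) || nodeId < 0) {
    throw new RangeError(`invalid node ID: ${nodeId}`);
  }
  if (width === 8) {
    if (nodeId > 0xff) {
      throw new RangeError(`node ID ${nodeId} does not fit in 8 bits`);
    }
    return Uint8Array.of(nodeId);
  }
  if (nodeId > 0xffff) {
    throw new RangeError(`node ID ${nodeId} does not fit in 16 bits`);
  }
  return Uint8Array.of(nodeId >> 8, nodeId & 0xff);
}

console.log(Array.from(encodeNodeId(2, 8))); // [ 2 ]
console.log(Array.from(encodeNodeId(2, 16))); // [ 0, 2 ]
console.log(Array.from(encodeNodeId(4, 16))); // [ 0, 4 ]
```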

If that's not it, maybe all your problematic nodes are routing through a single point of failure.

Did you follow these recommendations? https://zwave-js.github.io/node-zwave-js/#/troubleshooting/connectivity-issues?id=general-troubleshooting

ridizy commented 1 year ago

I rolled back to v8.21.2 and it didn't make a difference. I then reset one of the nodes that had an issue in the past (node 5) and I was able to ping all the nodes. I went back to v8.22.3 and everything seems to have returned to normal.

Thank you so much for your assistance and patience!

AlCalzone commented 1 year ago

I'll keep the issue open because there's still the infinite loop in this case I need to address.
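One general way to keep a "controller looks jammed, try again" path from looping forever is to give it a retry budget. The sketch below only illustrates that pattern and is not the fix that went into Z-Wave JS; `sendWithRetryBudget`, `Attempt`, and the outcome names are all hypothetical.

```typescript
// Hypothetical sketch of a bounded retry loop; not the actual Z-Wave JS fix.
type Attempt = "ok" | "controller-jammed" | "node-unreachable";
type Outcome = "ok" | "node-dead" | "controller-unresponsive";

function sendWithRetryBudget(
  attempt: () => Attempt,
  maxJammedRetries: number,
): { outcome: Outcome; attempts: number } {
  let attempts = 0;
  while (true) {
    attempts++;
    const result = attempt();
    if (result === "ok") return { outcome: "ok", attempts };
    // The frame was transmitted but the node did not answer: stop retrying.
    if (result === "node-unreachable") return { outcome: "node-dead", attempts };
    // The controller looks jammed: retry, but only while the budget lasts.
    if (attempts > maxJammedRetries) {
      return { outcome: "controller-unresponsive", attempts };
    }
  }
}

// A controller that always reports "jammed" ends the loop after
// 1 initial attempt plus 3 retries.
console.log(sendWithRetryBudget(() => "controller-jammed", 3));
// { outcome: 'controller-unresponsive', attempts: 4 }
```

A real driver would also wait between attempts; the delay is left out here to keep the sketch synchronous.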