zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License

Detect an unresponsive stick and reset it #2723

Closed by cinadr 1 year ago

cinadr commented 3 years ago

Version

Checklist:

Build/Run method

zwavejs2mqtt version: 4.4.0.5238242
zwave-js version: 7.7.0

Describe the bug

After a while, zwavejs.log shows:

CNTRLR   Failed to execute controller command after 2/3 attempts. Scheduling next try in 1100 ms.

and nothing works or responds. After restarting zwavejs2mqtt, everything comes back to life. In zwavejs2mqtt.log:

ERROR ZWAVE: Node x error while updating Neighbors: Failed to send the message after 3 attempts

or

ERROR ZWAVE: Error while writing true on 5-37-1-targetValue: Failed to send the message after 3 attempts

But the underlying zwave-js error is always the one shown above.

To Reproduce

Steps to reproduce the behavior: Start with yarn start and wait several hours for this to appear. Tested for 3-4 days and it consistently dies every time. This has been happening since approximately version 4.2 (not sure in which version it first occurred).

Expected behavior

Stable controller operation.

Additional context

The controller is an Aeotec Z-Wave USB stick.

Attached logs

logs.zip

AlCalzone commented 3 years ago

The log seems clean up until the point where it suddenly dies. Did anything else change? USB Port or something?

cinadr commented 3 years ago

No, nothing relevant to mention.

cinadr commented 3 years ago

Same with version 4.5: logs.zip. What can I do to get more info on this? Is there an option to switch to trace logging? A simple restart solves the issue; there is no need to unplug the stick or restart the host. Might Windows be a problem?

AlCalzone commented 3 years ago

I don't think there's anything to be logged when nothing happens. I guess I need to add a detection when the stick no longer responds and soft-reset it.

Might Windows be a problem?

I've never had any problems with Windows.
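A minimal sketch of the detection idea from this comment, assuming the caller can tell when a command has exhausted its retries (as in the "Failed to execute controller command" log line). The class name UnresponsiveDetector, its methods, and the threshold are hypothetical and not part of zwave-js; what happens once it fires (a soft reset or something else) is left to the caller.

```typescript
/** Tracks consecutive command failures and flags the stick as unresponsive. */
class UnresponsiveDetector {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  // `threshold` is an assumed tuning knob, not a zwave-js setting.
  constructor(threshold: number) {
    if (!Number.isInteger(threshold) || threshold < 1) {
      throw new RangeError("threshold must be a positive integer");
    }
    this.threshold = threshold;
  }

  /** Call whenever any command received a reply from the stick. */
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  /**
   * Call when a command exhausted its retries.
   * Returns true once `threshold` failures have occurred in a row.
   */
  recordFailure(): boolean {
    this.consecutiveFailures += 1;
    return this.consecutiveFailures >= this.threshold;
  }
}
```

With a threshold of 3, the third recordFailure() call in a row returns true, and any recordSuccess() in between resets the count. Counting only consecutive failures avoids reacting to an occasional failed command.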

cinadr commented 3 years ago

One more thing to add here: I had a UPS behind my server, but it is currently removed for a battery exchange. Voltage fluctuations might be behind these errors, since they appeared after I removed the UPS. But I have no other errors in the system (UniFi switch and PoE APs, cameras, Zigbee dongle, etc., running from this UPS). Just thinking. Thank you.

AlCalzone commented 1 year ago

From https://github.com/zwave-js/zwave-js-ui/issues/3146: a soft reset isn't always possible. In some cases, the serial port needs to be reopened.
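Since a soft reset may not be enough, one way to think about recovery is as a ladder of increasingly disruptive steps. The sketch below only picks the next step; the step names, their order, and the function nextRecoveryStep are assumptions for illustration, not zwave-js behavior. The last step mirrors the restart that the reporter found to work.

```typescript
// Hypothetical recovery steps, ordered from least to most disruptive.
type RecoveryStep = "soft-reset" | "reopen-serial-port" | "restart-service";

const RECOVERY_LADDER: readonly RecoveryStep[] = [
  "soft-reset",
  "reopen-serial-port",
  "restart-service",
];

/**
 * Returns the next step to try, given how many steps already failed,
 * or undefined when every step has been tried.
 */
function nextRecoveryStep(stepsAlreadyTried: number): RecoveryStep | undefined {
  if (!Number.isInteger(stepsAlreadyTried) || stepsAlreadyTried < 0) {
    throw new RangeError("stepsAlreadyTried must be a non-negative integer");
  }
  return RECOVERY_LADDER[stepsAlreadyTried];
}
```

A caller would start at 0, run the returned step, check whether the stick responds again, and move one rung up only if it does not.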

trankin commented 1 year ago

Is there a consistent way for me to detect when the serial device is not responding? Currently, when it fails, it's pretty silent. I could automate restarting the service if I could find a way to detect when it gets into this state. I'm imagining sending some command that requires a response over the serial connection, to confirm that communication is working as expected.

AlCalzone commented 1 year ago

Not yet. I recently added support for tracking controller status in https://github.com/zwave-js/node-zwave-js/pull/6174. Updating that status when the controller is unresponsive is still TODO though.
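Until the driver reports this state itself, one possible workaround is to watch the tail of zwavejs.log for the failure line quoted in the issue description. This is only a heuristic sketch: the function names, the threshold, and the idea of counting lines are assumptions, and reading the log file and restarting the service are left to an external supervisor.

```typescript
// Substring copied from the zwavejs.log line quoted in the issue description.
const CONTROLLER_FAILURE_MARKER = "Failed to execute controller command after";

/**
 * Returns true if at least `minFailures` of the given log lines contain the marker.
 * `minFailures` is an assumed threshold; tune it for your setup.
 */
function looksUnresponsive(recentLines: readonly string[], minFailures: number = 3): boolean {
  let failures = 0;
  for (const line of recentLines) {
    if (line.includes(CONTROLLER_FAILURE_MARKER)) failures += 1;
  }
  return failures >= minFailures;
}

/** Splits raw log text and keeps only the last `count` non-empty lines. */
function tailLines(logText: string, count: number): string[] {
  const lines = logText.split(/\r?\n/).filter((line) => line.trim() !== "");
  // slice(-0) would return every line, so handle non-positive counts explicitly.
  return count > 0 ? lines.slice(-count) : [];
}
```

A supervisor script could periodically read the log, call looksUnresponsive(tailLines(text, 20)), and restart the service when it returns true. Limiting the check to the most recent lines keeps old, already-handled failures from triggering another restart.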