zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License
755 stars 613 forks source link

[question] Is it expected that device status does not persist between reboots? #2644

Closed michaelolson closed 3 years ago

michaelolson commented 3 years ago

I'm using zwavejs2mqtt through Docker. I have a problem that, when I bring the image down and back up, several devices, inconsistently, have to be manually refreshed. Before reboot, all devices have a date in the Last Active column and report appropriate capabilities.

After reboot, some devices, which includes mains connected devices, report Never as the Last Active time and lose their capabilities. For a particular device, before reboot and after a manual RefreshValues, the capabilities are:

Values
Multilevel Switch >
Central Scene >
Manufacturer Specific >
Version >

After reboot, the entire Values section disappears until manually refreshed, meaning that the device stops working entirely. This has resulted in me calling RefreshValues on most nodes after reboot in order to get back to a stable system.

I'm considering writing an integration that detects missing values and automatically calls RefreshValues, but I'm wondering if there's a way to avoid needing this.

robertsLando commented 3 years ago

Are you speaking avbout their status on mqtt? If so just enable retain flag

michaelolson commented 3 years ago

This is the status as represented in the control-panel page. Is the control panel pulling that data from MQTT?

The way the problem manifests is that, until RefreshValues is called, no changes are observed in Home Assistant. This seems to be because until that is called the device doesn't show any capabilities or sensors.

edit: Originally said no MQTT messages were sent, which I didn't check. I'm using this through HA for most pathways, which doesn't use MQTT.

robertsLando commented 3 years ago

If you are using z2m with zwavejs server and hass zwave integration I cannot help you, you should open an issue on zwave-js or hass repo :(

michaelolson commented 3 years ago

My question was specifically if it's expected that the device has no stored information on reboot, which isn't particular to the hass integration. It might only create problems for the hass integration, but I'm just trying to understand the behavior of the console. After reboot the devices in the console no longer show any status in the console until RefreshValues is called. This is easily reproducible on every reboot.

I had assumed this information would persist between reboots. The information is present before shutdown but it never seems to successfully retrieve the information from local storage.

robertsLando commented 3 years ago

I had assumed this information would persist between reboots

@michaelolson To be clear, if you are using MQTT this is how it is supposed to work (just remember to enable retain flag). If you are using anything else I cannot tell you if that's normal or not, I don't use hass :(

michaelolson commented 3 years ago

Thanks for the clarifications, I'll follow up with the hass team. I'm still not clear on how hass integration impacts what is displayed in the zwavejs2mqtt web console, and I have the retain flag set for my MQTT usage, but I suppose it's possible they clear values after zwavejs2mqtt starts up.

robertsLando commented 3 years ago

@michaelolson Could you send some logs please? I want to better check this

michaelolson commented 3 years ago

On shutdown I get:

2021-05-18 15:08:05.292 WARN APP: Shutdown detected: closing clients...
2021-05-18 15:08:05.309 INFO GATEWAY: Closing Gateway...
2021-05-18 15:08:05.401 INFO MQTT: MQTT client closed
2021-05-18 15:08:05.408 INFO MQTT: Client closed
2021-05-18 15:08:05.481 INFO ZWAVE-SERVER: Server closed
2021-05-18 15:08:05.485 INFO ZWAVE-SERVER: Client disconnected
2021-05-18 15:08:05.687 INFO ZWAVE: Client closed

On startup, I get the attached log.txt.

I can see it going through adding values for some nodes, but, for others, it never does. For example, nodes 28-32 are identical Z-Wave power monitoring devices that constantly emit values. After reboot, in the web console, in values they only show "Meter". If I click "Refresh Values" on node 32, then all of the Values show up: "Binary Switch", "Scene Activation", "Meter", "Configuration", "Manufacturer Specific", and "Version". Logs from the refresh look like:

2021-05-18 15:19:32.892 INFO ZWAVE: Calling api refreshValues with args: [ 32, [length]: 1 ]
2021-05-18 15:19:33.004 INFO ZWAVE: Node 32: value added 32-37-0-currentValue => false
2021-05-18 15:19:33.008 INFO ZWAVE: Node 32: value updated: 37-0-currentValue false => false
2021-05-18 15:19:33.043 INFO ZWAVE: Node 32 is alive
2021-05-18 15:19:33.211 INFO ZWAVE: Node 32: value added 32-37-0-currentValue => false
2021-05-18 15:19:33.214 INFO ZWAVE: Node 32: value added 32-37-0-targetValue => false
2021-05-18 15:19:33.216 INFO ZWAVE: Node 32: value added 32-43-0-sceneId => undefined
2021-05-18 15:19:33.219 INFO ZWAVE: Node 32: value added 32-43-0-dimmingDuration => undefined
2021-05-18 15:19:33.222 INFO ZWAVE: Node 32: value added 32-50-0-value-65537 => 492.984
2021-05-18 15:19:33.223 INFO ZWAVE: Node 32: value added 32-50-0-value-66049 => 0
2021-05-18 15:19:33.225 INFO ZWAVE: Node 32: value added 32-50-0-value-66561 => 125.714
2021-05-18 15:19:33.228 INFO ZWAVE: Node 32: value added 32-50-0-value-66817 => 0
2021-05-18 15:19:33.230 INFO ZWAVE: Node 32: value added 32-50-0-value-590337 => 134217.728
2021-05-18 15:19:33.233 INFO ZWAVE: Node 32: value added 32-50-0-reset => undefined
2021-05-18 15:19:33.235 INFO ZWAVE: Node 32: value added 32-112-0-20 => 1
2021-05-18 15:19:33.238 INFO ZWAVE: Node 32: value added 32-112-0-21 => 0
2021-05-18 15:19:33.241 INFO ZWAVE: Node 32: value added 32-112-0-24 => 1
2021-05-18 15:19:33.244 INFO ZWAVE: Node 32: value added 32-112-0-27 => 0
2021-05-18 15:19:33.246 INFO ZWAVE: Node 32: value added 32-112-0-151 => 50
2021-05-18 15:19:33.249 INFO ZWAVE: Node 32: value added 32-112-0-152 => 10
2021-05-18 15:19:33.252 INFO ZWAVE: Node 32: value added 32-112-0-171 => 30
2021-05-18 15:19:33.255 INFO ZWAVE: Node 32: value added 32-112-0-172 => 300
2021-05-18 15:19:33.258 INFO ZWAVE: Node 32: value added 32-112-0-173 => 0
2021-05-18 15:19:33.264 INFO ZWAVE: Node 32: value added 32-112-0-174 => 0
2021-05-18 15:19:33.266 INFO ZWAVE: Node 32: value added 32-114-0-manufacturerId => 634
2021-05-18 15:19:33.269 INFO ZWAVE: Node 32: value added 32-114-0-productType => 257
2021-05-18 15:19:33.271 INFO ZWAVE: Node 32: value added 32-114-0-productId => 13
2021-05-18 15:19:33.273 INFO ZWAVE: Node 32: value added 32-134-0-libraryType => 3
2021-05-18 15:19:33.276 INFO ZWAVE: Node 32: value added 32-134-0-protocolVersion => 4.38
2021-05-18 15:19:33.279 INFO ZWAVE: Node 32: value added 32-134-0-firmwareVersions => 1.2
2021-05-18 15:19:33.281 INFO ZWAVE: Node 32: value added 32-134-0-hardwareVersion => undefined
2021-05-18 15:19:33.286 INFO ZWAVE: Node 32 ready: Zooz - ZEN15 (Power Switch)
2021-05-18 15:19:33.353 INFO ZWAVE: Node 32: value updated: 50-0-value-65537 492.984 => 492.984
2021-05-18 15:19:33.423 INFO ZWAVE: Node 32: value updated: 50-0-value-66049 0 => 0
2021-05-18 15:19:33.487 INFO ZWAVE: Node 32: value updated: 50-0-value-66561 125.714 => 126.057
2021-05-18 15:19:33.547 INFO ZWAVE: Node 32: value updated: 50-0-value-66817 0 => 0
2021-05-18 15:19:33.559 INFO ZWAVE: Success zwave api call refreshValues { success: true, message: 'Success zwave api call', result: undefined }

The "value added" instead of "value updated" for most of these led me to believe the values aren't persisted between reboots.

If I don't manually refresh the nodes in this state, they are not usable through Hass (understood that's not something you can help with, I haven't tried poking them through MQTT). In the web console, refreshing the converts a node row from line 1 to line 2:

31  Zooz    Power Switch    ZEN15   NetworkUPS      No  Yes No  Unknown 
Complete
5/18/2021, 8:09:46 AM
32  Zooz    Power Switch    ZEN15   EVCharger       No  Yes No  Alive   
Complete
5/18/2021, 8:19:33 AM

I usually use the "Unknown" in the "Status" column to know I need to call "Refresh Values" on a node. If I do this once for ~every node after reboot, then things work fine after that.

michaelolson commented 3 years ago

FWIW, this seems to most commonly impact my power monitoring devices, which constantly emit values. I've wondered if there's a race condition where if a value update arrives from the device before values are retrieved from storage then values are never retrieved from storage. Pure speculation, though.

robertsLando commented 3 years ago

Humm this could be due to some cache corruption problems, @AlCalzone any clue?

@michaelolson Could you also attach zjs logs please?

AlCalzone commented 3 years ago

https://github.com/zwave-js/node-zwave-js/issues/2493 maybe. Since this seems reproducible, maybe you could provide the files in I asked for in that issue @michaelolson

michaelolson commented 3 years ago

Followed the troubleshooting guide to turn log levels up to "Silly" and write log files. I verified that the files were being emitted, shut down, captured the logs, captured the cache, and started back up. After starting back up, I verified that multiple nodes were in the Unknown state and refreshed one. I then captured the startup logs. All files attached:

AlCalzone commented 3 years ago

@michaelolson Judging from your files, it doesn't look like the cache is actually corrupted.

It seems more like the stick is having some troubles, causing the serial communication to mess up, which likely causes the unknown status in the UI. Your log is full of these:

2021-05-19T17:13:19.137Z CNTRLR   No response from controller after 1/3 attempts. Scheduling next try in 100 ms.
...
2021-05-19T17:13:20.252Z CNTRLR   Failed to execute controller command after 2/3 attempts. Scheduling next try i
                                  n 1100 ms.
...
2021-05-19T17:13:22.398Z CNTRLR   [Node 003] ping failed: Timeout while waiting for an ACK from the controller
michaelolson commented 3 years ago

Hrmm, interesting. I'm not sure what that means for my controller. Have you seen behavior like that before? Would the only fix be to replace the controller?

A point I'm confused on is if the poor behavior is timeouts contacting the controller, wouldn't the data in the console still fall back to what's in the cache if it can't get a response from the controller? What causes the data to end up empty instead of using the valid values in the cache?

AlCalzone commented 3 years ago

We're pretty sure now that this has to do with zwavejs2mqtt querying all neighbors immediately when the driver starts. This seems to cause older sticks to trip up.

Normally during the startup, listening nodes are pinged to figure out if they are alive or dead, then they are marked "ready". zwavejs2mqtt uses this event to display the information we already know, but that doesn't happen if the stick gets stuck before that.

AlCalzone commented 3 years ago

This should be fixed by https://github.com/zwave-js/zwavejs2mqtt/pull/1279 btw, which was merged a couple of hours ago

michaelolson commented 3 years ago

Ahh, got it. So some kind of throttling of requests for older sticks would let the process succeed and having a fallback to display the information even if the ping fails guarantees it doesn't end up empty. Thanks for linking in the fix.

AlCalzone commented 3 years ago

Its not throttling the requests per se, its mixing requests that read from the external NVM and requests that use the radio and potentially write to the NVM. I can't reproduce that on my Z-Wave 700 stick no matter how hard I try, but older devices really don't like that.

michaelolson commented 3 years ago

Thanks for the clarification, I appreciate the detail. That also seems to constitute a recommendation for the Z-Wave 700 if I ever need a replacement.