zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License

Issues with Kwikset 914 (nodes going dead, jamming up entire #4483

Closed jdiegmueller closed 2 years ago

jdiegmueller commented 2 years ago

Is your problem within Home Assistant (Core or Z-Wave JS Integration)?

NO, my problem is NOT within Home Assistant or the ZWave JS integration

Is your problem within ZWaveJS2MQTT?

NO, my problem is NOT within ZWaveJS2MQTT

Describe the bug

Hello!

I have three Kwikset 914 deadbolts that I am having trouble with. I have a fairly active 131-node Zwave network currently powered by ZwaveJS 9.0.2 & ZwaveJS2MQTT 6.7.1 on a Raspberry Pi 4B, installed by npm. I am using an SiLabs UZB-7 (with 7.17.2 firmware) as the controller. I currently have HomeAssistant 2022.4.5 talking to this setup.

What I am observing is that sending lock/unlock commands to the three deadbolts results in a variety of intermittent behaviors -- none of these necessarily happen "every time", but all happen very frequently:

This probably goes without saying, but the behavior tends to compound when trying to interact with multiple locks in a row -- for example, issuing a lock command to all three deadbolts at bedtime will typically jam up other Zwave commands for quite some time. I've made this behavior a little less painful by introducing a short delay between each lock command (3 seconds in the log I'm going to attach).

PS: One last thing that might be worth sharing: even though the three deadbolts are the same model, ZwaveJS detects one of them as something different (although all three report the same v4.10 firmware). I have observed this same result even after re-interviews and exclusion/re-inclusion cycles:

PPS: I do have the capability to capture Zniffer logs if needed & requested.

Device information

Manufacturer: Kwikset Model name: 914 Node ID in your network: 12, 13, 15

How are you using node-zwave-js?

Which branches or versions?

version: node-zwave-js 9.0.2, zwavejs2mqtt 6.7.1

Did you change anything?

yes (please describe)

If yes, what did you change?

For the past few months I considered this setup to be a sandbox as I got to know HomeAssistant and ZwaveJS a bit better, but I got comfortable enough with it that I migrated everything from another Zwave controller/platform in the past few weeks (growing me to the 131 nodes I have at the moment). I have anecdotally observed this problem has gotten more severe as I moved more and more Zwave devices over to this setup.

Did this work before?

Yes (please describe)

If yes, where did it work?

See previous comment about recent changes.

Attach Driver Logfile

zwavejs_2022-04-18.log.gz

A great example of this behavior in action would be around 2022-04-19T03:16:09.817Z.

Note that the ping sent to Node 013 at 2022-04-18T23:28:54.155Z is automated; when one of the three deadbolt nodes goes "dead" I have my setup wait 15 seconds and then send it a ping to bring it back to life.

zwave-js-bot commented 2 years ago

👋 Hey @jdiegmueller!

It looks like you attached a logfile, but its filename doesn't look like a driver log that came from Z-Wave JS. Please make sure you upload the correct one.

AlCalzone commented 2 years ago

I'm almost certain this isn't caused by the locks, but by excessive reporting which causes your controller to stop sending anything. Essentially https://github.com/zwave-js/node-zwave-js/issues/3906 but much less severe.

Z-Wave JS doesn't handle a jammed controller well yet, but you should be able to get on top of this by turning down the meter and multilevel sensor reporting. In your log I see >7000 electric meter reports, >14000 lux sensor reports, and almost 5000 motion sensor reports, many of which are repeated.
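One way to find the chattiest nodes is to tally reports per node and report type, and to count how many of them simply repeat the previous value. The following is a minimal sketch, assuming the reports have already been extracted from the driver log into `(node_id, kind, value)` tuples. The extraction step, the function name and the sample data are hypothetical and are not part of Z-Wave JS.

```python
from collections import defaultdict


def summarize_reports(reports):
    """Count total and repeated reports per (node_id, kind).

    `reports` is an iterable of (node_id, kind, value) tuples in log order.
    A report counts as "repeated" when its value equals the previous value
    seen for the same node and kind.
    """
    totals = defaultdict(int)
    repeats = defaultdict(int)
    last_value = {}
    for node_id, kind, value in reports:
        key = (node_id, kind)
        totals[key] += 1
        if key in last_value and last_value[key] == value:
            repeats[key] += 1
        last_value[key] = value
    return {key: (totals[key], repeats[key]) for key in totals}


# Hypothetical sample data, not taken from the attached log.
sample = [
    (97, "power", 1.6),
    (97, "power", 1.6),
    (41, "illuminance", 0),
    (41, "illuminance", 0),
    (41, "illuminance", 0),
    (97, "power", 1.7),
]
summary = summarize_reports(sample)
```

Sorting the result by total count gives a shortlist of devices whose reporting settings are worth checking first.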

This sometimes happens multiple times per second, which is definitely too much for a stable network. I suggest the following: Check each meter and multilevel sensor report setting and ask whether you actually need it. E.g. do you automate on the illuminance? If not, turn it off.

Turn the reporting interval waaaaaaay up and use reporting on (significant) change where possible. E.g. a bunch of these measurements are the multisensor reporting that the lux level is still 0. Or half of Node 097's reports are "I am still using 1.6W".

Decide where you need which granularity and precision. Maybe the kWh measurement changing from 1.696 -> 1.697 -> 1.698 -> ... -> 1.705 could be lowered to reporting every 0.01 kWh, or even 0.1 kWh. These might not be many reports per device, but over a large network this adds up quickly. Especially if you have bursts where you move through the house, 5 motion sensors fire, turn on lights and 15 power meters start reporting every few seconds.

If you need periodic reports as a backup, I suggest not going below once per hour per device.
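To illustrate why reporting on significant change helps, here is a small simulation of a change threshold (deadband). It is only a sketch: on real devices this filtering is done by the device firmware through its configuration parameters, and the function name is hypothetical. Readings are in Wh rather than kWh so that the comparison uses integers.

```python
def reports_sent(readings, threshold):
    """Return the readings a device would transmit if it only reported
    when the value moved by at least `threshold` since the last report."""
    sent = []
    last = None
    for value in readings:
        if last is None or abs(value - last) >= threshold:
            sent.append(value)
            last = value
    return sent


# Energy counter going from 1.696 kWh to 1.705 kWh in 1 Wh steps.
readings_wh = list(range(1696, 1706))

every_wh = reports_sent(readings_wh, 1)      # report every 0.001 kWh
every_10_wh = reports_sent(readings_wh, 10)  # report every 0.01 kWh

# A lux sensor that keeps measuring 0 sends a single report.
still_dark = reports_sent([0, 0, 0, 0], 1)
```

With the 1 Wh threshold all ten readings are transmitted; with the 10 Wh threshold only the first one is, because the counter has not yet moved by 0.01 kWh.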

jdiegmueller commented 2 years ago

That makes sense given it has gotten worse as I've moved the various sensors over to this platform. I'll work on cranking down all of the reporting.

I also have quite a few Zsticks, maybe I'll move the locks and three strategically placed repeaters over to a second ZwaveJS2MQTT instance.

Thanks.

jdiegmueller commented 2 years ago

Oh, any thoughts about that third lock showing up as the wrong device/model? Would you be interested in the debug results of a re-interview of it?

AlCalzone commented 2 years ago

It's very likely identifying itself using different IDs which are used by resellers. At least the Google results look similar.

jdiegmueller commented 2 years ago

Thanks again for looking at this with me. Here's what I've done since yesterday:

I went through the log I originally attached, and indeed there were roughly a metric ton more sensor reports than I expected -- specifically, I found 14111 Illuminance reports, nearly 100% of these from my 32 Aeotec Trisensors. This blew my mind since I was using 0 lux for my Aeotec Trisensors' "Light Intensity Change Threshold" ([41-112-0-22]), which effectively disabled threshold reporting for Illuminance. I expected the Trisensor to then only report Illuminance based on the timer set in "Automatic Reporting Interval: Light Sensor" ([41-112-0-24]), which I had at 14400 seconds. In practice this appears to not be the case, as log review shows that the Trisensors were reporting Illuminance every 3 minutes regardless (even when the value stayed the same). Across 32 Aeotec Trisensors, that worked out to an average of one Illuminance report every 5 to 6 seconds; yikes!
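The arithmetic behind that estimate, as a quick sketch (the function names are made up for illustration; the 32 sensors, the 3-minute interval and the 14400-second setting are the figures from this comment):

```python
def network_interval_s(device_count, per_device_interval_s):
    """Average seconds between reports seen by the controller when
    `device_count` devices each report every `per_device_interval_s`."""
    return per_device_interval_s / device_count


def reports_per_day(device_count, per_device_interval_s):
    """Total reports per 24 hours across all devices."""
    return device_count * 86400 / per_device_interval_s


observed = network_interval_s(32, 180)    # 32 Trisensors, every 3 minutes
intended = network_interval_s(32, 14400)  # 32 Trisensors, every 4 hours
```

At 3 minutes per sensor the controller sees one Illuminance report about every 5.6 seconds (15360 per day); at the intended 14400 seconds it would be one every 450 seconds (192 per day).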

So with that in mind, I cranked the Light Intensity Change Threshold to 10000 lux and Automatic Reporting Interval: Light Sensor to 14400. I don't really use Air temperature either, so I cranked Temperature Change Threshold to 50 and Timed Temperature Report Interval to 14400. I also disabled power and energy threshold reports from the 50 Inovelli Red Series switches/dimmers and increased the power and energy reporting interval to 14400.

That has quieted down the mesh significantly; I now sometimes go a few minutes without seeing a single report from a sensor.

But back to the lock issue, I'm still seeing the original symptoms: huge delays in response to lock/unlock, sometimes the locks never respond to lock/unlock, sometimes when the lock does finally act it doesn't report back and takes a Refresh Values to get the truth.

I find that if I am very slow and deliberate -- i.e., if I wait for a lock/unlock command to complete & report back before moving on to the next lock -- it works. But if I overlap the lock/unlock commands at all (or issue other Zwave commands before the lock has completed + reported back) it almost always runs into the trouble symptoms.
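The "slow and deliberate" approach can be sketched as a loop that never has more than one lock command in flight. This is an illustration only: `send_and_confirm` is a hypothetical coroutine standing in for whatever the automation platform provides to send a lock command and wait for the lock to report back. It is not a Z-Wave JS or Home Assistant API.

```python
import asyncio


async def lock_all_sequentially(node_ids, send_and_confirm, pause_s=0.0):
    """Send a lock command to each node in turn.

    The next command is only issued after the previous call to the
    hypothetical `send_and_confirm(node_id)` has returned, optionally
    followed by a pause of `pause_s` seconds.
    """
    results = {}
    for node_id in node_ids:
        results[node_id] = await send_and_confirm(node_id)
        if pause_s:
            await asyncio.sleep(pause_s)
    return results
```

Called as `asyncio.run(lock_all_sequentially([12, 13, 15], send_and_confirm, pause_s=3))`, this combines the wait-for-confirmation behavior with the 3-second spacing mentioned earlier in the thread.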

I'm attaching a new log from this morning with the quieter mesh only covering 1 hour. The locks are nodes 12, 13, and 15. (Side note: I also see two Trisensors still reporting Illuminance every 3 minutes in here, I'll go wake them up now). Does this new log still indicate the 700 Series ATF issues?

If you need me to repeat with more specific timestamps of when I issued commands vs. what happens + when, or a Zniffer capture to go along with a ZwaveJS log, I'm happy to do so. I appreciate you trying to help me sort this out.

TRIMMED_zwavejs_2022-04-21.log

AlCalzone commented 2 years ago

The controller is still jammed once for roughly a second or two while the onslaught of commands from the network is coming. I think there is still another underlying issue, but this is roughly what happens:

They don't support Security S2, do they? This would also help a ton.


What worries me a bit are the route speeds used. Out of 213 transmission reports,

The fallback speed seems to be used for every communication with the locks, which could be caused by either:

zwavejs2mqtt has a tool (similar to the one in PC Controller) to diagnose the link between a node and the controller and between two nodes. After you've made sure of a good controller placement and healed, I'd use that to figure out where the link goes bad:

Make sure to initiate the check on the locks towards the other nodes, as FLiRS devices as check targets would introduce incorrect delays.

jdiegmueller commented 2 years ago

@AlCalzone, will review your advice about device routing in the next day or two. Thank you. I do use a USB extension cable to mount the Zstick in a position that it has direct line of sight to 2 of the locks, so I'm surprised all of the strange routing is even necessary.

I did move everything to an Aeotec Zstick Gen5+ and there was definitely improvement: the lock/unlock commands work reliably now. The feedback from the lock that it is locked/unlocked is occasionally still spotty (complaints about timeouts and expired nonces, similar to https://github.com/zwave-js/node-zwave-js/issues/4508) but I'll focus on the bits you suggested before reporting back with more detail.

To your S0 vs S2 question: Is there an S2-supporting deadbolt you know is reliable? I'm not married to these Kwikset 914s. I bought them because they were Zwave Plus assuming that meant S2, but of course that turned out to be an incorrect assumption.

jdiegmueller commented 2 years ago

Looks like Kwikset has a newer model called the Home Connect 620 that does include S2:

I am going to order one of these today, and if they work well I'll replace the 914s with these guys.

Botched1 commented 2 years ago

I have 3x HomeConnect 620 zwave700 models working just fine. Make sure you get one of the new 700 series ones, if you can - can be tricky sometimes as the model number is the same - HomeConnect 620.

zwave-js-assistant[bot] commented 2 years ago

This issue has not seen any recent activity and was marked as "stale 💤". Closing for housekeeping purposes... 🧹

Feel free to reopen if the issue persists.