home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.04k stars 29.7k forks source link

ZwaveJS Renames Devices on HA Restart #80398

Closed dcmeglio closed 9 months ago

dcmeglio commented 1 year ago

The problem

When I restart HA, I have multiple devices that rename themselves back to default names and remove the Area. For example my "Master Bedroom Ceiling Lights" renames to "700 Series Toggle Dimmer"

What version of Home Assistant Core has the issue?

2022.10.4

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zwave_js

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zwave_js/

Diagnostics information

zwave_js-dfc70561bf763591e895062eb2bfda66-700 Toggle Dimmer-94311538a60068cd86453504b2c1913b.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Nothing seems interesting in the zwave js ui logs. Here is all references to the node since restarting HA:
C:\Users\dmegl\Downloads\z-ui_2022-10-15 (1).log
  1866,38: 2022-10-15 18:17:57.187 INFO Z-WAVE: Node 120: value updated: 114-0-manufacturerId 634 => 634
  1867,38: 2022-10-15 18:17:57.190 INFO Z-WAVE: Node 120: value updated: 114-0-productType 28672 => 28672
  1868,38: 2022-10-15 18:17:57.193 INFO Z-WAVE: Node 120: value updated: 114-0-productId 40964 => 40964
  1876,38: 2022-10-15 18:17:57.622 INFO Z-WAVE: Node 120: value updated: 134-0-libraryType 3 => 3
  1877,38: 2022-10-15 18:17:57.625 INFO Z-WAVE: Node 120: value updated: 134-0-protocolVersion 7.13 => 7.13
  1878,38: 2022-10-15 18:17:57.629 INFO Z-WAVE: Node 120: value updated: 134-0-firmwareVersions 10.0 => 10.0
  1879,38: 2022-10-15 18:17:57.632 INFO Z-WAVE: Node 120: value updated: 134-0-hardwareVersion 1 => 1
  2309,38: 2022-10-15 18:18:44.782 INFO Z-WAVE: Node 120: value updated: 114-0-manufacturerId 634 => 634
  2310,38: 2022-10-15 18:18:44.784 INFO Z-WAVE: Node 120: value updated: 114-0-productType 28672 => 28672
  2311,38: 2022-10-15 18:18:44.787 INFO Z-WAVE: Node 120: value updated: 114-0-productId 40964 => 40964
  2320,38: 2022-10-15 18:18:45.449 INFO Z-WAVE: Node 120: value updated: 134-0-libraryType 3 => 3
  2321,38: 2022-10-15 18:18:45.453 INFO Z-WAVE: Node 120: value updated: 134-0-protocolVersion 7.13 => 7.13
  2322,38: 2022-10-15 18:18:45.457 INFO Z-WAVE: Node 120: value updated: 134-0-firmwareVersions 10.0 => 10.0
  2323,38: 2022-10-15 18:18:45.460 INFO Z-WAVE: Node 120: value updated: 134-0-hardwareVersion 1 => 1

Additional information

Running zwave js ui

home-assistant[bot] commented 1 year ago

Hey there @home-assistant/z-wave, mind taking a look at this issue as it has been labeled with an integration (zwave_js) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)


zwave_js documentation zwave_js source (message by IssueLinks)

dcmeglio commented 1 year ago

Ok it has something to do with Zwave JS UI overwriting values in HA. I went through and set names for every device in Zwave JS UI via the nodes.json and now only the area gets wiped, not the name. So for certain, just a few nodes, Zwave JS UI is overwriting whatever is set in HA with the default name.

Chef-de-IT commented 1 year ago

I have the same issue as well. Opened related:

https://github.com/home-assistant/core/issues/80447#issue-1410605814

dcmeglio commented 1 year ago

That does sound very similar. Are you seeing something in the logs that suggests a device is going out of range/reconnecting? I didn't spot anything. For reference I also have a fairly large zwave mesh, 109 devices. So that's another similarity

Chef-de-IT commented 1 year ago

@dcmeglio I just assumed, based on the distances involved plus the fact that bluetooth and wifi range varies substantially over time in the interference-dense area

dcmeglio commented 1 year ago

@dcmeglio I just assumed, based on the distances involved plus the fact that bluetooth and wifi range varies substantially over time in the interference-dense area

Got it, it does seem reasonable and it's my suspicion too, I was just hoping you had some way to prove it.

Chef-de-IT commented 1 year ago

Understood. I was just hoping that HomeAssistant would be smart enough to keep the device's hardware addr, name, and area in a database and if it went out of range and back in, find it in that there database - not be like "peekaboo, this right there is a brand new device now"

dcmeglio commented 1 year ago

Totally agree. I use Zwave JS UI. If I set the names in the addon it seems to remember those names and apply them in HA. That causes my entity names to at least not get screwed up when it does this.

Chef-de-IT commented 1 year ago

Totally agree. I use Zwave JS UI. If I set the names in the addon it seems to remember those names and apply them in HA. That causes my entity names to at least not get screwed up when it does this.

picard_annoyed

How are those names not symmetrically synced in both directions?

arigit commented 1 year ago

+1 seeing this issue. Currently on 2022.10.4 but I noticed it on 2022.9.x as well

deadsquid commented 1 year ago

+1 seeing this in 2022.10.2.

Chef-de-IT commented 1 year ago

Since I've started using Helpers to be able to control some of the 23 Z-Wave lights as groups in unison, I've also noticed that when a Z-Wave device loses its name, it also gets its entities renamed AND it sometimes gets booted from its groups. If groups include groups (e.g. a "Dimmers A" (side lighting) group includes FloorLamps and WallSconces groups along with some individual members) this makes managing scenes and controlling lights very clean and convenient. When it works.

It's pretty unfortunate a device can just up and lose its identity and get booted from the group. In one case it actually didn't get booted from its group but had its generic name in there, but when I edited its name in UI, the old entities hung around. This feels discouraging in terms of counting on HA for any 3rd party clients, but I'm hopeful the developers will see this & look into it. HA is a fantastic tool - I just wish it weren't moving quite so fast forward at the cost of such bugs. The entire Z-Wave mesh network control experience is unreliable and I wish HA were coded more defensively to deal with that inherent issue of many meshes.

Screenshot 2022-11-06 at 17-07-40 Group Settings – Home Assistant

Screenshot 2022-11-06 Group Members – Home Assistant

Chef-de-IT commented 1 year ago

Here's an example of this having happened again yesterday. There are 4 identical floor lamps in the 4 corners of the restaurant dining room. They're grouped together to manage their brightness in unison, by means of a group (UI Settings - Devices - Helpers - Group - Light group).

I noticed that one of the lamps wasn't responding to the group's brightness changes:

Chef de IT Z-Wave Zooz client install pic1

Lo and behold, perhaps to a momentary RF interference in the area or another reason for that lamp's Z-Wave mesh node to disappear and reappear, Z-Wave added a misnamed entity instead of the MDR.Dim.Floorlamp.4 - that had no real device.

Screenshot_20221108-Z-Wave Device Rename Impacts Groups

Note that at that point the correct entity for the 4th lamp, MDR.Dim.Floorlamp.4 was also available on the Entity pull-down menu, and when swapped in, the entire group started working as it should. Until the next glitch, I surmise.

deadsquid commented 1 year ago

Update for me, I completely pooched my JS device configs, and ended up restoring from a full backup and haven't experienced an issue since.

+1 seeing this in 2022.10.2.

snarlingllama commented 1 year ago

I am having the same issue except I do not use the ZWAVE JS UI add-on; only the "regular" ZWAVE JS add-on (Current version: 0.1.74).

For the ZWAVE JS integration, I have: Driver Version: 10.3.0 Server Version: 1.24.0

Also: Home Assistant 2022.11.4 Supervisor 2022.10.2 Operating System 9.3 Frontend 20221108.0 - latest

I have 100 ZWAVE devices, so this is extremely distressing to experience. Basically, every time I restart HA, it randomly renames entities, adds the wrong entities to devices and marks them unavailable (example, it added kwh to a Ecolink door sensor). It also resets all naming conventions for multisensors, smart switches, and door sensors without any consistency. Its rarely ever the same device that gets impacted.

Can anyone confirm if the solution posted by @deadsquid worked for them? Does "pooched my JS device configs" mean uninstalling the integration or something else?

Would greatly appreciate help with this issue as its the only unstable aspect of HA for me.

Chef-de-IT commented 1 year ago

Can anyone confirm if the solution posted by @deadsquid worked for them? Does "pooched my JS device configs" mean uninstalling the integration or something else?

@snarlingllama , unfortunately it didn't work for me. I've only a quarter of your Z-wave devices on the larger of my pilot installs (just 24) but already had an uncomfortably tense "conversation" with a client. We both prefer to keep decorum & not raise one's voice - but having added an RGB light ("Zooz Z-Wave Plus S2 12/24 V DC RGBW Dimmer ZEN31 for LED Strips and DC Lighting, Work as a Network Repeater") it had a ~1 sec delay responding to customer input for color change, and customer touching the HA color wheel control may have resulted in multiple "inputs" - which quickly queued up & for good 30 sec the system had a mind of its own with light changes & frantic customer "corrective input" only being added to the back of the queue. The ordeal "resolved itself" when the phone was thrown across the room and the queue eventually cleared.

ALL of my Z-Wave devices are 700 series chip, Z-Wave Plus S2. Latest firmware, network healed. Needless to say this makes it hard to be comfortable with this tech for anything beyond a handful of switches in a couple of adjacent rooms. Which is fine if all this is capable of - but I was going off a premise of 100+ device capability and 1-mile line of sight range (so say 1/4mile would be non-line-of-sight). Ugh. Sorry for the vent but it's context for how milliseconds & signal repeats add up to such real-life situations.

Screenshot_20221116_165848_Firefox Screenshot_20221116_170156_Firefox

snarlingllama commented 1 year ago

@Chef-de-IT Thank you for the response. Looking at your screenshots, I am having the exact same issues.

What is really strange, one of my Ecolink door sensors (which is installed about 6 feet from my Aeotec Z-Wave stick) suddenly changed its entity ID without any reboots, etc. Just out of the blue it changed the entity ID and was working fine moment before this happened (we have an automation that plays an audio file and sends a push notification to our phones then this door is opened).

Does anyone have any ideas, etc. of what could be happening here? I would imagine more than just 5+ people are experiencing this. There has to be some commonality here. Back in the 2022.8 days, none of this ever happened with my Z-Wave devices. It appears to be related to the recent releases.

What am I missing here?

deadsquid commented 1 year ago

Follow-up: I was overly optimistic and continue to see the metadata issue on add-on restart. It has happened with multiple devices from Jasco (wall outlet), Innovelli (switch, dimmer, and fan), and Zooz (scene controller), and the chronology of when the devices were added don't seem to matter. Metadata loss (the device remains in the controller) can occur on zwave restart, stop/start, or a HA reboot (all proper shutdowns). I'm happy to instrument the logs as needed, but haven't seen anything that stands out as to when a problem occurs. It's amazingly frustrating, and continues with the latest release.

@snarlingllama - pooched means I tried uninstalling and re-installing ZwaveJS, migrating over to ZwaveJS UI, and a couple other things, and made things considerably worse in the process so simply restored a known-good full backup (which is now a default process on any change or right before I want to restart things for any reason).

Current setup is HA OS an a Generic x86-64 (AcePC AK1 - Celeron), and I use a ZooZ S2 ZST 10 700 with the latest firmware (7.17.2)for the controller

Home Assistant 2022.11.4 Supervisor 2022.10.2 Operating System 9.3 Frontend 20221108.0 - latest

Happy to provide more info, and it's a pretty straight-forward setup. Diagnostic info from the integration attached.

config_entry-zwave_js-fd9de23798136f36e5543a532a561534.json.zip

Chef-de-IT commented 1 year ago

Does anyone have any ideas, etc. of what could be happening here?

@snarlingllama if I had to guess, it could be a naively coded new device discovery mechanism. A Z-Wave mesh node disappears and then reappears, mesh says, here's a new node, its hardware address isn't first checked against the database of already configured device metadata keyed off their hardware addresses but an "add new device" routine is called, that overwrites any "already taken hardware address" with "new device blank defaults". I doubt it's exactly that in terms of the actual moving pieces, as I didn't look at the code, but it's perhaps something amounting to this.

I would imagine more than just 5+ people are experiencing this.

It's possible for developers to conflate 5 people (only) commenting with just 5 people having an issue, although for every one of us there are perhaps 200 who don't know what github is, 200 who know but can't be bothered, 200 who have the issue and not realize it,100 who just give up and 10,000 who have all their two z-wave switches (in plastic electrical boxes not metal) and a fish feeder running perfectly fine all situated within 36inches of the HA computer in their basement in rural Iowa, 76 lightyears away from the nearest source of radio interference.

blhoward2 commented 1 year ago

@snarlingllama if I had to guess, it could be a naively coded new device discovery mechanism. A Z-Wave mesh node disappears and then reappears, mesh says, here's a new node, its hardware address isn't first checked against the database of already configured device metadata keyed off their hardware addresses but an "add new device" routine is called, that overwrites any "already taken hardware address" with "new device blank defaults". I doubt it's exactly that in terms of the actual moving pieces, as I didn't look at the code, but it's perhaps something amounting to this.

This isn’t how device discovery works. The same device should be detected every time, even if you nuke the cache. That’s how switching between the addons works without causing a rename.

blhoward2 commented 1 year ago

Ok it has something to do with Zwave JS UI overwriting values in HA. I went through and set names for every device in Zwave JS UI via the nodes.json and now only the area gets wiped, not the name. So for certain, just a few nodes, Zwave JS UI is overwriting whatever is set in HA with the default name.

To be clear, this isn’t possible. The zwavejs integration may have a bug causing this but Z-Wave JS UI has no ability to change anything in HA. It just pipes events through the websocket where they’re consumed on the HA side.

blhoward2 commented 1 year ago

All: I recognize that this is an intermittent issue but no one can begin to solve it without logs. Turn on log to file in zwavejs-ui, at the debug level or higher, and be vigilant for the change. (Note that logging is set in two places. You want under Z-Wave.) It’s going to be difficult to pin down where exactly in the log this is happening but that will have to happen. Node exports from before and after will also be crucial. Describing the symptoms over and over again doesn’t help diagnose the issue.

Chef-de-IT commented 1 year ago

All: I recognize that this is an intermittent issue but no one can begin to solve it without logs. Turn on log to file in zwavejs-ui, at the debug level or higher, and be vigilant for the change. (Note that logging is set in two places. You want under Z-Wave.) It’s going to be difficult to pin down where exactly in the log this is happening but that will have to happen. Node exports from before and after will also be crucial. Describing the symptoms over and over again doesn’t help diagnose the issue.

I'm happy to do my part - please provide HomeAssistant screenshots on which logs you want enabled and what settings in HA UI I should pick. The issue is happening every other day, so there should be no problem getting the logs for it. I've a HomeAssistant with the Z-Wave JS (not JS-UI) integration. Please provide the precise steps for the logs you would like.

blhoward2 commented 1 year ago

You cannot log to file in the vanilla addon (to my knowledge) so you’re unable to help here. I don’t run addons so I could be wrong.

deadsquid commented 1 year ago

I’ll see if I can pull logs.

Chef-de-IT commented 1 year ago

In HA UI > Settings > Add-ons > Z-Wave JS > Settings, I have the log level set to Debug. In In HA UI > Settings > Add-ons > Z-Wave JS > Log, I have the following: Screenshot Home Assistant Z-Wave Logs

There's a fair bit of:

dcmeglio commented 1 year ago

I posted logs with the original report and also confirmed that if you set the name and area in the nodes.json it works around the issue. Nothing in the logs seems to provide any insight to me and no one followed up with any questions on the logs I provided.

Chef-de-IT commented 1 year ago

2022-11-24T23:47:15.230Z SERIAL « 0x012700a814013....more-hex-string...d8dbe (41 bytes) 6ff645b4d1400d37e 2022-11-24T23:47:15.231Z SERIAL » [ACK] (0x06) 2022-11-24T23:47:15.233Z CNTRLR [Node 053] [~] [Meter] value[65537]: 0.18 => 0.18 [Endpoint 1] 2022-11-24T23:47:15.233Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ type: broadcast │ target node: 255 │ RSSI: -45 dBm └─[Security2CCMessageEncapsulation] │ sequence number: 34 └─[MultiChannelCCCommandEncapsulation] │ source: 1 │ destination: 0 └─[MeterCCReport] type: Electric scale: kWh rate type: Consumed value: 0.18 time delta: 0 seconds 2022-11-24T23:47:16.098Z SERIAL « 0x012700a814....more-hex-string...a1d8dbe (41 bytes) 6ff645b4d1400d37e 2022-11-24T23:47:16.099Z DRIVER Dropping message with invalid payload 2022-11-24T23:47:16.099Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ type: broadcast │ target node: 255 │ RSSI: -45 dBm └─[Security2CCMessageEncapsulation] [INVALID] error: Duplicate command 2022-11-24T23:47:16.100Z SERIAL » [ACK] (0x06) Starting logging event forwarder at debug level Stopping logging event forwarder 2022-11-25T00:27:46.788Z SERIAL « 0x012300a80....more-hex-string...499383d56505 (37 bytes) f9700a71e 2022-11-25T00:27:46.790Z SERIAL » [ACK] (0x06) 2022-11-25T00:27:46.792Z CNTRLR [Node 053] [~] [Meter] value[66049]: 2.8 => 2.6 [Endpoint 1] 2022-11-25T00:27:46.793Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ RSSI: -89 dBm └─[Security2CCMessageEncapsulation] │ sequence number: 35 └─[MultiChannelCCCommandEncapsulation] │ source: 1 │ destination: 0 └─[MeterCCReport] type: Electric scale: W rate type: Consumed value: 2.6 time delta: 0 seconds 2022-11-25T00:27:46.882Z SERIAL « 0x012300a800013....more-hex-string...8499383d56505 (37 bytes) f9700a41d 2022-11-25T00:27:46.883Z DRIVER Dropping message with invalid payload 2022-11-25T00:27:46.884Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ RSSI: -92 dBm └─[Security2CCMessageEncapsulation] [INVALID] error: Duplicate command 2022-11-25T00:27:46.884Z SERIAL » [ACK] (0x06) 2022-11-25T00:27:47.731Z SERIAL « 0x012300a800....more-hex-string...99383d56505 (37 bytes) f9700a61f 2022-11-25T00:27:47.732Z DRIVER Dropping message with invalid payload 2022-11-25T00:27:47.732Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ RSSI: -90 dBm └─[Security2CCMessageEncapsulation] [INVALID] error: Duplicate command 2022-11-25T00:27:47.733Z SERIAL » [ACK] (0x06) 2022-11-25T00:27:47.821Z SERIAL « 0x012300a8000....more-hex-string...9383d56505 (37 bytes) f9700a61f 2022-11-25T00:27:47.822Z DRIVER Dropping message with invalid payload 2022-11-25T00:27:47.822Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ RSSI: -90 dBm └─[Security2CCMessageEncapsulation] [INVALID] error: Duplicate command 2022-11-25T00:27:47.822Z SERIAL » [ACK] (0x06) 2022-11-25T00:27:48.048Z SERIAL « 0x012300a814....more-hex-string...499383d56505 (37 bytes) f9700d37e 2022-11-25T00:27:48.049Z DRIVER Dropping message with invalid payload 2022-11-25T00:27:48.050Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ type: broadcast │ target node: 255 │ RSSI: -45 dBm └─[Security2CCMessageEncapsulation] [INVALID] error: Duplicate command 2022-11-25T00:27:48.050Z SERIAL » [ACK] (0x06) 2022-11-25T00:27:48.598Z SERIAL « 0x012300a8140....more-hex-string...83d56505 (37 bytes) f9700e24f 2022-11-25T00:27:48.599Z DRIVER Dropping message with invalid payload 2022-11-25T00:27:48.599Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand] │ type: broadcast │ target node: 255 │ RSSI: -30 dBm └─[Security2CCMessageEncapsulation] [INVALID] error: Duplicate command 2022-11-25T00:27:48.600Z SERIAL » [ACK] (0x06)

blhoward2 commented 1 year ago

In HA UI > Settings > Add-ons > Z-Wave JS > Settings, I have the log level set to Debug. In In HA UI > Settings > Add-ons > Z-Wave JS > Log, I have the following: Screenshot Home Assistant Z-Wave Logs

There's a fair bit of:

  • Dropping message with invalid payload
  • error: Duplicate command
  • [Security2CCMessageEncapsulation] [INVALID]

This is unrelated. It suggests a poor mesh or possibly a misbehaving device.

blhoward2 commented 1 year ago

I posted logs with the original report and also confirmed that if you set the name and area in the nodes.json it works around the issue. Nothing in the logs seems to provide any insight to me and no one followed up with any questions on the logs I provided.

There are no logs of any help in your report. Not at the info level as most things aren’t logged and it’s not clear to me that is the right point in time. We need to see what event in zwavejs is triggering this to happen so it can be traced through.

I’m following up now…

blhoward2 commented 1 year ago

@Chef-de-IT Is your stick on a usb extension cable? What stick? What fw is the stick on? Have you tried a network heal?

dcmeglio commented 1 year ago

We need to see what event in zwavejs is triggering this to happen so it can be traced through.

Well I’ll leave it to others. In the last month I’ve already worked around this issue and don’t really want to break things again to track it down.

snarlingllama commented 1 year ago

I will likely need to increase my logging level, but wanted to at least post this. I am using the regular Z-Wave JS add-on.

code: 260, context: undefined, transactionSource: undefined } 2022-11-25T00:44:12.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:44:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:45:03.308Z CNTRLR [Node 024] treating BasicCC::Set as a report 2022-11-25T00:45:12.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:45:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:46:12.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:46:42.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:47:04.180Z CNTRLR [Node 024] treating BasicCC::Set as a report 2022-11-25T00:47:12.677Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:47:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:47:42.777Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:47:42.879Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:47:42.977Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:47:45.757Z CNTRLR Failed to execute controller command after 1/3 attempts. Scheduling next try i n 100 ms. 2022-11-25T00:48:01.717Z CNTRLR « [Node 120] received wakeup notification 2022-11-25T00:48:01.722Z CNTRLR [Node 120] The node is now awake. 2022-11-25T00:48:04.314Z CNTRLR » [Node 120] Sending node back to sleep... 2022-11-25T00:48:04.355Z CNTRLR [Node 120] The node is now asleep. 2022-11-25T00:48:12.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:48:42.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:49:05.216Z CNTRLR « [Node 119] received wakeup notification 2022-11-25T00:49:05.218Z CNTRLR [Node 119] The node is now awake. 2022-11-25T00:49:06.387Z CNTRLR » [Node 119] Sending node back to sleep... 2022-11-25T00:49:06.422Z CNTRLR [Node 119] The node is now asleep. 2022-11-25T00:49:12.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:49:42.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:50:03.116Z CNTRLR [Node 018] treating BasicCC::Set as a report 2022-11-25T00:50:12.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:50:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:51:12.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:51:32.881Z CNTRLR [Node 024] treating BasicCC::Set as a report 2022-11-25T00:51:39.482Z CNTRLR [Node 022] treating BasicCC::Set as a report 2022-11-25T00:51:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:52:12.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:52:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:52:42.777Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:52:42.878Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:52:42.977Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:53:12.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:53:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:54:12.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:54:41.787Z CNTRLR [Node 022] treating BasicCC::Set as a report 2022-11-25T00:54:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:54:52.489Z CNTRLR [Node 024] treating BasicCC::Set as a report 2022-11-25T00:55:05.657Z CNTRLR [Node 018] treating BasicCC::Set as a report 2022-11-25T00:55:12.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:55:42.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:56:12.677Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:56:42.679Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:57:12.677Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:57:42.677Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:57:42.775Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:57:42.876Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:57:42.974Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:57:58.819Z CNTRLR Failed to execute controller command after 1/3 attempts. Scheduling next try i n 100 ms. 2022-11-25T00:58:12.675Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta)

UPDATE: Also seeing these

2022-11-25T00:58:45.379Z CNTRLR [Node 054] detected a deviation of the node's clock, updating it... 2022-11-25T00:58:47.073Z CNTRLR [Node 055] detected a deviation of the node's clock, updating it... 2022-11-25T00:59:02.277Z CNTRLR [Node 104] detected a deviation of the node's clock, updating it... 2022-11-25T00:59:12.676Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T00:59:26.402Z CNTRLR [Node 090] detected a deviation of the node's clock, updating it...

I just moved my Z-Stick to a higher location to see if some of these invalid values logs reduce in number. All the same, my zwave integration has been stable for a long time prior to 2022.9.x without ever needing to relocate/redirect the antenna.

snarlingllama commented 1 year ago

@blhoward2 I know your question about the Z-Stick was not directed towards me, but here is what I have:

Aeotec Z‐Stick Gen5 Firmware: 1.2 (as reported by HA, but I believe the latest is actually v1.02)

It is on a USB extension cable. If that is a possible culprit, I do not mind trying it direct-connected to my RPi.

Chef-de-IT commented 1 year ago

@Chef-de-IT Is your stick on a usb extension cable?

Yes it is. Zip-tied to be in the clear of any metal in the immediate vicinity

What stick?

Zooz ZST10 (all of my Z-Wave devices are Z-Wave Plus S2, 700 series chip. Most are Zooz brand Zen77 dimmers.)

What fw is the stick on?

Latest from Zooz - file ZST10-700-ZW_SerialAPI_Controller_7_17_2_REGION_US.gbl (while HA reports Firmware: 7.17.1)

Have you tried a network heal?

Only once every couple of days :)

snarlingllama commented 1 year ago

Posting another update after moving my z-stick to a higher position. I do not want to invoke correlation bias, but some of the errors seem to be reduced in number.

2022-11-25T01:13:18.873Z CNTRLR [Node 017] treating BasicCC::Set as a report 2022-11-25T01:13:20.689Z CNTRLR [Node 021] treating BasicCC::Set as a report 2022-11-25T01:13:42.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:14:12.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:14:42.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:15:12.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:15:22.935Z CNTRLR « [Node 028] received wakeup notification 2022-11-25T01:15:22.937Z CNTRLR [Node 028] The node is now awake. 2022-11-25T01:15:23.941Z CNTRLR » [Node 028] Sending node back to sleep... 2022-11-25T01:15:23.976Z CNTRLR [Node 028] The node is now asleep. 2022-11-25T01:15:42.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:16:05.801Z CNTRLR [Node 017] treating BasicCC::Set as a report 2022-11-25T01:16:12.670Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:16:42.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:17:12.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:17:39.103Z CNTRLR « [Node 018] received wakeup notification 2022-11-25T01:17:39.107Z CNTRLR [Node 018] The node is now awake. 2022-11-25T01:17:39.958Z CNTRLR « [Node 020] received wakeup notification 2022-11-25T01:17:39.959Z CNTRLR [Node 020] The node is now awake. 2022-11-25T01:17:40.329Z CNTRLR » [Node 018] Sending node back to sleep... 2022-11-25T01:17:40.359Z CNTRLR [Node 018] The node is now asleep. 2022-11-25T01:17:41.084Z CNTRLR » [Node 020] Sending node back to sleep... 2022-11-25T01:17:41.110Z CNTRLR [Node 020] The node is now asleep. 2022-11-25T01:17:42.670Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:17:42.769Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:17:42.870Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:17:42.968Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:17:46.391Z CNTRLR « [Node 017] received wakeup notification 2022-11-25T01:17:46.392Z CNTRLR [Node 017] The node is now awake. 2022-11-25T01:17:47.536Z CNTRLR » [Node 017] Sending node back to sleep... 2022-11-25T01:17:47.576Z CNTRLR [Node 017] The node is now asleep. 2022-11-25T01:17:58.330Z CNTRLR [Node 021] treating BasicCC::Set as a report 2022-11-25T01:18:12.669Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:18:22.796Z CNTRLR Failed to execute controller command after 1/3 attempts. Scheduling next try i n 100 ms. 2022-11-25T01:18:42.669Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:18:47.083Z CNTRLR [Node 017] treating BasicCC::Set as a report 2022-11-25T01:18:50.365Z CNTRLR [Node 024] treating BasicCC::Set as a report 2022-11-25T01:18:50.603Z CNTRLR [Node 018] treating BasicCC::Set as a report 2022-11-25T01:18:58.104Z CNTRLR [Node 022] treating BasicCC::Set as a report 2022-11-25T01:19:12.670Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:19:42.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:20:12.671Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T01:20:25.854Z CNTRLR [Node 073] [Notification] type: System event: Heartbeat 2022-11-25T01:20:26.161Z CNTRLR « [Node 073] received wakeup notification 2022-11-25T01:20:26.163Z CNTRLR [Node 073] The node is now awake. 2022-11-25T01:20:27.167Z CNTRLR » [Node 073] Sending node back to sleep... 2022-11-25T01:20:27.202Z CNTRLR [Node 073] The node is now asleep.

I also use a Zigbee dongle, I know they do not operate on the same frequency, but I will try to further separate them (increase their separation from about 3ft to about 5ft).

Chef-de-IT commented 1 year ago

Sometimes a Z-Wave device becomes grayed out in HA UI, and if I notice it and heal that specific device from HA, it comes back just fine (no loss of entities / renaming). Note that healing specific devices with a command on screenshot from those specific Z-Wave devices' individual pages (see screenshot) works, while healing the entire Z-Wave network will NOT usually result in re-enabling of the individual Z-Wave devices that may have gotten greyed out. HA Z-Wave Device Heal screenshot

blhoward2 commented 1 year ago

Latest from Zooz - file ZST10-700-ZW_SerialAPI_Controller_7_17_2_REGION_US.gbl (while HA reports Firmware: 7.17.1)

This isn’t right. HA should report 7.17.2. Your upgrade wasn’t successful and there are major bugs with .1. Start there.

snarlingllama commented 1 year ago

Is there anyway in HA to see if there is a USB power issue? Again, trying to avoid correlation bias, but around the time I started having issues (IIRC), I just added a Conbee II. Just trying to rule out/in anything that changed around the time I recall the issue starting.

My current USB setup:

Direct to RPi4:

  1. SSD
  2. Z-Stick

Direct to power USB hub:

  1. Zigbee Conbee II
  2. Google Coral (1)
  3. Google Coral (2)
Chef-de-IT commented 1 year ago

HA should report 7.17.2. Your upgrade wasn’t successful and there are major bugs with .1. Start there.

Makes sense. Pretty sure Simplicity Studio reported a successful completion, but I've never cross-checked the reported version when the stick was back in HA - until now.

snarlingllama commented 1 year ago

I performed a reboot (this usually triggers the issue for me) and I noticed this in the logs:

2022-11-25T03:03:24.741Z CNTRLR [Node 058] Timed out while waiting for a response from the node (ZW0201) Z-Wave error ZWaveError: Cannot check for firmware updates for node 58: Failed to query firmware version from the node! (ZW0260) at ZWaveController.getAvailableFirmwareUpdates (/usr/src/node_modules/zwave-js/src/lib/controller/Controller.ts:4535:10) at Function.handle (/usr/src/node_modules/@zwave-js/server/dist/lib/controller/message_handler.js:204:30) at Client.receiveMessage (/usr/src/node_modules/@zwave-js/server/dist/lib/server.js:105:62) { code: 260, context: undefined, transactionSource: undefined } Z-Wave error ZWaveError: Node 61 does not support the Command Class Manufacturer Specific! (ZW0302) at Object.get (/usr/src/node_modules/@zwave-js/cc/src/lib/API.ts:161:13) at ZWaveController.getAvailableFirmwareUpdates (/usr/src/node_modules/zwave-js/src/lib/controller/Controller.ts:4523:5) at Function.handle (/usr/src/node_modules/@zwave-js/server/dist/lib/controller/message_handler.js:204:54) at Object.controller (/usr/src/node_modules/@zwave-js/server/dist/lib/server.js:38:103) at Client.receiveMessage (/usr/src/node_modules/@zwave-js/server/dist/lib/server.js:105:99) at WebSocket. (/usr/src/node_modules/@zwave-js/server/dist/lib/server.js:49:45) at WebSocket.emit (node:events:513:28) at WebSocket.emit (node:domain:489:12) at Receiver.receiverOnMessage (/usr/src/node_modules/ws/lib/websocket.js:1178:20) at Receiver.emit (node:events:513:28) { code: 302, context: undefined, transactionSource: undefined }

Note that "Node 61" is a device that was renamed from the name I previous specified in HA.

I reinterviewed the device and the following logs resulted:

2022-11-25T03:05:00.007Z CNTRLR » [Node 061] querying the CC version for Meter... 2022-11-25T03:05:00.391Z CNTRLR [Node 061] supports CC Meter (0x32) in version 3 2022-11-25T03:05:00.392Z CNTRLR [Node 061] skipping query for Clock (0x81) because max implemented version i s 1 2022-11-25T03:05:00.393Z CNTRLR » [Node 061] querying the CC version for Association... 2022-11-25T03:05:00.898Z CNTRLR [Node 061] supports CC Association (0x85) in version 2 2022-11-25T03:05:00.899Z CNTRLR » [Node 061] querying the CC version for Association Group Information... 2022-11-25T03:05:02.788Z CNTRLR [Node 061] supports CC Association Group Information (0x59) in version 1 2022-11-25T03:05:02.789Z CNTRLR » [Node 061] querying the CC version for Manufacturer Specific... 2022-11-25T03:05:06.505Z CNTRLR [Node 061] Timed out while waiting for a response from the node (ZW0201) 2022-11-25T03:05:06.506Z CNTRLR [Node 061] CC version query for Manufacturer Specific timed out - assuming the node supports version 1... 2022-11-25T03:05:06.507Z CNTRLR » [Node 061] querying the CC version for Firmware Update Meta Data... 2022-11-25T03:05:06.524Z CNTRLR Failed to execute controller command after 1/3 attempts. Scheduling next try i n 100 ms. 2022-11-25T03:05:07.012Z CNTRLR [Node 061] supports CC Firmware Update Meta Data (0x7a) in version 2 2022-11-25T03:05:07.013Z CNTRLR [Node 061] skipping query for Powerlevel (0x73) because max implemented vers ion is 1 2022-11-25T03:05:07.048Z CNTRLR [Node 061] Embedded device config loaded 2022-11-25T03:05:07.067Z CNTRLR [Node 061] Interviewing Z-Wave Plus Info... 2022-11-25T03:05:07.068Z CNTRLR » [Node 061] querying Z-Wave+ information... 2022-11-25T03:05:07.530Z CNTRLR « [Node 061] received response for Z-Wave+ information: Z-Wave+ version: 1 role type: AlwaysOnSlave node type: Node installer icon: 0x0700 user icon: 0x0700 2022-11-25T03:05:07.534Z CNTRLR [Node 061] Interviewing Firmware Update Meta Data... 2022-11-25T03:05:07.535Z CNTRLR » [Node 061] Querying firmware update capabilities... 2022-11-25T03:05:08.802Z CNTRLR « [Node 061] Received firmware update capabilities: firmware targets: 0 continues to function: unknown supports activation: unknown 2022-11-25T03:05:08.805Z CNTRLR [Node 061] Interviewing Association... 2022-11-25T03:05:08.806Z CNTRLR » [Node 061] querying number of association groups... 2022-11-25T03:05:12.320Z CNTRLR [Node 018] treating BasicCC::Set as a report 2022-11-25T03:05:12.678Z DRIVER dropping CC with invalid values (Reason: Unexpected meter type or corrupted da ta) 2022-11-25T03:05:12.873Z CNTRLR [Node 061] Timed out while waiting for a response from the node (ZW0201) 2022-11-25T03:05:12.873Z CNTRLR [Node 061] Querying association groups timed out, skipping interview... 2022-11-25T03:05:12.875Z CNTRLR [Node 061] Interviewing Association Group Information... 2022-11-25T03:05:12.879Z CNTRLR [Node 061] Interviewing Binary Switch... 2022-11-25T03:05:12.880Z CNTRLR » [Node 061] querying Binary Switch state... 2022-11-25T03:05:13.355Z CNTRLR « [Node 061] received Binary Switch state: current value: true 2022-11-25T03:05:13.361Z CNTRLR [Node 061] Interviewing Multilevel Switch... 2022-11-25T03:05:13.364Z CNTRLR » [Node 061] requesting current switch state... 2022-11-25T03:05:13.847Z CNTRLR [Node 061] Interviewing Color Switch... 2022-11-25T03:05:13.847Z CNTRLR » [Node 061] querying supported colors... 2022-11-25T03:05:14.340Z CNTRLR » [Node 061] received supported colors: · Red · Green · Blue 2022-11-25T03:05:14.348Z CNTRLR » [Node 061] querying current color state (Red) 2022-11-25T03:05:14.839Z CNTRLR » [Node 061] querying current color state (Green) 2022-11-25T03:05:15.584Z CNTRLR » [Node 061] querying current color state (Blue) 2022-11-25T03:05:16.085Z CNTRLR [Node 061] Interviewing Configuration... 2022-11-25T03:05:16.085Z CNTRLR [Node 061] ConfigurationCC: Loading configuration parameters from device confi g 2022-11-25T03:05:16.108Z CNTRLR » [Node 061] finding first configuration parameter... 2022-11-25T03:05:17.728Z CNTRLR [Node 061] Timed out while waiting for a response from the node (ZW0201) 2022-11-25T03:05:17.728Z CNTRLR [Node 061] Finding first configuration parameter timed out, skipping interview ... 2022-11-25T03:05:17.732Z CNTRLR [Node 061] Interviewing Meter... 2022-11-25T03:05:17.732Z CNTRLR » [Node 061] querying meter support... 2022-11-25T03:05:18.137Z CNTRLR « [Node 061] received meter support: type: Electric supported scales:
· kWh · W · V · A supported rate types: supports reset: true 2022-11-25T03:05:18.139Z CNTRLR » [Node 061] querying meter value (type = Electric, scale = kWh)... 2022-11-25T03:05:18.868Z CNTRLR » [Node 061] querying meter value (type = Electric, scale = W)... 2022-11-25T03:05:19.374Z CNTRLR » [Node 061] querying meter value (type = Electric, scale = V)... 2022-11-25T03:05:19.886Z CNTRLR » [Node 061] querying meter value (type = Electric, scale = A)... 2022-11-25T03:05:20.674Z CNTRLR [Node 061] Interviewing Clock... 2022-11-25T03:05:20.676Z CNTRLR » [Node 061] requesting current clock setting... 2022-11-25T03:05:21.442Z CNTRLR « [Node 061] received current clock setting: Saturday, 00:42 2022-11-25T03:05:21.446Z CNTRLR [Node 061] Interview stage completed: CommandClasses 2022-11-25T03:05:21.448Z CNTRLR [Node 061] Interview stage completed: OverwriteConfig 2022-11-25T03:05:21.449Z CNTRLR [Node 061] Interview completed 2022-11-25T03:05:21.450Z CNTRLR [Node 061] The node is ready to be used

Although HA now recognized the device type as an Aeotec Smart Switch, I once again lost all of my previous entity and device names for this specific node.

blhoward2 commented 1 year ago

This is really hard to read and the formatting helps. Please post an actual file with the log in it.

None of that looks peculiar to me, though. Did you lose it on the interview or during the other log? How sure are you it was that point in time?

snarlingllama commented 1 year ago

The 1st group of logs was immediately after a reboot. During a reboot, at least 1 z-wave device switches back to a generic "node" device. So this is all I have in the logs pertaining to that exact node that I confirmed was reset.

The 2nd group of logs happened during and immediately after I reinterviewed the device.

So I am absolutely certain this occurred at the exact point in time that the device was reset and rediscovered as the correct model, but never reverting back to its original name and entities names.

kpine commented 1 year ago

The driver logs are unreadable. Please attach them as a file if you're unable to get them formatted properly. Also make sure they are at the Debug level.

Pick one or more affected devices, and download the device diagnostic data (not config diagnostic) when everything is in a good state, then download the diagnostic file again when something goes wrong. Attach before and after files when it happens. Please also indicate which entity or entities have been affected.

image

Turn on HA integration debug logs, and attach them as a file (home-assistant.log) when the issue is observed. In configuration.yaml, set the following to enable logging:

logger:
  default: info
  logs:
    zwave_js_server: debug
    homeassistant.components.zwave_js: debug
snarlingllama commented 1 year ago

@kpine Thanks for the response. I will get the logger settings added in the my configuration and post what I can. The issue is that I have 100 z-wave nodes and there is no real pattern or consistency for when the issue happens; its random devices across the 100 nodes.

Providing before and after logs will likely be near impossible, but I will if I get lucky.

snarlingllama commented 1 year ago

I removed the code ticks from my previous post to hopefully make it a little easier to read. This is the closest thing to a smoking gun that I have found yet (but only because of the timing).

snarlingllama commented 1 year ago

Node 105.zip

This is not a before and after snapshot, but I found a node (Node 105) that was renamed and have several entities that appears to be from different types of sensors. I do not know which device this is yet, as the interview process has failed 4 times. Attaching what I can here.

kpine commented 1 year ago

Node 105.zip

This is not a before and after snapshot, but I found a node (Node 105) that was renamed and have several entities that appears to be from different types of sensors. I do not know which device this is yet, as the interview process has failed 4 times. Attaching what I can here.

It's your water valve. Currently the device information is limited in the UI because it doesn't get added until the node becomes ready, but it's currently in a failed state and not ready. On its own this doesn't explain the renaming problem, but maybe related.

snarlingllama commented 1 year ago

@kpine Good catch. I thought it might have been one of my smart switches. I am missing one, but cannot figure out what it got renamed to. I am working on getting the water valve to become ready (manually activating it while reinterviewing).

snarlingllama commented 1 year ago

@kpine @blhoward2 I have downloaded all 100 diagnostic files for my zwave nodes prior to them having an incident.

Its now a waiting game until one of these devices acts up. Once they do, likely within the in next hour or so, I will post the json files pre and post issue for the node(s).