ZWave spuriously creates new entities

brettonw commented 2 years ago

The problem

The ZWave integration sometimes creates new entities on devices for features that belong to another device. I periodically go into HA and disable these spuriously created entities, but it feels like a pretty classic memory corruption problem that should be fixed.

For example, with an environment sensor (temperature, illuminance, and humidity) and a light switch, HA will suddenly create a light switch humidity sensor (and then mark it as unavailable).

I have not been able to isolate a reason for this happening, but I can't be alone. It's been consistent across the last several months (using an Aeotec ZST10-700 controller).

What version of Home Assistant Core has the issue?

2022.8.7

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

ZWave

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zwave_js/

Diagnostics information

It is not linked to a single device. What type of logs can I provide for help?

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Not really

Additional information

No response

probot-home-assistant[bot] commented 2 years ago

Hey there @home-assistant/z-wave, mind taking a look at this issue as it has been labeled with an integration (zwave_js) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)

zwave_js documentation, zwave_js source (message by IssueLinks)

brettonw commented 2 years ago

Aeotec Zstick firmware v.7.17.2

raman325 commented 2 years ago

please share a diagnostics dump of the device that you are facing this issue with

brettonw commented 2 years ago

I have 30 Z-wave devices and multiple devices exhibit the problem. Mix and match features show up in many ZWave devices. For a very long time I thought some of the devices just happened to also have temperature sensors or power meters built in, but they don't.

The zwave diagnostic file for one device showing the problem (the upstairs hall env - an Aeotec ZWA024 Multisensor 7) is attached: sensor.upstairs_hall_env_electric_consumption_w is a bogus entity on the device that I disabled.

zwave_js-7cbdd2e50b665e62868eb0a2bd3a6338-Upstairs Hall Env-960c95410ed8888ce9159ded8906dfd3.json.txt

raman325 commented 2 years ago

this is indeed strange. Are they different device types or the same? Have you been reinterviewing the devices?

brettonw commented 2 years ago

I'm a fairly active user of Z-wave, so I have certainly re-interviewed devices, removed devices, re-added devices, and so on. I prefer to use the ZWaveJS web interface for that.

The problem shows up for all device types, not limited to any single class. It's most apparent when the spurious entity is obviously not part of the device, like temperature on a light switch. I've never noticed a control show up that didn't belong, just sensors.

Thinking back on it, this has been a consistent aspect since I first started with a Z-stick 5 almost a year ago and added a heavy duty switch to my water heater to monitor power consumption - it showed up in HA with a temperature sensor that I used for a few days. It wasn't until I realized it never changed value that I figured out it was a bogus entity.

A few months ago, I added an automation that notifies me of unavailable entities, and now every few days I notice a new entity that I go disable.

I do a lot of custom configurations, so I reboot my HA a lot. I've sort of wondered if it happens then, but it's not consistent - as in I don't get a new entity every time I reboot or something like that.

raman325 commented 2 years ago

the way we create entities is that we parse the values that zwave-js has discovered on the device and see if they match any of our discovery rules. If they do, we pass that information along to the corresponding platform to create an entity to control/read that value. Typically when people get phantom entities, it's because during the interview process, the device mistakenly advertised a value that it doesn't actually have. Given that this is happening across your network though, I have a hard time believing that is the issue. I just also don't see how we would discover these phantom entities on our own without some sort of indicator from zwave-js
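As a rough illustration of the flow described above, here is a minimal sketch of rule-based discovery. The `Value` and `DiscoveryRule` shapes, the rule list, and the `discover` helper are all hypothetical and are not the integration's actual code; only the command class IDs (0x25 Binary Switch, 0x31 Multilevel Sensor, 0x32 Meter) come from the Z-Wave specification.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Value:
    """A value the driver reports for a node (hypothetical shape)."""
    node_id: int
    command_class: int
    property_name: str


@dataclass(frozen=True)
class DiscoveryRule:
    """Maps a matching value to an entity platform (hypothetical shape)."""
    platform: str
    command_class: int
    property_name: Optional[str] = None  # None matches any property

    def matches(self, value: Value) -> bool:
        if value.command_class != self.command_class:
            return False
        return self.property_name is None or self.property_name == value.property_name


# Illustrative rules only; the real rule set is far larger.
RULES = [
    DiscoveryRule(platform="sensor", command_class=0x31),  # Multilevel Sensor CC
    DiscoveryRule(platform="sensor", command_class=0x32),  # Meter CC
    DiscoveryRule(platform="switch", command_class=0x25, property_name="currentValue"),
]


def discover(values: Iterable[Value]) -> Iterator[Tuple[str, Value]]:
    """Yield (platform, value) for the first rule that matches each value."""
    for value in values:
        for rule in RULES:
            if rule.matches(value):
                yield rule.platform, value
                break


# A light switch (node 5) whose value list also contains a spurious Meter value.
values = [
    Value(5, 0x25, "currentValue"),
    Value(5, 0x32, "value"),
    Value(5, 0x86, "firmwareVersion"),  # Version CC: no rule, so no entity
]
found = list(discover(values))
```

In this sketch the spurious Meter value gets a sensor entity simply because it is in the value list; the matching step has no way to know the value is bogus.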

brettonw commented 2 years ago

You're saying it's probably a problem in ZWave-js?

raman325 commented 2 years ago

not necessarily, just writing out my thoughts as I try to reason through how to troubleshoot this. How are you running the zwave-js server?

brettonw commented 2 years ago

It's running in a container on a wired Raspberry Pi, separate from the HA container. The Z-stick is at the end of an 8' long USB cord and is centrally located in the home.

AlCalzone commented 2 years ago

Please share driver logs (loglevel debug) of a re-interview of such a device. And mention what is now there that shouldn't. We should then be able to see why the entities are there.

brettonw commented 2 years ago

zwavejs_current.log

Alarm and Illuminance are not elements of this z-wave device (a light switch).

AlCalzone commented 2 years ago

I don't see them in the interview log, but on the other hand that interview doesn't look very successful either. 19 timeouts, repeated command failures, incredibly long communication attempts (note that -95 dBm is close to background noise):

transmit status:        OK, took 10790 ms
routing attempts:       23
protocol & route speed: Z-Wave, 100 kbit/s
ACK RSSI:               -95 dBm

You definitely have some RF issues. Is your stick on an extension cord? If not, change that.

Those RF issues can also explain some weird entities caused by flipped but undetected bits in the messages.

kurtzettel commented 2 years ago

I have the HUSBZB-1 and I have seen similar behavior for the last few months. Mostly voltage and watt usage entities being added to things like flood sensors or switches that don't have that capability. I don't have any technical details to add except to say that you aren't crazy.

AlCalzone commented 2 years ago

Like I said, it can happen because of garbled RF messages. These are filtered out on devices that have the capability to report which sensors they support. But some older devices don't have this, so zwave-js cannot know that the device does not have a flood sensor built in for example.

TheRealWaldo commented 2 years ago

I've started to see this as well.

I have had zwave devices pretty much since their inception and have never seen this behaviour on any other system.

Also have never seen this with HA and the exact same hardware until I migrated to zwavejs.

Currently using the native zwave-js integration and add-on on an i7 nuc with hassos and a Zooz S2 stick.

Only fix seems to be an exclusion and re-add, but with several hundred devices this is getting to be a very large chore.

I should add, it's a temporary fix. The phantom entities randomly reappear.

taylorbohannan commented 1 year ago

Just wanted to add that I too am seeing "phantom" entities created and associated to incorrect z-wave devices.

For example, a Jasco in-wall motion dimmer will have an "Electric Production [Power Factor]" entity created and associated with it. This device doesn't even support Meter v3 CC (command class).

What's odd is I have other devices with this type of entity that actually do support the Meter v3 CC and report energy usage (Jasco Outdoor 40-amp Switch).

I've noticed the "phantom" entities that get created are usually associated with devices that have built-in motion sensors or energy usage reporting. The other type of entity I sometimes see is a phantom switch associated to an in-wall motion dimmer that already has a light entity.

I have about 120 devices on my z-wave network. About 115 of my devices are made by Jasco and consist of 6 different models. In-Wall Switches, In-Wall Dimmers, In-Wall Motion Switches, In-Wall Motion Dimmers, Outdoor Plug-in Switches, and Outdoor 40-amp Switches. The other devices are Aeotec Repeater 7's and a USB Z-Stick 7 (v7.17.2).

I've not been able to find an entry in zwave-js logs indicating a new entity was discovered, nor do I see anything in home assistant's log about a new entity. I'm fairly certain the zwavejs integration is creating the entity in home assistant, but which logs should I be looking at? Home Assistant or zwavejs2mqtt?

raman325 commented 1 year ago

you should look at the zwavejs logs. What's happening behind the scenes is that zwave-js is reporting that these zwave values exist, and using our discovery rules, HA creates new entities for them. The underlying issue is with zwave-js, and I believe the reasoning that @AlCalzone has given before is that it is likely an RF issue between the controller and the node. I am not sure how to address this without either moving the device closer or making your network more dense.

raman325 commented 1 year ago

see here: https://github.com/home-assistant/core/issues/77966#issuecomment-1241188050

brettonw commented 1 year ago

I didn't respond to #77966 at the time because it just felt like a denial of the problem, and it still sounds that way. Something else was wrong in that log because the device in question has a solid radio connection to the z-stick, and it seems like every device in my house has at some time or another gotten a spurious entity. I now have automations that alert me to them, and I get notifications so frequently that removing them is part of my daily maintenance chores.

I do not think it is as simple as just an RF issue. I would expect a lot more random data if that was the problem. What everybody seems to be reporting is that entities associated with OTHER devices in their home are showing up. For instance, I have devices that record power usage, and power usage entities keep showing up on devices that don't do that. What I haven't seen is an entity for a device I DON'T have, like water detection, or I don't even know what else is possible. It looks a lot like a scrambled indexing problem. In C days, I would have said it's a swapped pointer problem or something where the code failed to properly clear and dispose memory.

That said, if it's not an HA problem, can HA broker a conversation with the zwave-js folks, or to the underlying hardware folks to make it sound like this problem will get solved? Without a fix, and in conjunction with the 700 series hardware bugs (that are "mostly" fixed), Z-Wave is just another broken protocol that never quite works right, and that's just not acceptable as a long-term home automation strategy.

taylorbohannan commented 1 year ago

I am having this exact same problem. I have a z-wave network of about 120 devices. Nearly every device (~110) is manufactured by GE / Jasco and consists of 6 models (Dimmer, Dimmer + Motion, Switch, Switch + Motion, Outdoor Plug-in Switch, Outdoor 40-amp Switch w/ energy reporting). The rest of my devices consist of 4x Aeotec 700 Repeaters, 2x Yale Keypad Door Locks, and 1x Aeotec Door/Window 7 Contact Sensor.

For me, these phantom devices started appearing right after I added (6x) GE / Jasco 40-amp Outdoor Switches w/ Energy Reporting (Model 14285). These are the only z-wave devices with energy reporting capabilities on my z-wave network. I now have phantom energy-related entities being added to GE / Jasco In-wall switches, dimmers, motion switches, and motion dimmers.

I will typically observe, at random, a “power factor” sensor or some other energy reporting related sensor added to an existing z-wave device that’s been on my network for a very long time. I’ve been able to successfully re-interview many of these devices, but no matter what I try, these phantom sensors always seem to re-appear at random. They immediately are marked as unavailable and I have to go in and manually delete them.

The most recent devices I added to my network are the (6x) GE / Jasco 40-amp Outdoor Switches and that’s when these problems began.

I have also experienced, only a few times, z-wave devices completely disappearing from the controller. The devices still work fine. If a group association was added directly between 2 devices before one of them disappeared I’m even able to control the missing device from the grouped switch. For example, 2 in-wall switches that turn each other on/off (not referring to an add-on switch). This requires a network wide exclusion and inclusion, which gives the device a new node ID, in order to get the node paired to the controller again.

What information can I provide to help troubleshoot? Why is one node able to create / add sensors to another node on the network? Is there a way to prevent a node from creating / adding capabilities after it's been successfully interviewed when it was first included in the z-wave network?

Sorry for the long post, but this is the only thread I have found with my EXACT problem — same number of nodes, manufacturer, and model of devices.

taylorbohannan commented 1 year ago

Additionally, I see these phantom devices appear in the zwave-js UI add-on I’m using in Home Assistant. So it seems HA is adding the phantom entities / sensors because zwave-js is reporting they now exist. I usually have to re-interview the node that has the phantom sensors to prevent them from being added again in HA.

brettonw commented 1 year ago

I find it happens every time I restart HA. You can go into the integration and then to the specific device to find phantom entities that the integration is no longer supplying and delete them from there.

raman325 commented 1 year ago

the zwave_js integration doesn't create values in zwave-js; it consumes them. It can issue commands to the devices, but I believe if a command was issued for an invalid value then zwave-js would throw an error, not create a new one that would reflect back into zwave-js-ui, right @AlCalzone? zwave-js-ui gets its values directly from zwave-js; HA has nothing to do with that (unless somehow something we are doing is creating phantom values upstream, but my understanding of how the integration works is that this is not possible in this particular instance). Additionally, we solely read values from sensors (there are no commands to issue for them), so I really can't see how this could be caused by the integration.

Here are a couple of things you can try:

1. Disable the zwave_js integration and monitor zwave-js-ui. If the phantom devices return, then you can eliminate HA as the source.
2. Assuming the phantom devices stop appearing (i.e. test 1 fails), re-enable the integration, set debug log levels on the following components, and wait for it to happen again: homeassistant.components.zwave_js and zwave_js_server

Now wait until you detect a new entity and then send the debug results. Additionally it would help to have the zwave-js logs in debug mode.

raman325 commented 1 year ago

> That said, if it's not an HA problem, can HA broker a conversation with the zwave-js folks, or to the underlying hardware folks to make it sound like this problem will get solved? Without a fix, and in conjunction with the 700 series hardware bugs (that are "mostly" fixed), Z-Wave is just another broken protocol that never quite works right, and that's just not acceptable as a long-term home automation strategy.

@AlCalzone is the zwave-js folks hence him being in this thread 🙂 . I'm not sure what contacts he has with the hardware folks (Silicon Labs) but he can weigh in on that.

> I didn't respond to https://github.com/home-assistant/core/issues/77966 at the time because it just felt like a denial of the problem, and it still sounds that way.

A denial of the problem would be "you're crazy, it's a you problem," which I don't believe anyone has said. I have walked through my logic as to why I don't think this is an HA issue, and I am waiting for more data to prove otherwise. Additionally, @AlCalzone, as the maintainer of zwave-js, is the best person to diagnose an issue like this on the driver side, and also understands the logs the best.

AlCalzone commented 1 year ago

> z-wave devices completely disappearing from the controller

If you were to capture a driver log on loglevel debug when this happens, I could take a look. If nodes randomly disappear, either the controller has a severe memory corruption issue, the missing nodes sent a Device Reset Locally Notification (i.e. after factory reset), or some other command gets interpreted as this command. This isn't entirely unlikely since you seem to have a lot of these garbled but valid commands.

I should probably document the problem somewhere easily found...

Z-Wave frames can roughly be seen as a sequence of bytes, followed by a checksum. For 40 and 9.6 kbps transmission speeds, this "checksum" is simply all payload bytes XORed together. Only 100 kbps uses a proper CRC16 to detect errors. The issue with this XOR-checksum is that it does not detect if an even number of bits flip in the same location across multiple bytes. For example, this (example) frame

01010101 11111111 10001000

has the same checksum as this frame

11010111 01111111 10001010
^--------^----------------- flipped!
      ^-----------------^-- flipped!

which means they both get accepted by the stick, although the 2nd one is garbage. Another possibility is that the checksum itself gets changed "on air".
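To make the example above concrete, here is a minimal sketch that computes the XOR of the payload bytes for both frames. The helper name `xor_checksum` is made up for this illustration, and the checksum is simplified to exactly what is described above (a plain XOR over the three example bytes), not the full frame format used on air.

```python
from functools import reduce


def xor_checksum(payload: bytes) -> int:
    """XOR all payload bytes together (simplified, per the description above)."""
    return reduce(lambda acc, byte: acc ^ byte, payload, 0)


# The two example frames from above.
original = bytes([0b01010101, 0b11111111, 0b10001000])
garbled = bytes([0b11010111, 0b01111111, 0b10001010])

# Count how many bits differ between the two frames.
flipped_bits = sum(bin(a ^ b).count("1") for a, b in zip(original, garbled))

print(f"original checksum: {xor_checksum(original):08b}")
print(f"garbled checksum:  {xor_checksum(garbled):08b}")
print(f"bits flipped:      {flipped_bits}")
```

Both checksums come out identical even though four bits differ, because each flipped bit position is flipped in exactly two bytes and the two flips cancel out under XOR.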

Now this doesn't happen very often, since it is not very likely that the same bits flip in multiple bytes, except in noisy environments where errors like this are so frequent that some do get through.

Now what can happen as a result when these seemingly correct packets are accepted?

The sending node ID is incorrect, attributing commands to the wrong node
The Command Class (CC) gets changed (e.g. Central Scene 0x5b -> Device Reset Locally 0x5a)
The Command Class's Command gets changed (e.g. SET instead of GET)
A Meter/Sensor type or scale or encoding gets changed (e.g. W -> kWh, or Brightness -> Water consumption, or 10.01 -> 1001)
etc...
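A small sketch of how little it takes for two of the cases above to occur. The command class IDs are the ones quoted in the list. The value byte layout (top three bits precision, next two bits scale, low three bits size) is an assumption based on how Meter and Multilevel Sensor reports are commonly described, and `decode_sensor_value` is a hypothetical helper, not driver code.

```python
CENTRAL_SCENE = 0x5B
DEVICE_RESET_LOCALLY = 0x5A


def bit_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def decode_sensor_value(level_byte: int, raw: bytes) -> float:
    """Decode a reading, assuming the layout precision(3) | scale(2) | size(3)."""
    precision = (level_byte >> 5) & 0b111  # number of decimal places
    size = level_byte & 0b111              # number of value bytes
    value = int.from_bytes(raw[:size], "big", signed=True)
    return value / (10 ** precision)


raw_value = (1001).to_bytes(2, "big")  # the integer 1001 on the wire

intact_level = (2 << 5) | 2   # precision 2, scale 0, size 2
garbled_level = (0 << 5) | 2  # the same byte with one precision bit flipped

print(bit_distance(CENTRAL_SCENE, DEVICE_RESET_LOCALLY))
print(decode_sensor_value(intact_level, raw_value))
print(decode_sensor_value(garbled_level, raw_value))
```

Under these assumptions, a single flipped bit turns Central Scene into Device Reset Locally, and a single flipped bit in the precision field turns 10.01 into 1001.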

Many of these are detected as invalid commands by Z-Wave JS already, for example:

CC or CC command is something unknown now.
CC or CC Command is changed so the payload is too short or incompatible with what's expected
A Meter/Sensor report does not match what the device reported as supported during the interview

Unfortunately, not everything can be detected as incorrect:

The CC was changed so the payload is now too long, but seemingly compatible with the expected format. This is necessary to support when a node uses a newer CC version than what's implemented in Z-Wave JS.
The sending node supports an older version of the Meter or Multilevel Sensor CC, where it is not possible to query which meters/sensors it supports. So we just have to accept them all, or we'd potentially discard valid readings.
Meter/Sensor readings with nonsensical values. A human easily sees that a value cannot be correct, but the driver would have to know more meta information about a device to be able to do that.
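The trade-off around supported sensor types can be sketched as follows. This is only an illustration of the reasoning above, not the driver's implementation: `accept_report` is a hypothetical helper, and the sensor type numbers are used purely as examples.

```python
from typing import Optional, Set

# Example Multilevel Sensor type IDs, used here for illustration only.
AIR_TEMPERATURE = 0x01
ILLUMINANCE = 0x03
HUMIDITY = 0x05


def accept_report(sensor_type: int, supported_types: Optional[Set[int]]) -> bool:
    """Decide whether to accept a sensor report.

    supported_types is None when the device uses an older CC version and
    could not be asked which sensors it supports.
    """
    if supported_types is None:
        # Nothing to check against: accept, or valid readings would be lost.
        return True
    return sensor_type in supported_types


# A newer device that reported temperature and humidity during the interview.
newer_device = {AIR_TEMPERATURE, HUMIDITY}
print(accept_report(ILLUMINANCE, newer_device))  # garbled report is rejected

# An older device that cannot be queried for its supported sensors.
print(accept_report(ILLUMINANCE, None))  # garbled report slips through
```

The second call shows why older devices are more prone to phantom sensors: without a list of supported types, a garbled but well-formed report is indistinguishable from a real one.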

TL;DR: Okay, so what can be done about this:

1. USB stick on an extension cable, away from metallic surfaces, to reduce interference near the stick.
2. Check the background noise in your network, near the controller and reporting nodes. If that's high, find countermeasures. To be fair, I'm not sure what's the simplest way to do that right now though.
3. Reduce the reporting frequency of devices. Less noise and fewer packets that can be corrupted.
4. Use encryption. I don't recommend Security S0, but as a last measure this can be done. If Security S2 is supported, use that. Both make sure that the payload is unchanged, and since the encryption is node-specific this will also prevent commands being attributed to other nodes, simply because it cannot be decrypted in that case.
5. As an alternative to encryption, configure nodes to use the CRC16 CC for transmitting (if possible). That adds an additional checksum which should spot errors inside the payload. Won't detect cases where only the node ID and other protocol bytes get scrambled though.
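To illustrate why an additional CRC16 helps, here is a minimal sketch that runs a 16-bit CRC over the two example frames from earlier in this comment. It assumes the CRC-CCITT polynomial (0x1021) with an initial value of 0x1D0F; both are assumptions about the CRC16 encapsulation made for illustration, and `crc16_ccitt` is a hypothetical helper built on Python's standard `binascii.crc_hqx`.

```python
import binascii
from functools import reduce


def crc16_ccitt(payload: bytes, initial: int = 0x1D0F) -> int:
    """16-bit CRC with polynomial 0x1021; the initial value is an assumption."""
    return binascii.crc_hqx(payload, initial)


def xor_checksum(payload: bytes) -> int:
    """Plain XOR of all payload bytes, for comparison."""
    return reduce(lambda acc, byte: acc ^ byte, payload, 0)


original = bytes([0b01010101, 0b11111111, 0b10001000])
garbled = bytes([0b11010111, 0b01111111, 0b10001010])

xor_detects = xor_checksum(original) != xor_checksum(garbled)
crc_detects = crc16_ccitt(original) != crc16_ccitt(garbled)

print(f"XOR checksum detects the corruption: {xor_detects}")
print(f"CRC16 detects the corruption:        {crc_detects}")
```

For this particular pair of frames the XOR checksum matches while the CRC values differ. A CRC is not a guarantee against every possible corruption, but it is far less likely to let paired bit flips like these through.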

I've also got a few more ideas what can be done on the driver side to make this a bit better.

brettonw commented 1 year ago

Thank you for the very thoughtful and detailed response about what goes wrong.

FWIW, I have 30+ devices on my Z-wave network, and a box of 30+ more devices that I have not added because I've concluded this is not a viable solution. I'm stubborn, though, so I keep coming back to it because of sunk cost in money and time.

Re: Logging in debug mode - I'd like to refer you to an excellent response on the forums summing up some of the problems:

TL/DR: This debugging solution is inadequate.

https://community.home-assistant.io/t/automate-zwavejs-ping-dead-nodes/374307/106

Re: 1 - Z-700 stick is at the end of a 10 foot cable completely away from the electronics. This is, BTW, an absolutely heinous requirement. Wifey most definitely does NOT approve of the decor.

Re: 2 - One "feature" I've noticed is that as some Z-wave devices experience dying batteries, they become screamers, effectively jamming the network. When I go downstairs and the hall light doesn't come on, my first course of action is to run around the house replacing batteries, despite all the devices reporting 100% battery. This occurs about once per month, and it seems to take the network a day or two to recover.

Re: 3 - I will absolutely try this.

Re: 4 and 5 - I see how this will solve the message quality issues, and I recognize the need for the encryption. But... this involves removing and re-adding devices, using a workflow that I've frankly found to be non-functional - I don't have the eyesight to read a tiny little number any more, and the QR codes don't work at all. If I manage to get a device to add with security, it simply doesn't work... but with that cynicism in mind, I'll try it again.

raman325 commented 1 year ago

> Re: Logging in debug mode - I'd like to refer you to an excellent response on the forums summing up some of the problems:
>
> TL/DR: This debugging solution is inadequate.
>
> https://community.home-assistant.io/t/automate-zwavejs-ping-dead-nodes/374307/106

I read the post and I'm not sure what to tell you. Responding step by step to that post using your format:

1. How do you debug something if you can't look at the logs to see what happened, logs which are specifically instrumented to help diagnose an issue? You could enable debug by default, and you are welcome to, that's often how I have my instance running, but it generates a lot of noise and doesn't help 99% of users which is why they are disabled by default. Additionally, while it can be annoying to have an issue pop up and then having to enable logs and wait for the issue to reoccur, by your own accounts this happens often enough that you should be able to reproduce it fairly quickly.
2. This is a fair complaint about getting the driver logs, but has nothing to do with getting the integration or library logs. That's specifically what I asked for, mentioning the driver logs as being helpful but not necessary to validate that HA is not the source of the problem.
3. You literally have the main developer for the integration (and the websocket server that connects the driver to HA) and the main developer for the driver trying to help you and asking for information. It's safe to say that we can probably read the logs we instrumented. What more do you want?

I find your tone in this conversation incredibly off-putting, and it's been a waste of energy to convince you why what I am asking of you is a valid request or what you have been told is a valid response. If you want help and can fulfill the requests to help us help you then great, we can continue this conversation. If you continue to argue about the validity of this conversation and waste our time and energy, I will be forced to lock this thread, which is unfortunate because if you would only do the thing we are asking you to do we might not only be able to help you but also help some of the other users in this thread.

brettonw commented 1 year ago

I apologize, it is not my intention to put you off. I am grateful you and others worked hard to build functional interfaces so people like me can build automations from it, and you are trying to support it well.

As a user, I encounter this problem because I have a notification configured when a node goes dead and needs to be pinged, which is a persistent and pervasive problem that really needs to be solved. The same notification picks up the spurious entities created after a HA restart, requiring me to manually delete the entry in the Z-wave device.

As an engineer, the spurious addition of devices this ticket is about is extremely suspect. It means there is a really big problem somewhere - @AlCalzone's detailed explanation of the possible sources of error and the possible solutions is fantastic and I respect his authority on the topic.

However, I am an unhappy user and want to convey the frustration I have when the response to a problem report is a request to do deep dives into the Z-wave technology and device configuration, spending many hours collecting data and rebuilding my z-wave network.

I don't think you should accept any of @AlCalzone's listed situations as status quo, and I do think a better solution to diagnosing the problem is needed than creating logs and rebuilding the network.

I agree with @daphatty's post, in that I do not feel the diagnostic tools available to me are sufficient to diagnose which layer of the technology stack is going wrong. I know how to turn on logs, but I don't know how to capture, a priori, a useful log for you for whichever random device is going to go wrong, nor do I feel certain that if I did capture a log it would tell you something useful enough to find a real solution. I know I'll spend a lot of time gathering this data, so I'm looking to you to find better answers to these questions.

P.S. I reviewed the steps to reduce the number of messages in the network, and I will reduce traffic by eliminating messages I don't need. Hopefully that will reduce a source of noise that can cause this problem. It might take a few weeks to get it done and get an answer, though.

brettonw commented 1 year ago

Yesterday, I followed the descriptions in Improving the network health. I reviewed all configuration settings for 30 devices representing 504 entities in HA. Where feasible, I removed the nodes, and tried to re-add them with security. Despite repeated attempts at 11 devices that say they should support security, only 2 nodes were successfully added with security (Z-wave door locks). I made sure the 16-bit CRC encapsulation was enabled (only one device, a power meter, supports this). I trimmed unnecessary voltage and current reporting from 7 devices, and increased the change thresholds. I did not deal with 2 door sensors that have to be removed from the door to get to the secret key information, but these are both pretty close to the Z-stick and don't usually have any problems. The remaining light switches, garage door sensor, and sensors were unchanged.

FTR, I spent almost 5 hours doing this. I say this not to complain, but to make you aware of the time commitment needed to execute the diagnostic steps presented. Most of the time was removing the nodes, trying to re-add with security, removing again when that failed, and repeating several times until it was clear the node wouldn't be added with security.

I have not noticed any changes to my network behavior. I am still having nodes go dead and need to be pinged, and if I reboot the Home Assistant instance, I still get spurious entities. I have noted these entities never seem to happen on the power meter that has 16-bit CRC encapsulation.

AlCalzone commented 1 year ago

> I am still having nodes go dead and need to be pinged, and if I reboot the Home Assistant instance, I still get spurious entities.

I suggest you use Z-Wave JS UI (if you don't do that already), and enable writing driver logs to file (log level debug). That way you'll have logs of when your issues happen, so I can help you with them in a more targeted manner.

Note that some of the problems you describe may still indicate physical RF issues, like noise, interference, low range, suboptimal mesh, etc. which the host software unfortunately can't do anything about.

raman325 commented 1 year ago

@brettonw if this was a widespread problem then it would be easier to point to the issue, but it's clear that this is unique to a small subset of users, and yes, unfortunately unless we can recreate the environment it requires some additional work on your part to help us track things down.

With that being said, I have yet to see the integration logs which I have asked for multiple times, particularly after you suggested it was the integration creating the problem. I thought it required some service calls/config changes to set the log levels to debug, but it's actually simpler than that. Go to the Devices & Services section of the settings, find Z-Wave JS, and then in the kebab/hamburger menu click enable debug logging. Wait until you see a new entity, then go back and click disable. You will automatically download the logs, which you can then send to us.

If you aren't able, or don't want to do this, then we can close this issue and you can reopen it in the zwave-js/node-zwave-js project, since it would be clear you are not interested in troubleshooting this from the integration's perspective.

PS - appreciate the apology, it's just really frustrating to spend this much time writing responses (and to watch you write your thoughtful responses) when there are lower effort ways to get help (hint hint, logs)

raman325 commented 1 year ago

one other thing. We are consumers of the Z-Wave protocols, not the arbiters of it, so conflating the issues between our implementations and the protocol (or the hardware) is not helpful. It's well understood at this point that protocols like Z-Wave and Zigbee have limitations, in large part (in my opinion) due to the fact that they encapsulate multiple layers of the OSI stack below the application layers. This is where protocols like Matter are supposed to help, because they don't try to deal with the lower level layers and leave them to more battle tested solutions, although we will have to wait to see how that goes.

raman325 commented 1 year ago

last thought - this conversation inspired me to open this PR: https://github.com/home-assistant/home-assistant.io/pull/26435

node and controller statistics may reveal a problem that we can be more targeted about trying to troubleshoot

brettonw commented 1 year ago

> I am still having nodes go dead and need to be pinged, and if I reboot the Home Assistant instance, I still get spurious entities.
>
> I suggest you use Z-Wave JS UI (if you don't do that already), and enable writing driver logs to file (log level debug). That way you'll have logs of when your issues happen, so I can help you with them in a more targeted manner.

I have enabled the logs - sorry to take so long. I tried to update the ZWave JS UI installation yesterday, but had to roll back as that update killed the whole network with repeated reboots. I'm certain I can capture this behavior within a day. Is there some way I can identify what logs to get for you, or do you want everything?

> Note that some of the problems you describe may still indicate physical RF issues, like noise, interference, low range, suboptimal mesh, etc. which the host software unfortunately can't do anything about.

That may well be true, but ... can there be diagnostics in the UI to help suss that out? I would really like to know what nodes are not communicating well so I can decide what to do about it (I do not fully comprehend the health check in the graph view, it seems to require me to know a-priori what node any given node is communicating with).

brettonw commented 1 year ago

With that being said, I still have yet to see integration logs which I have asked for multiple times, particularly after you suggested it was the integration creating the problem. I thought it required some service calls/config changes to set the log levels to debug, but it's actually simpler than that. Go to the Devices & Services section of the settings, find Z-Wave JS, and then in the kebab/hamburger menu click enable debug logging. Wait until you see a new entity, then go back and click disable. You will automatically download the logs, which you can then send to us.

I have enabled these logs and will send them on once I capture an event.
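
For reference, the config-based route mentioned above would look roughly like this in `configuration.yaml`. This is a minimal sketch, assuming the logger names `homeassistant.components.zwave_js` and `zwave_js_server` still match your Home Assistant version; the `default: warning` level is just an illustrative choice.

```yaml
# Sketch only: raise Z-Wave JS logging to debug via configuration.yaml
logger:
  default: warning  # assumed baseline level for everything else
  logs:
    # Logger names are assumptions; check the zwave_js integration docs
    homeassistant.components.zwave_js: debug
    zwave_js_server: debug
```

The menu toggle is still the easier option, since it downloads the captured log automatically when you disable it.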

brettonw commented 1 year ago

FWIW - I have not had this happen since updating to ZWave JS UI 10.11.1 and enabling logs.

brettonw commented 1 year ago

OK - it happened twice today, but I didn't get logs (I can't explain why the logs were not enabled, maybe I didn't correctly save the configuration after turning the logs on?).

I downloaded HA diagnostics on the device that it happened to, and it reports RSSI of -86dBm, which indicates the problem is probably due to corruption from low signal, per previous observations. The diagnostic (attached) also contains a valid link to the device feature database, so I wonder why that wasn't used to constrain the available entities.

I was very surprised by this signal quality, though. The device in question is < 20 feet straight line from the Z-stick. I clicked around on all the devices in the ZWave JS UI to try to get a sense of the overall signal quality. The best signal is a device that is 16" from the Z-stick, and it is at -64dBm, which is ok, but absolutely nothing else is better than -85dBm. Devices that are nearly 50 feet away and through walls and floors are at exactly the same -86dBm RSSI. More than half the network is worse than -90dBm, including devices that are < 10 feet from the Z-stick. I'd almost say the values are random, as they don't seem to have any correlation to distance or occlusion.

I'm going to have to follow up with Aeotec on this. The Z-stick is on the end of a 10' USB cable and the cable goes into a shielded network cabinet.

I would like a general signal strength indicator as a debugging tool. Is it possible to have an entity imported that reports this value, so it can be tracked and graphed in HA?

zwave_js-7cbdd2e50b665e62868eb0a2bd3a6338-Bedroom Repeater Plug-c54be00e6574802f11f5a4604eb6f024.json.txt

AlCalzone commented 1 year ago

I wonder why that wasn't used to constrain the available entities.

I can answer that: We don't hardcode this because it would be a maintenance nightmare (see OZW). The Z-Wave standard allows for discovering almost everything about devices, so we do that instead.

signal strength indicator as a debugging tool

They are a bit hidden, but node statistics (including RSSI if known) are there, behind the ... dropdown in device info.

Not sure how to use this information elsewhere though.

I know the EU version of the Z-Stick 7 has severe RF performance problems, it looks like you're experiencing the same in the US too (you're located in the US, right?). You may just want to pick up a Zooz ZST10-700 and try to see if that solves your issues. Migration is easy using Z-Wave JS UI -> NVM backup on the old stick, NVM restore on the new stick.

brettonw commented 1 year ago

They are a bit hidden, but node statistics (including RSSI if known) are there, behind the ... dropdown in device info: ... Not sure how to use this information elsewhere though.

Can I access them in a graph setup? I suspect that some devices in my home generate a lot of noise, and I'd love to see a plot of the RSSI taken every 5 minutes cross correlated with when those devices are on or having low battery.

Zooz ZST10-700

If that's the stick you recommend, I'm on it.

AlCalzone commented 1 year ago

Can I access them in a graph setup?

You may want to ask that on the HA Discord, I really don't know.

brettonw commented 1 year ago

just for anybody else looking for the answer to my question about graphing node statistics:

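# Trigger-based template sensors that record Z-Wave JS node statistics
template:
  - unique_id: node_27_statistics
    trigger:
      # Fires whenever updated statistics are reported for this node
      - platform: zwave_js.event
        entity_id: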
          - sensor.node_27_node_status
        event_source: node
        event: "statistics updated"
    sensor:
      - name: "Node 27 RSSI"
        unique_id: node_27_rssi
        state: '{{ trigger.event_data.statistics.rssi }}'
        unit_of_measurement: dBm
        state_class: measurement
        device_class: signal_strength
      - name: "Node 27 RTT"
        unique_id: node_27_rtt
        state: '{{ trigger.event_data.statistics.rtt }}'
        unit_of_measurement: ms
        state_class: measurement
        device_class: duration
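
Since RSSI is only included in the statistics "if known", a guarded variant may be worth having. The following is a sketch under assumptions, not a tested config: it assumes the `rssi` key can be missing from the event payload, that trigger-based template sensors accept an `availability` template, and that `sensor.node_27_node_status` is the right status entity for the node. The `_guarded` name and ID are hypothetical.

```yaml
template:
  - trigger:
      - platform: zwave_js.event
        entity_id:
          - sensor.node_27_node_status  # assumed entity ID; use your node's status sensor
        event_source: node
        event: "statistics updated"
    sensor:
      - name: "Node 27 RSSI (guarded)"      # hypothetical name
        unique_id: node_27_rssi_guarded     # hypothetical ID
        # Mark the sensor unavailable unless the event carries a numeric RSSI
        availability: >-
          {{ trigger.event_data.statistics.rssi is defined
             and trigger.event_data.statistics.rssi is number }}
        state: "{{ trigger.event_data.statistics.rssi }}"
        unit_of_measurement: dBm
        state_class: measurement
        device_class: signal_strength
```

The idea is that a missing reading shows up as a gap in the graph rather than as a template error in the log.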

brettonw commented 1 year ago

The Zooz stick arrived today and I followed the steps to back up the NVM and restore it. Home Assistant took the liberty of renaming many of my entities, but I didn't have to go around and re-add everything to my Z-wave network. Took about 2 hours to walk through all of it, and everything is mostly working now. The reported RSSI values are certainly much better, though there is still some weirdness that I have to keep an eye on. I will report back as the new stick settles in. (Off the bat, there are still a number of devices with bogus entities - but they aren't in an unknown state, so the integration still thinks they are valid).

raman325 commented 1 year ago

FYI statistics are now available as entities so they can be easily graphed

issue-triage-workflows[bot] commented 1 year ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

emphaticsunshine commented 1 year ago

Any update on this? I am seeing this issue too.

brettonw commented 1 year ago

No. They essentially said this is a hardware, signal, and protocol problem that cannot be fixed from the software interfaces. Where possible, join devices using S2 security or checksums (older devices) to provide message integrity checking - it will mitigate the problem.

AlCalzone commented 1 year ago

Correct. We can avoid some of those, but not all of them.

https://zwave-js.github.io/node-zwave-js/#/troubleshooting/nonsensical-values
