ZEN30 dimmer + switch associating actions with the wrong entity in HA

PARitter commented 3 years ago

A dev told me to report this here, not an issue with Home Assistant

[ X] I am not using HomeAssistant. Or: a developer has told me to come here.
[ X] I am not using ZWaveJS2MQTT. Or: a developer has told me to come here.
[ X] I have checked the troubleshooting section and my problem is not described there.
[ X] I have read the changelog and my problem was not mentioned there.

The ZEN30 device has a dimmer-switch and a seperate relay switch. These appear as two entities in HomeAssistant - light.xxx for the dimmer and switch.xxx for the relay-switch. When controlling from HA both work perfectly. But any action on the dimmer done on the device gets repo as a change on the "switch" entity.

This was reported on the zwavejs2mqtt slack channel and user "Blowhard2" at that site said "we need to add a compat flag. We are mapping the report to the wrong endpoint". He asked me to open an issue here.

Device information

Which device(s) is/are affected (make/model)? Zooz ZEN30 dual dimmer+switch What are the node IDs?

Did you change anything?

[ ] Yes: (please describe)
[ X] No

Did this use to work before?

[ X] Don't know, this is a new device
[ ] No, it never worked anywhere
[ ] Yes, in: (specify application with versions)

How are you using node-zwave-js

[ X] zwavejs2mqtt (latest) docker image (4.0.0)
[ ] zwavejs2mqtt (dev) docker image
[ ] zwavejs2mqtt Manual Docker build
- node-zwave-js branch:
- zwavejs2mqtt branch:
[ ] ioBroker.zwave2 adapter
[ X] HomeAssistant version XYZ (HA Version 2021.4.6
[ ] Pkg
[ ] Manual Docker build
- node-zwave-js branch:
- zwavejs2mqtt branch:
[ ] node-red-contrib-zwave-js
- Module version (double click node):
[ ] Manually built (as described in the docs)
[ ] Other:

To Reproduce

Steps to reproduce the behavior:

Setup zwavejs2mqtt with HA Install/associate ZEN30 switch Test that both entities are crated in HA and from HA control the correct part of the device Turn the dimmer on or off at the device Notice that state changes on the "switch" entity and not the "light" <-- this is the symptom of the bug Turn the relay switch on or off at the device Notice that the correct entity (the switch) change is reflected in HA

Additional context

N/A

Logfile:

N/A

AlCalzone commented 3 years ago

Logfile:

N/A

I really need that.

PARitter commented 3 years ago

zwavejs_1.log

The Zen30 is Node 038 in this log file, device id 634-40968-40960 (0x027a-0xa000-0xa008.

blhoward2 commented 3 years ago

Looks like another root endpoint mapping issue. @AlCalzone If you agree, I've merged in the compat flag awhile.

2021-05-04T22:06:07.799Z CNTRLR   [Node 038] Mapping unsolicited report from root device to first supporting end
                                  point #1
2021-05-04T22:06:10.875Z CNTRLR « [Node 038] received CentralScene notification {
                                      "nodeId": 38,
                                      "ccId": "Central Scene",
                                      "ccCommand": "0x03",
                                      "payload": "0x988003"
                                  }

AlCalzone commented 3 years ago

@PARitter Please do another log with loglevel "debug". I can't tell what exactly the device sends. I'm a bit confused by the fact that the device only reports one endpoint.

PARitter commented 3 years ago

zwavejs_1.log

As requested

AlCalzone commented 3 years ago

That's not the "debug" loglevel. https://zwave-js.github.io/zwavejs2mqtt/#/troubleshooting/bug_report?id=export-logs => point 1

PARitter commented 3 years ago

zwavejs2mqtt.log

Apologies.

AlCalzone commented 3 years ago

Noo, that was the correct file, but the wrong loglevel

PARitter commented 3 years ago

Is the log file attached in my last comment sufficient? It is debug level...

PARitter commented 3 years ago

One more try with both log files. Hopefully this is what you need. Sorry for the confusion - my bad for not paying close attention to the instructions. :(

zwavejs_1.log zwavejs2mqtt.log

blhoward2 commented 3 years ago

The zwavejs_1.log is the right log file, but could you post another with the device being reinterviewed at the debug or silly level? This one shows a cache load of the device so we don't see all of the interview. Also, could you post an export of the node? Go to the node, Advanced, Export.

dozminic commented 3 years ago

I am having the same issue. I have 5 ZEN30 switches and they all occasionally revert to behaving in this manner. A re-interview corrects the problem temporarily. I attempted to capture the logs and relevant information for node 22. Please let me know if you need additional info.

zwavejs_355.log zwavejs2mqtt.log

Export of "bad" state: bad_node22_json.txt Export of "good" state, after re-interviewing: good_node22_json.txt

AlCalzone commented 3 years ago

Okay I think I know whats going on here. The device itself is a Multilevel Switch and has an endpoint that is a Binary Switch. The Multilevel Switch updates are received as Basic CC Report, which are then mapped to Multilevel Switch according to the device's type.

For some reason, this mapping seems to fail before the re-interview, which can only mean that the device type got lost. As a result, the value gets mapped to endpoint 1, where it is treated as the value for the Binary Switch. @dozminic in case that happens again, I would need to see zwave-js's cache files <homeid>.json and <homeid>.values.jsonl.

After the re-interview this seems to be fine, so I'm suspecting this is also some kind of cache corruption, especially since this affects all ZEN30's similarly.

dozminic commented 3 years ago

Here are my cache files: zwavejs2mqtt-cache.zip

I checked nodes 10, 13, and 22 and they're all having the issue again this morning (couldn't test the other two nodes). If you need me to provide the cache after a re-interview, let me know.

AlCalzone commented 3 years ago

Okay the cache seems good. Any chance you restarted the container since it last worked?

dozminic commented 3 years ago

No, it's been up and running for the past week.

AlCalzone commented 3 years ago

If you don't mind, you could re-interview and then leave the logging running. Maybe we'll see why and when the behavior changes.

dozminic commented 3 years ago

Not a problem at all. I've enabled silly level logging and set up a Node Red/MQTT notification for myself so I get notified when one of the nodes publishes to "zwave/nodeID_XX/32/1" (since I've noticed they only report a value for "32-1-currentValue" when they're misbehaving). I'll upload the logs when the issue arises again.

Thank you for all your help!

PARitter commented 3 years ago

So I collected some more logs and info as requested.

Note that I changed the file type on the json files to "txt" to make Github uploads happy...

Both log files and a dump of the device when operating both switches before doing a re-interview:

before-zwavejs_1.log befre-zwavejs2mqtt.log before-node_38.txt

Both logs files during the re-interview:

reinterview-zwavejs_1.log reinterview-zwavejs2mqtt.log

And the node export and logs files operating both switches after the re-interview:

after-node_38.txt after-zwavejs_1.log after-zwavejs2mqtt.log

After the re-interview the device is still operating incorrectly...

dozminic commented 3 years ago

An observation, just in case it is relevant. I have ZEN30 V2 switches on 2.01 firmware version. @PARitter, you appear to have V1 switches with the 1.05 firmware from Zooz (there is a 1.07 version). I don't know if you should update them or if that would change your experience.

In any case, I would think both the V1 & V2 switches could use the same config in Z-Wave JS. Zooz isn't even super clear that there are two versions, and I've ordered V2 switches only to find out that they're actually V1 switches when I connected them to my hub (they were physically identical). I have no V1 switches at this point, only V2

AlCalzone commented 3 years ago

@PARitter since you have a new log after the interview, it looks like the driver was restarted between the re-interview and the test. Can you try it again with operating the switches directly after the re-interview? Doesn't matter that it is the same logfile.

PARitter commented 3 years ago

As requested. New log files showing:

Turned on logging
operated each switch on/off twice
requested re-interview, wait until it shows complete
operated each switch twice again
turned off logging

Note: this time before the re-interview it was still behaving badly but after the re-interview the behavior appears to be correct (or as expected). This is the first time this has happened.

I also dumped the node again after the re-interview (again, changed the extension from .json to .txt to make the github uploader happy).

zwavejs_1.log zwavejs2mqtt.log node_38.txt

PARitter commented 3 years ago

Just a quick note - today, about 24 hours later, with confirmed no restarts of the zwavejs2mqtt container, the Zen30 switch is back to mis-behaving. Once again, if the dimmer is turned on, it reports that relay-switch has changed state rather than the light.

I know it was suggested that this could be an error restoring the node from cache - but nothing that would have triggered a cache-load happened.

Unfortunately logging has been turned off since the logs I posted yesterday.

bwosborne2 commented 3 years ago

I am new to zwavejs, but I know Zooz support is excellent, I believe if a North American based developer reaches out to them they may be able to provide a sample and, possibly, information on the firmware differences.

Until recently, I worked with them closely on openHAB Z-Wave integration. Several years ago I moved from HA to OH and recently moved back to HA.

blhoward2 commented 3 years ago

We work with Zooz on an ongoing basis and get samples of pretty much all of their devices. They’ve already sent me one for testing. We’re still working on this one its just a harder one to pin down that requires some debugging.

bwosborne2 commented 3 years ago

OK, I heard they were working with somebody. I am currently running HA on my test RPi to acclimate myself again. Let me know if you think I can assist. I do not currently have that device, though.

blhoward2 commented 3 years ago

Yeah, I’m their primary POC because device support issues principally lie in the driver project. If needed I connect them with the integration devs and they send them the device for testing.

bwosborne2 commented 3 years ago

Cool! I think I heard you use the openSmartHouse database as a source? I notice it has 2 Endpoints based off the device firmware.

https://opensmarthouse.org/zwavedatabase/1182

blhoward2 commented 3 years ago

We have the ability to import from OpenHAB and OZW, but frankly we find a lot of errors in those files. We’ve switched to importing and building config files directly from the Zwave Alliance database and only use OpenHAB/OZW as a reference or to fill gaps.

bwosborne2 commented 3 years ago

Accuracy can be a down side of a community maintained database, A couple of years ago I spent 4 months cleaning some of it up. Community contributions are reviewed before they are available for export, but that is a tedious task.

The Endpoints & channels are based on xml files OH generates from the device firmware NIF response. Outside of that, the user manuals are usually a source, with sometimes input from the Alliance and, occasionally, OZW.

blhoward2 commented 3 years ago

We don’t keep device polled information in the files. Endpoints get queried directly from the device on the fly. It is really just configuration parameters and if we need to pretty up the name of the association groups.

bwosborne2 commented 3 years ago

So, really al you need for a new device is its documentation.

blhoward2 commented 3 years ago

Correct, and even then we build the file semi-autonomously from zwa so its verifying the info in the file and cleaning up the parameter names and such. Modern devices will say unknown if there is no file but they should work 100% whether we have a file or not.

bass16v commented 3 years ago

Hello all I'm having the same issue but I have something to add I didn't have this problem at all up to Z-Wave JS 0.1.17 on HA. Everything was perfect. When I updated to 0.1.18 or 19 I started having the problem with the states. What I did was to use a backup to go back to 0.1.17 and everything went back to normal. The problem is that I updated my HA to the latest version and for some reason it doesn't allow me to use Z-Wave JS 0.1.17 anymore. I restore the backup with 0.1.17, the devices work OK, a couple minutes later HA updates to Z-Wave JS 0.1.21 and it gets screwed (my auto update is turned off)

My Zen30s came with 1.03 firmware and I updated them to 1.07 a couple months ago. Yesterday I tried downgrading it to 1.05 but it didn't fix the problem. I'm waiting for 1.03 firmware from Zooz just to try.

0.1.21 Pin Z-Wave JS to 7.3.0

0.1.20 Bump Z-Wave JS Server to 1.5.0 Pin Z-Wave JS to 7.2.4

0.1.19 Pin Z-Wave JS to 7.2.3 Make log level a configuration option

0.1.18 Pin Z-Wave JS to 7.2.2

0.1.17 Bump Z-Wave JS Server to 1.4.0 Pin Z-Wave JS to 7.1.1

AG-Teammate commented 3 years ago

It looks like when the issue happens the endpoint 1 is marked as supporting command class 32:

{"k":"{\"nodeId\":19,\"commandClass\":32,\"endpoint\":1,\"property\":\"currentValue\"}","v":0}
{"k":"{\"nodeId\":19,\"commandClass\":32,\"endpoint\":1,\"property\":\"targetValue\"}"}

During normal behavior there is no CC32 support on Endpoint 1.

The files uploaded by @PARitter seem to confirm this assumption. There is the following for the broken state:

{
      "id": "38-32-1-currentValue",
      "nodeId": 38,
      "commandClass": 32,
      "commandClassName": "Basic",
      "endpoint": 1,
      "property": "currentValue",
      "propertyName": "currentValue",
      "type": "number",
      "readable": true,
      "writeable": false,
      "label": "Current value",
      "stateless": false,
      "min": 0,
      "max": 99,
      "list": false,
      "value": 0,
      "isCurrentValue": true,
      "targetValue": "32-1-targetValue",
      "lastUpdate": 1620410533572,
      "newValue": 0
    },
    {
      "id": "38-32-1-targetValue",
      "nodeId": 38,
      "commandClass": 32,
      "commandClassName": "Basic",
      "endpoint": 1,
      "property": "targetValue",
      "propertyName": "targetValue",
      "type": "number",
      "readable": true,
      "writeable": true,
      "label": "Target value",
      "stateless": false,
      "min": 0,
      "max": 99,
      "list": false,
      "lastUpdate": 1620410533572
    },

This is missing in @PARitter 's normal dump.

I'm new to this project but I'm assuming this is what makes the following code fire and the issue to occur:

                if (endpoint.index === 0) continue;
                if (!endpoint.supportsCC(command.ccId)) continue;
                // Force the CC to store its values again under the supporting endpoint
                this.driver.controllerLog.logNode(
                    this.nodeId,
                    `Mapping unsolicited report from root device to first supporting endpoint #${endpoint.index}`,
                );
                command.endpointIndex = endpoint.index;
                command.persistValues();

It's not immediately clear what exactly causes CC32 to be added to the Endpoint 1 config, I'm currently running logs on silly in hope to catch the exact moment.

blhoward2 commented 3 years ago

That is our mapping code which causes the effect after the device type is lost. It isn't the issue here or we'd undo it with a compat flag, it just makes the problem apparent.

AlCalzone commented 3 years ago

@AG-Teammate the value you are seeing appears because the report is mapped to endpoint 1 instead of staying on endpoint 0 and being treated as Binary Switch there. I'm not sure yet why that happens, but it very likely has to do with the support for the Basic CC being added to endpoint 1 for some reason.

AG-Teammate commented 3 years ago

Thanks guys. For me the issue did not appear overnight but I then found a reliable repro case.

Basically, restarting the zwavejs2mqtt container (within HA) messes up the device. The following appears immediately (without even using the device) in the debug view:

 {
      "id": "19-32-1-currentValue",
      "nodeId": 19,
      "commandClass": 32,
      "commandClassName": "Basic",
      "endpoint": 1,
      "property": "currentValue",
      "propertyName": "currentValue",
      "type": "number",
      "readable": true,
      "writeable": false,
      "label": "Current value",
      "stateless": false,
      "min": 0,
      "max": 99,
      "list": false,
      "value": 0,
      "isCurrentValue": true,
      "targetValue": "32-1-targetValue",
      "lastUpdate": 1621350244689,
      "newValue": 0
    },
    {
      "id": "19-32-1-targetValue",
      "nodeId": 19,
      "commandClass": 32,
      "commandClassName": "Basic",
      "endpoint": 1,
      "property": "targetValue",
      "propertyName": "targetValue",
      "type": "number",
      "readable": true,
      "writeable": true,
      "label": "Target value",
      "stateless": false,
      "min": 0,
      "max": 99,
      "list": false,
      "lastUpdate": 1621350185218
    },

The logs do not seem to contain much except for the following:

2021-05-18 10:09:11.230 INFO ZWAVE: Node 19: value added 19-32-1-currentValue => undefined
2021-05-18 10:09:11.233 INFO ZWAVE: Node 19: value added 19-32-1-targetValue => undefined

After re-interview the device is back to normal until next restart. The {homeid}.json cache file does not differ.

I'm attaching logs after restart (re-interview -> restart -> no usage of device; ZEN30 is on Node 19):

cache_after_restart.txt cache_after_reinterview_before_restart.txt docker_after_restart_no_usage.txt silly_after_restart_no_usage.log

I'm guessing there is another case of misbehavior after certain time but simply restarting container reproduces the issue in my case.

Overall, this feels like incorrect data loading from cache file (or incorrect saving in the first place?).

@blhoward2 @AlCalzone is this useful at all? I can setup a debugging environment over weekend if necessary (hard to investigate with HA container and just looking at the code)

AG-Teammate commented 3 years ago

Quick update - the issue did not appear for 2 days until auto-update of the zwavejs2mqtt in HA and following restart. Makes me wonder if this happens only during restart.

dozminic commented 3 years ago

That could be the case. However, when I originally commented in this issue, my container had been running for about a week. It's been running for almost 2 weeks since then and I haven't seen the issue again. It's just strange to me that I didn't have a restart and encountered the problem.

AlCalzone commented 3 years ago

Found it. That took a lot of semi-interactive debugging over the pond 😅

AG-Teammate commented 3 years ago

@AlCalzone Awesome work!

I just setup a local debugging environment and can confirm .getEndpoint(1).supportsCC(CommandClasses.Basic) returns false for your change and true for master branch after restart.

AlCalzone commented 3 years ago

I'll wait for @blhoward2 to confirm too, then I'm merging.

AG-Teammate commented 3 years ago

@dozminic

That could be the case. However, when I originally commented in this issue, my container had been running for about a week. It's been running for almost 2 weeks since then and I haven't seen the issue again. It's just strange to me that I didn't have a restart and encountered the problem.

I had this happen today after disabling logging in the zwave2jsmqtt web interface. I guess this restarts underlying library without restarting the whole container. Perhaps this could explain your experience.

zwave-js / node-zwave-js

ZEN30 dimmer + switch associating actions with the wrong entity in HA #2536