zwave-js / node-zwave-js

Z-Wave driver written entirely in JavaScript/TypeScript
https://zwave-js.github.io/node-zwave-js/
MIT License

🚧 META-Issue: Problems with 700 series (healing, delays, neighbors, ...) 🚧 #3906

Closed AlCalzone closed 2 years ago

AlCalzone commented 2 years ago

It seems that 700-series sticks (including the currently latest firmware 7.17) have some problems which mostly appear on networks that:

When lots of reports reach the controller in a short time, the 700-series sticks are not able to send any message. It looks like the stick is somehow blocked and simply doesn’t send anything, maybe not even the protocol level acknowledgements for receiving the messages, causing end nodes to repeat their messages over and over, making the situation even worse.

👷🏻‍♂️ EDIT: Fix available, see below for direct links to the updated firmware 7.17.2

🔥 Bug in NVM conversion routine, potentially causing connectivity issues. Details see below

🗳 If you've updated, please take part in the survey so we can see if the update helps.


We believe the following symptoms are all caused by this:


Additional background info: https://forums.homeseer.com/forum/homeseer-products-services/homeseer-z-wave-products/smartstick/1483440-does-anyone-have-a-solid-working-g3-system-at-this-time/page4#post1510687


A workaround until this is fixed is migrating back to a 500 series stick, using the migration tool 700<->500 series. Description: https://github.com/zwave-js/node-zwave-js/issues/3906#issuecomment-997484466

darkbasic commented 2 years ago

Failure to heal the network or individual nodes, especially in busy situations

If any message gets sent in the network while healing, it will fail. If the healed node is not in direct range of the controller, it will always fail. Right now I have given up coding anything Z-Wave because of this; it makes it completely useless and unusable. I hope they can manage to find a solution soon, VERY soon.

justindthomas commented 2 years ago

I may walk back my comment on not reverting to the Z-Stick 5+; it seems like the problem with nodes being marked unavailable is much worse than I had thought.

I don't know if it's due to recent changes in the software or if I'm just now really noticing what is going on, but a few of my nodes just will not stay online for more than a few hours. Interestingly, it's a 700 series switch (Zooz Scene Controller) and 2 500 series Inovellis that are beyond that switch (probably routing through it) that are the most affected.

Pinging them from zwavejs2mqtt brings them right back online, but I have to be constantly on top of it to catch them going offline.

AlCalzone commented 2 years ago

Edit: zwavejs2mqtt 6.3.0 has built-in support for this now. Just restore a backup of the source stick onto the target stick and you're good to go.


If anyone wants to take a little risk and try the migration back to the 500 series (requires Node.js and npm to be installed):

  1. make an NVM backup of the current 700 series stick
  2. make an NVM backup of the target 500 series stick
  3. execute the convert command here: https://github.com/zwave-js/node-zwave-js/tree/master/packages/nvmedit#convert-one-nvm-to-be-compatible-with-another-one
  4. Restore the resulting NVM file on the target stick

❗❗❗ Disclaimer: I have quite a few unit tests ensuring the correct format but I haven't actually tested restoring the resulting files on a stick, so I'm not 100% certain if it will work (both the backup and the stick 😬). If it doesn't work, you should be able to hard-reset the target stick to get it working again, but I'm not making guarantees here. Try at your own risk!

justindthomas commented 2 years ago

@AlCalzone I gave that a try, but I can't write the resulting output back to the 500 stick. I get an error about it being the wrong size.

$ npx @zwave-js/nvmedit@8.9.0-beta.4-pr-3789-c208add convert --source ./NVM_2021-12-20_7.bin --target ./NVM_2021-12-20_5.bin --out ./NVM_converted.bin
npx: installed 69 in 4.726s
Converted NVM written to ./NVM_converted.bin

$ ls -la NVM*
-rw-r--r-- 1 justin justin 262144 Dec 19 17:15 NVM_2021-12-20_5.bin
-rw-r--r-- 1 justin justin  49152 Dec 19 17:15 NVM_2021-12-20_7.bin
-rw-rw-r-- 1 justin justin  14375 Dec 19 17:16 NVM_converted.bin

I'm not sure why the NVM backup for the 5 is so large - there are no nodes listed when I plug it in.

AlCalzone commented 2 years ago

Can this tool run in the home assistant container similar to the ZigBee migration tool? I remember you talking about this in the state of the home.

It is built in a way that the underlying functionality can also be used by the driver directly. So down the line, applications will just be able to call the method with the old NVM backup buffer.

However I think you said it was going to be released in the future and I assume this may have been rushed out as a quick fix but requires further development to become stable?

I've actually been working on it for some weeks and it only now got far enough to be released.

Is there a way to backup the 500 series stick in case the migration to the 700 series fails?

zwavejs2mqtt can do that. For now you need backups of both sticks anyways.

@justindthomas

I get an error about it being the wrong size.

Ahh, damn I forgot to change that. The conversion utility only outputs the part of the 500-series NVM that is relevant. They are 256 kB in size, but only the first ~14 kB are what interests us.
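
A minimal sketch of why a restore that expects a full-size image would reject the converted file. The byte counts come from the `ls -la` output earlier in the thread; `matchesNvmSize` and the constant names are hypothetical, and whether the driver performs exactly this comparison is an assumption based on the error message.

```typescript
// Byte counts taken from the `ls -la` output in this thread.
const FULL_500_SERIES_NVM = 262144; // full 500-series backup (256 KiB)
const CONVERTED_NVM = 14375; // file written by the convert command

// Hypothetical helper: a restore that insists on a full-size image
// rejects any buffer whose length differs from the expected size.
function matchesNvmSize(data: Uint8Array, expectedSize: number): boolean {
  return data.length === expectedSize;
}

const converted = new Uint8Array(CONVERTED_NVM);
console.log(FULL_500_SERIES_NVM / 1024); // 256
console.log((CONVERTED_NVM / 1024).toFixed(1)); // "14.0"
console.log(matchesNvmSize(converted, FULL_500_SERIES_NVM)); // false
```

The converted file holds only the relevant first ~14 kB, so a strict size check fails until the restoring side accepts partial images.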

dearekaelle commented 2 years ago

@AlCalzone I think I was too quick to jump to the 7.17.0 firmware on my Z-Stick 7. I cannot convert the NVM:

Error: Could not parse source NVM - invalid format!
    at migrateNVM (/root/.npm/_npx/0e291f38aba7805a/node_modules/@zwave-js/nvmedit/build/convert.js:746:19)
    at Object.handler (/root/.npm/_npx/0e291f38aba7805a/node_modules/@zwave-js/nvmedit/build/cli.js:168:45)

justindthomas commented 2 years ago

@AlCalzone looks like it's still just writing out the first 15k.

╭─justin@pop-os ~/node-zwave-js ‹nvmedit*› 
╰─$ npx @zwave-js/nvmedit@8.9.0-beta.4-pr-3789-f60a13a convert --source ./NVM_2021-12-20_7.bin --target ./NVM_2021-12-20_5.bin --out ./NVM_converted.bin
npx: installed 69 in 2.039s
Converted NVM written to ./NVM_converted.bin
╭─justin@pop-os ~/node-zwave-js ‹nvmedit*› 
╰─$ ls -la NVM*
-rw-r--r-- 1 justin justin 262144 Dec 19 17:15 NVM_2021-12-20_5.bin
-rw-r--r-- 1 justin justin  49152 Dec 19 17:15 NVM_2021-12-20_7.bin
-rw-rw-r-- 1 justin justin  14375 Dec 20 09:06 NVM_converted.bin
AlCalzone commented 2 years ago

Yeah but you can use that to restore the backup now.

justindthomas commented 2 years ago

I get the same error with the new file.

nvm

justindthomas commented 2 years ago

Same result - "The given data does not match the NVM size".

╭─justin@pop-os ~/node-zwave-js ‹nvmedit*› 
╰─$ npx @zwave-js/nvmedit@8.9.0-beta.4-pr-3789-97b47a4 convert --vv --source ./NVM_2021-12-20_7.bin --target ./NVM_2021-12-20_5.bin --out ./NVM_converted.bin
npx: installed 69 in 2.808s
Converted NVM written to ./NVM_converted.bin
╭─justin@pop-os ~/node-zwave-js ‹nvmedit*› 
╰─$ ls -la NVM*
-rw-r--r-- 1 justin justin 262144 Dec 20 11:26 NVM_2021-12-20_5.bin
-rw-r--r-- 1 justin justin  49152 Dec 19 17:15 NVM_2021-12-20_7.bin
-rw-rw-r-- 1 justin justin  14375 Dec 20 11:27 NVM_converted.bin

I also tried taking another backup of the 5+ (in case the first was corrupted or something) and deleting the NVM_converted.bin file before running the conversion, but no change in result.

justindthomas commented 2 years ago

Here are the log messages leading up to that, in case they're at all helpful.

2021-12-20 11:26:35.457 INFO ZWAVE: Success zwave api call backupNVMRaw {
  data: <Buffer 38 5f 2f d0 46 52 45 45 db c4 7b e8 ff ff ff ff 00 00 00 00 00 54 a5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 262094 more bytes>,
  fileName: 'NVM_2021-12-20'
}
2021-12-20 11:28:16.655 INFO ZWAVE: Calling api restoreNVMRaw with args: [
  <Buffer 38 26 2f d0 00 00 00 00 c9 5f d3 c4 ff ff ff ff 00 00 00 00 01 54 a5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 14325 more bytes>,
  [length]: 1
]
2021-12-20 11:28:16.713 INFO ZWAVE: The given data does not match the NVM size - cannot restore! (ZW0322) restoreNVMRaw undefined
AlCalzone commented 2 years ago

Ohh - your zwavejs2mqtt instance is still running the official version which doesn't include the fix. Not sure how you're running z2m. Since you obviously have Node installed, you can do this:

  1. clone https://github.com/zwave-js/zwavejs2mqtt
  2. in the cloned repo, run yarn up zwave-js@8.9.0-beta.4-pr-3789-aaa040d
  3. start with yarn dev:server
  4. open z2m in the browser (http://localhost:8091)
  5. restore there.
AlCalzone commented 2 years ago

@dearekaelle The latest test release should work for 7.17.0 too.

justindthomas commented 2 years ago

Got it. Yeah, I'm running the conversion on my laptop and attempting the restore on my Home Assistant server.

I'll try doing it all locally on my laptop per your instructions.

justindthomas commented 2 years ago

Sorry - I know this is remedial, but it doesn't look like there's a dev:server script:

yarn dev:server 
Usage Error: Couldn't find a script named "dev:server".

$ yarn run [--inspect] [--inspect-brk] [-T,--top-level] [-B,--binaries-only] <scriptName> ...

The yarn up command ran fine though.

AlCalzone commented 2 years ago

Uhh, in which directory are you doing that? In the cloned zwavejs2mqtt repo?

justindthomas commented 2 years ago

Oh, sorry - sheesh. I was using this repo node-zwave-js.

AlCalzone commented 2 years ago

I was using this repo node-zwave-js.

Make sure to revert the changes and run yarn there again, if you plan to work with that repo. Otherwise it doesn't matter.

dearekaelle commented 2 years ago

@dearekaelle The latest test release should work for 7.17.0 too.

Many thanks, it now works and creates the output file:

-rw-r--r-- 1 darkell staff 14375 Dec 20 21:53 NVM_out.bin

However, restoring it fails the same way as it did for @justindthomas:

Error while calling api restoreNVMRaw: The given data does not match the NVM size - cannot restore! (ZW0322)

I am currently running zwavejs2mqtt: 6.1.1; zwave-js: 8.9.0-beta.3

AlCalzone commented 2 years ago

Yeah, you need to make sure to run the test version of the driver to be able to update for now --> https://github.com/zwave-js/node-zwave-js/issues/3906#issuecomment-998231525.

justindthomas commented 2 years ago

It seems like it may have worked. The restore seemed to stick at 5%, but I think that might just be a miscalculation. The log messages seemed to indicate it worked and was complete.

When I plugged the 5+ in to HA in place of my Z-Stick 7, I see the 69 entities and they link up to the names and locations already in place. But the device data is all missing and they're spinning on "ProtocolInfo". As time passes, they're all reporting in as "dead". Do I just need to wait for that process to complete and then do manual interviews on all of them?

The light on the Aeotec 5+ is also off which seems odd. It was amber (charging) before the restore. But the HA server definitely sees that it's there.

AlCalzone commented 2 years ago

But the device data is all missing and they're spinning on "ProtocolInfo". As time passes, they're all reporting in as "dead"

I'd be interested in seeing a driver log of that startup. Since the devices are in memory, so should the home ID, but maybe that wasn't transferred correctly.

Edit: And please send me the original Gen5 backup too, I'll need to cross-check a few things.

AlCalzone commented 2 years ago

Ok, so there is a difference in RF config which might explain the connectivity problem.

AlCalzone commented 2 years ago

Last round for tonight, this time use 8.9.0-beta.4-pr-3789-6b3eaa1 for the version of the migration tool. Make sure to use the full backup as the target (like before), not the previous migrated one.

You should be able to restore using the versions you already have.

justindthomas commented 2 years ago

No luck. The behavior looks unchanged.

2021-12-20 16:13:55.345 INFO ZWAVE: Connecting to /dev/ttyACM0
2021-12-20 16:13:55.350 INFO ZWAVE: Zwavejs usage statistics ENABLED
2021-12-20 16:13:55.351 INFO APP: POST /api/settings 200 38.685 ms - 1002
2021-12-20 16:13:55.499 INFO APP: GET /api/auth-enabled 304 0.779 ms - -
2021-12-20 16:13:58.736 INFO ZWAVE: Zwave driver is ready
2021-12-20 16:13:58.736 INFO ZWAVE: Controller status: Driver ready
2021-12-20 16:13:58.742 DEBUG ZWAVE: Binding to node 1 events
2021-12-20 16:13:58.742 DEBUG ZWAVE: Node 1 has been added to nodes array
2021-12-20 16:13:58.743 DEBUG ZWAVE: Binding to node 2 events
2021-12-20 16:13:58.743 DEBUG ZWAVE: Node 2 has been added to nodes array
<snip>
2021-12-20 16:13:58.761 INFO ZWAVE: Scanning network with homeid: 0x0
2021-12-20 16:13:58.762 INFO ZWAVE: Node 1: interview started
2021-12-20 16:13:58.778 INFO ZWAVE: Node 1: interview stage PROTOCOLINFO completed
2021-12-20 16:13:58.788 INFO ZWAVE: Node 1: interview stage OVERWRITECONFIG completed
2021-12-20 16:13:58.788 INFO ZWAVE: Node 1: interview stage COMPLETE completed
2021-12-20 16:13:58.790 INFO ZWAVE: Node 1 ready: AEON Labs - ZW090 (Z‐Stick Gen5 USB Controller)
2021-12-20 16:13:58.790 INFO ZWAVE: Node 1: interview COMPLETED, all values are updated
2021-12-20 16:13:58.791 INFO ZWAVE: Node 1 is alive
2021-12-20 16:13:58.791 INFO ZWAVE: Node 2: interview started
2021-12-20 16:13:58.793 INFO ZWAVE: Node 3: interview started
2021-12-20 16:13:58.794 INFO ZWAVE: Node 4: interview started
<snip>
2021-12-20 16:13:58.891 INFO ZWAVE: Node 2: interview stage PROTOCOLINFO completed
2021-12-20 16:13:58.903 INFO ZWAVE: Node 3: interview stage PROTOCOLINFO completed
2021-12-20 16:13:58.912 INFO ZWAVE: Node 4: interview stage PROTOCOLINFO completed
2021-12-20 16:13:58.921 INFO ZWAVE: Node 5: interview stage PROTOCOLINFO completed
<snip>
2021-12-20 16:14:05.434 INFO ZWAVE: Node 2 is dead
2021-12-20 16:14:11.986 INFO ZWAVE: Node 3 is dead
2021-12-20 16:14:18.865 INFO ZWAVE: Node 4 is dead
2021-12-20 16:14:24.672 INFO ZWAVE: Node 5 is dead
2021-12-20 16:14:31.997 INFO ZWAVE: Node 6 is dead
AlCalzone commented 2 years ago

Ok can you send me another copy of the original 500 series backup and the newly restored one? I think I might have mixed up the home IDs. The 500 series one has two in the NVM and it wasn't clear which is which.
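
To make the pattern in the log above easier to spot, here is a small sketch that pulls the reported home ID and the list of dead nodes out of zwavejs2mqtt-style log lines. The line format is copied from the excerpt in this thread; `summarizeLog` and `LogSummary` are hypothetical names, not part of any zwave-js API.

```typescript
interface LogSummary {
  homeId: string | undefined;
  deadNodes: number[];
}

// Scans log lines for the home ID announcement and "is dead" messages.
function summarizeLog(lines: string[]): LogSummary {
  const summary: LogSummary = { homeId: undefined, deadNodes: [] };
  for (const line of lines) {
    const home = /Scanning network with homeid: (0x[0-9a-fA-F]+)/.exec(line);
    if (home) summary.homeId = home[1];
    const dead = /Node (\d+) is dead/.exec(line);
    if (dead) summary.deadNodes.push(Number(dead[1]));
  }
  return summary;
}

const excerpt = [
  "2021-12-20 16:13:58.761 INFO ZWAVE: Scanning network with homeid: 0x0",
  "2021-12-20 16:13:58.791 INFO ZWAVE: Node 1 is alive",
  "2021-12-20 16:14:05.434 INFO ZWAVE: Node 2 is dead",
  "2021-12-20 16:14:11.986 INFO ZWAVE: Node 3 is dead",
];
const summary = summarizeLog(excerpt);
console.log(summary.homeId); // "0x0"
console.log(summary.deadNodes); // [ 2, 3 ]
```

A home ID of `0x0` combined with every node except the controller going dead is the combination reported after both restores in this thread.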

dearekaelle commented 2 years ago

I have finally managed to restore my converted NVM on Zstick 5+. All the nodes appear to be dead same as for @justindthomas.

home id: 0; home hex: 0x0

I am sending you the 500 NVMs (original and restore) and the driver log.

AlCalzone commented 2 years ago

@dearekaelle @justindthomas I'd like to run a test with you to figure out which fields need to be changed for this to work. This consists of 4 steps:

  1. Convert the migrated NVM (the smaller one) to JSON using

    npx @zwave-js/nvmedit@8.9.0-beta.4-pr-3789-6b3eaa1 nvm2json --in /path/to/nvm --out /path/to/json
  2. Make a copy of the JSON and edit the copy as described below

  3. Convert it back to NVM

    npx @zwave-js/nvmedit@8.9.0-beta.4-pr-3789-6b3eaa1 json2nvm --in /path/to/edited/json --out /path/to/edited/nvm
  4. Restore the edited NVM to the stick and check if it works now.


Beginning with the NVM that was output by the tool, I'd like to test each of the following changes individually (don't include the +/-). After each change, perform steps 3 and 4 from above:

  1. line 19:

    -       "controllerConfiguration": 60,
    +       "controllerConfiguration": 28,
  2. line 15:

    -       "nodeId": 1,
    +       "nodeId": 0,
  3. line 14:

    -       "learnedHomeId": null,
    +       "learnedHomeId": "0x12345678",

    (here copy the homeId from the field above)

  4. line 12:

    -       "applicationVersion": "7.17",
    +       "applicationVersion": "1.2",
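
Applying one change at a time to the JSON copy could be scripted roughly as below. This is only a sketch: `applySingleEdit` is a hypothetical helper, and the `sample` object is a made-up stand-in for the `nvm2json` output, whose real structure contains more fields than shown here.

```typescript
// Hypothetical helper: swap exactly one `from` fragment for `to` and make
// sure the edited text is still valid JSON before it goes to json2nvm.
function applySingleEdit(jsonText: string, from: string, to: string): string {
  const first = jsonText.indexOf(from);
  if (first === -1) throw new Error(`"${from}" not found`);
  if (jsonText.indexOf(from, first + from.length) !== -1) {
    throw new Error(`"${from}" occurs more than once`);
  }
  const edited =
    jsonText.slice(0, first) + to + jsonText.slice(first + from.length);
  JSON.parse(edited); // throws if the edit broke the JSON
  return edited;
}

// Made-up stand-in for the nvm2json output (assumption, not the real layout).
const sample = `{
  "controller": {
    "applicationVersion": "7.17",
    "learnedHomeId": null,
    "nodeId": 1,
    "controllerConfiguration": 60
  }
}`;

// Change 2 from the list: set the controller's own node ID to 0.
const edited = applySingleEdit(sample, '"nodeId": 1,', '"nodeId": 0,');
console.log(JSON.parse(edited).controller.nodeId); // 0
```

Refusing to edit when the fragment is missing or ambiguous keeps each test run limited to exactly one change, which is the point of the experiment.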
dearekaelle commented 2 years ago

bingo! it was change number (2) alone. "nodeId": 0

all devices came straight to life :) automations work.

awesome, many thanks @AlCalzone

I will now monitor and will let you know if I bump into any post migration issues.

AlCalzone commented 2 years ago

I edited the original post above. 8.9.0-beta.4-pr-3789-5d3ea84 has the node ID change. If all goes well I can release it officially soon.

justindthomas commented 2 years ago

That looks like it did it! Very nice.

Interestingly, in the z2m on my laptop, the Gen5+ stick came up as a "700 series" controller. It did launch into the interviews, so it didn't seem like that impacted anything negatively.

When I moved it to my HA server, it came up accurately as a 500 series.

Thanks for your work on this, @AlCalzone!

AlCalzone commented 2 years ago

Interestingly, in the z2m on my laptop, the Gen5+ stick came up as a "700 series" controller

Interesting. May be a cache thing - got a driver log of the startup?

justindthomas commented 2 years ago

First time in months that I've been able to mostly complete a healing process without physically resetting the controller, so that's pretty great. :)

I didn't wait around for the battery-powered units to heal and there were 3 hardwired (out of 69 total) that failed for some reason, so it wasn't a complete success. But that's more normal in my experience. With the 700 unit, one early failure would cascade to cause the whole remaining process to fail. Definitely a step up.

AlCalzone commented 2 years ago

With the 700 unit, one early failure would cascade to cause the whole remaining process to fail. Definitely a step up.

That could also have been a result of the buggy beta.1.

I do - this seems like the relevant bit:

That's not a driver log. One reason could be that the backup is only fully applied after a soft reset (restart), which might have been disabled for your stick.

sstarcher commented 2 years ago

@AlCalzone Do you have a recommended non Gen7 controller to downgrade to? I'm doubtful that the 700 series will be functional anytime soon.

AlCalzone commented 2 years ago

About half of our users have an Aeotec Gen5(+), which even supports SmartStart on the latest firmware. I don't have it myself but the reviews seem to be relatively positive.

guineau commented 2 years ago

I’ve been using the Nortek Z-Wave/Zigbee combo stick for a few years now and it has worked extremely well.

I updated its firmware last week and I think it now supports smart start. I don’t have any devices to test that with.


rcdailey commented 2 years ago

I personally am running a SiLabs 700 controller USB stick. Is it worth it to go to Aeotec Gen5? I haven't noticed the kinds of problems listed here, and I'm only at about 6 zwave devices (but growing over time). I do notice intermittent communication issues. Like when my HASS automations run, some lights do not switch on/off like they're supposed to. Not sure if that's related here.

A while back I did have a huge issue with signal strength / interference, mostly because I had my USB stick plugged into my Intel NUC which is inside of a metal 42U server rack. Moving it above the rack via a USB extender greatly reduced the issue. Again not sure if Gen5 will do anything for me, but I wanted to ask.

AlCalzone commented 2 years ago

I need to see driver logs capturing the issues to answer that. Ideally open another issue so we don't spam this one.

johanschelin commented 2 years ago

This is not meant as spam, so feel free to delete this message. I just want to tip my hat to you, AlCalzone! You do tremendous work with zwave-js, and I hope more of us start sponsoring you :)

Daniel-dev22 commented 2 years ago

I’ve been using the Nortek Z-Wave/Zigbee combo stick for a few years now and it has worked extremely well. I updated its firmware last week and I think it now supports smart start. I don’t have any devices to test that with.

How did you update your nortek?

darkbasic commented 2 years ago

I personally am running a SiLabs 700 controller USB stick. Is it worth it to go to Aeotec Gen5?

The Aeotec Gen5 has much better reception than the SiLabs 700, at least on EU frequencies. I also suggest you stay miles away from anything 700 series. That said, I don't think you're hitting this specific bug.

packetwarrior commented 2 years ago

@AlCalzone Has there been any update from Silicon Labs in regards to this issue? If there's anything we can do to help troubleshoot/escalate this issue, please don't hesitate to ask. Either way, thank you for all the time you've put into getting to the bottom of this!

fisch55 commented 2 years ago

Sorry for my question, but how can I find the current firmware for the Aeotec Z-Stick 7? On the Aeotec site I can only find 7.15. Maybe someone can help me 🙈

bwosborne2 commented 2 years ago

how can I find the current firmware for the Aeotec Z-Stick 7?

Have you asked Aeotec support?

fisch55 commented 2 years ago

how can I find the current firmware for the Aeotec Z-Stick 7?

Have you asked Aeotec support?

No, I thought I could download it. The Aeotec site only has 7.15.

bwosborne2 commented 2 years ago

On aeotec site only 7.15

That is likely the latest version they released then.

darkbasic commented 2 years ago

What they released means nothing because they use firmware from SiLabs. Download the SiLabs PC Controller and it will fetch the latest firmware. Either way it won't help, because the latest version didn't fix the issue.

ronytomen commented 2 years ago

Putting my hat in the circle...

Using Zooz S2 (700 series) controller and encountering exactly what is described here. Currently have 37 devices in ZwaveJS and still have 36 devices to migrate over from old OZW setup.

I have disabled energy reporting on devices I can to reduce ZWave network traffic, but there are things like the whole house energy monitor that I can only reduce reporting by so much....

I'm willing to move back to a 500 series controller (I have a spare Aeotec laying around)... Just need to decide...

johanschelin commented 2 years ago

I will also try this as soon as a stable guide for transferring backup from 700 to 500 chip is available. Do you have anything yet AlCalzone?

Kind regards, Johan Schelin

