home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Controller reporting "Jammed" and then "Ready", operations hang #104230

Open ember1205 opened 7 months ago

ember1205 commented 7 months ago

The problem

I am commonly seeing my USB device (Aeotec Z-Stick 7 Plus) report being "Jammed" and then shortly after report being "Ready." Around this time, any operations will fail to occur and the system does not go back and "recover" from missed actions / automations.

What version of Home Assistant Core has the issue?

System Information

version | core-2023.11.2
-- | --
installation_type | Home Assistant OS
dev | false
hassio | true
docker | true
user | root
virtualenv | false
python_version | 3.11.6
os_name | Linux
os_version | 6.1.59
arch | x86_64
timezone | America/New_York
config_dir | /config

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 11.1
-- | --
update_channel | stable
supervisor_version | supervisor-2023.11.3
agent_version | 1.6.0
docker_version | 24.0.6
disk_total | 228.5 GB
disk_used | 6.1 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | Z-Wave JS (0.3.0), Let's Encrypt (4.12.9), Z-Wave JS UI (3.0.2)

Dashboards

dashboards | 2
-- | --
resources | 0
views | 1
mode | storage

Recorder

oldest_recorder_run | November 10, 2023 at 7:57 PM
-- | --
current_recorder_run | November 16, 2023 at 3:32 PM
estimated_db_size | 60.77 MiB
database_engine | sqlite
database_version | 3.41.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zwave-js

Link to integration documentation on our website

No response

Diagnostics information

zwave_js-6d4b40be1e77c3e33d78bdf24d66a9c3-USB Controller-d288fb52eab027710676687815136717.json (1).txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

I can provide debug logs if useful for zwave-js: Please let me know if you want an entire day's worth of info or if you would prefer a certain amount of time leading up to and then after an event.

home-assistant[bot] commented 7 months ago

Hey there @home-assistant/z-wave, mind taking a look at this issue as it has been labeled with an integration (zwave_js) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `zwave_js` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zwave_js` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zwave_js documentation zwave_js source (message by IssueLinks)

markus99 commented 7 months ago

Very similar issues here, ever since ZWave JS upgraded to 12.x.x it's been doing this. Absolutely maddening. Whole point of ZWave is to turn on / off / set levels of devices - now it just hangs and hangs and hangs.

Was using an Aeotec Gen5+, upgraded to Zooz 800 series stick.

image

ZWaveJSLog.txt

zwave_js-6d4b40be1e77c3e33d78bdf24d66a9c3-USB.Controller-d288fb52eab027710676687815136717.json.1.txt

ember1205 commented 7 months ago

> Very similar issues here, ever since ZWave JS upgraded to 12.x.x it's been doing this. Absolutely maddening. Whole point of ZWave is to turn on / off / set levels of devices - now it just hangs and hangs and hangs.
>
> Was using an Aeotec Gen5+, upgraded to Zooz 800 series stick.
>
> image
>
> ZWaveJSLog.txt
>
> zwave_js-6d4b40be1e77c3e33d78bdf24d66a9c3-USB.Controller-d288fb52eab027710676687815136717.json.1.txt

A few points of note that may, or may not, be of value...

Switching between 800LR and 500/700 series controllers seems to require a rebuild. Last I checked (somewhat recently), there wasn't a reliable way to create a backup of an 800LR controller, let alone restore it even to the same controller. And there are apparently some compatibility issues between the 800 and older systems where the data can't be migrated.

My 700-based device generally works ok. I have read decent results from the 800LR community as well.

I was able to significantly reduce the occurrence of the "Jammed" situation by basically turning off / turning down the frequency of information reports from the various endpoints to sort of 'bare minimums.' I have found that many devices seem to come factory-configured to be super chatty and send all kinds of info all of the time and had modified a fair number of these in my setup already. But, a few days ago, I went through the WHOLE system and modified absolutely every device that I could to turn off automatic "periodic" reports, changed reports to Basic where I still needed reporting, and removed certain data points that I didn't need from various devices (like light levels from a sensor that I'm only using to collect humidity levels). I would estimate that I further cut my network traffic by at least half, judging by the significant reduction in log file size for a 24 hour period after the changes.

The "Jammed" status still appears, but it takes significantly longer to do so now and there does still appear to be the possibility of a relation between trying to communicate with devices that are "too far away" for smooth communications and the situation occurring.

When my device reports "Jammed", it reports "Ready" shortly after (within a few seconds). My network is about 40 devices spread across three floors of my house (basement plus two living space stories above) with the controller being on the second floor.

markus99 commented 7 months ago

This was working, from everything I've seen, BEFORE the 12.x ZWave upgrade and now hasn't for almost 2 months. It's ruined my HA experience. My old Aeotec Gen5+ was running pretty solid, now it's trash. Upgraded to Zooz 800, same issues. Now trying to wait for Hubitat to ship their device to me to go back to running a 3rd party hub (was on SmartThings prior to trying [and failing miserably] to run Zwave/Zigbee [SkyConnect] locally). What a disaster - and everyone's pointing to Silicon Labs driver being the issue...

ember1205 commented 7 months ago

> This was working, from everything I've seen, BEFORE the 12.x ZWave upgrade and now hasn't for almost 2 months. It's ruined my HA experience. My old Aeotec Gen5+ was running pretty solid, now it's trash. Upgraded to Zooz 800, same issues. Now trying to wait for Hubitat to ship their device to me to go back to running a 3rd party hub (was on SmartThings prior to trying [and failing miserably] to run Zwave/Zigbee [SkyConnect] locally). What a disaster - and everyone's pointing to Silicon Labs driver being the issue...

Your comments are self-conflicting. You can't say "everyone's pointing to Silicon Labs driver being the issue" (implying you think that's inaccurate / untrue) while also stating both of your ZWave controllers are "trash" and that you "failed miserably" with SmartThings. It's all the same controller chips, designed and manufactured by SL, that you're ultimately hitting issues with.

Fix your devices' configurations, regardless of the platform you tie them into, or you're going to continue to have problems.

My setup still encounters the "Jammed" error but the interval now is down to once every 24-36 hours. And the system gets itself back into an operational state fairly quickly with the Soft Reset option.

The SDK needs to be validated or fixed as does the driver built from it. Once that's done, additional steps can be taken by those that develop software like node-js and node-js-ui to ensure they have the updates in place as well. Also, this will need to be addressed by the hardware vendors to ensure the firmware for their controllers gets updated as necessary.

markus99 commented 7 months ago

I have ~25 devices, only 3 of which are sensors - the rest are switches (typically Jasco/GE [about 1/3 dimmer, 2/3 toggle only]). I reduced / updated all 3 sensors to the bare minimum of reporting - to the best I was able / knowledgeable to do.

I have not added any add'l endpoints in the past year and the system, when on SmartThings was rock solid - as it was 99.9% of the time before October of this year.

From what I've seen on HA Community forums, and here on github - I'm not the only one having issues - and the firmware of my USB stick (the original Aeotec Gen5+) hadn't been updated in well over a year - ceteris paribus, seems to me it's the ZWave JS 11.x -> 12.x update(s).

ember1205 commented 7 months ago

Are you all ZWave? I'm having a hard time following the bouncing ball with some of your comments... You stated ST was "rock solid" but also said you "failed miserably" with it when trying to get ZWave running locally. I'm not sure what to really make of this pair of statements.

For ZWave only, these comments may be helpful:

The GE/Jasco devices tend to be less configurable than other similar devices. They also tend to be less chatty on the network overall which should be a -good- thing.

If your sensors are ALSO running ZWave, you will want to look at things like the wakeup interval, whether they are sending automatic reports, what kinds of reports, and on what intervals. Example: I have an Aeotec door/window sensor installed on a bedroom closet. When you open the doors, the light goes on, close the doors, light goes off. The sensor is configured with the following details:

- Report When Open
- Low Battery Threshold: 20%
- Low Battery Check: enabled
- Low Battery Check interval: 86640 (that's just slightly less than once per day)
- Motion Sensor Triggered Command: Basic CC Report
- Wakeup Interval (settable within ZWave JS UI only): 3600 (once per hour)

That's it. If the door moves, it sends a notification that it was opened/closed, checks the battery level once per day, wakes up to check in with the controller once per hour.
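To put the intervals above into numbers, here is a minimal sketch in Python that converts a fixed interval in seconds into events per day. It assumes both settings are expressed in seconds, which matches the "once per hour" and "once per day" readings given above; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def events_per_day(interval_seconds: float) -> float:
    """Return how many times per day something fires at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return SECONDS_PER_DAY / interval_seconds


# Intervals quoted in the sensor configuration above (assumed to be seconds).
wakeups = events_per_day(3600)          # wakeup interval
battery_checks = events_per_day(86640)  # low battery check interval

print(f"wakeups per day: {wakeups:.2f}")
print(f"battery checks per day: {battery_checks:.3f}")
```

Running the same calculation over every periodic report on a chatty device gives a rough feel for how much scheduled traffic it adds, before any event-driven reports.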

If you hadn't updated the firmware on your Gen5+ controller stick to anything higher than 7.18.x, then the driver issue isn't at all related to the issues you saw with it. You need to be debug logging your setup and looking through the logs for chatty nodes and dropped packets that are occurring leading up to the Jammed state. In my case, I have a couple of devices in my basement that are having a tougher time communicating with the controller, which is on the second floor. Dropped packets from a sensor in the basement were causing communications issues and 'upsetting' the controller. Drastic changes to its configuration and what/when it reports have helped significantly and were my only option at the time since I couldn't move the controller closer.
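One way to look for chatty nodes ahead of a jam is to count node mentions in the log lines that precede each "jammed" line. The sketch below is only that, a sketch: it assumes log lines tag nodes as `[Node 012]` and that a jam shows up as a line containing the word "jammed". Both are assumptions about the log format, so adjust the pattern and marker to whatever your driver log actually contains.

```python
import re
from collections import Counter, deque

# Assumed node tag format, e.g. "[Node 012]". Adjust to match your logs.
NODE_TAG = re.compile(r"\[Node (\d+)\]")


def chatty_nodes_before_jam(lines, window=200, marker="jammed"):
    """For each line containing `marker`, count node tags in the
    `window` lines that came immediately before it."""
    recent = deque(maxlen=window)  # rolling buffer of the latest lines
    results = []
    needle = marker.lower()
    for line in lines:
        if needle in line.lower():
            counts = Counter()
            for previous in recent:
                counts.update(int(n) for n in NODE_TAG.findall(previous))
            results.append(counts)
        recent.append(line)
    return results


# Tiny synthetic example; real input would be the lines of a log file.
sample = [
    "x [Node 012] report",
    "x [Node 012] report",
    "x [Node 007] report",
    "controller status: Jammed",
]
for counts in chatty_nodes_before_jam(sample, window=3):
    print(counts.most_common(5))
```

Nodes that dominate the counts before several separate jams are the first candidates for reduced reporting or a better route.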

Looking at the node map in ZWave JS UI and understanding where there are poor quality routes helped me to identify nodes that needed some adjustments that were more drastic than others. It also reminded me of "dead" nodes that I had unplugged that the controller was still trying to communicate with periodically... I removed them from the network to assist in cleaning up the amount of traffic.

As I've said... I still get the error - there is still a problem. But the frequency has gone down by orders of magnitude and it self-corrects much more reliably. My setup is functional "most of the time" again and should only get better once this issue is corrected.

markus99 commented 7 months ago

Started using Zwave in '15/'16 via SmartThings / IFTTT / Stringify. When the latter shut down I moved over to HA in ~ 2019 - integrating Zwave devices via SmartThings. When Samsung bought SmartThings and started screwing with it I decided to move to HA 'local' Zwave control with Aeotec stick in late 2020 I think it was. No issues throughout the entirety of those changes (other than Jasco / GE switches getting the flashing blue light of death issue).

Anyhow, send a command, switch does what you want - flip switch manually, HA updates accordingly / quickly, etc. I run very few sensors (Aeotec MultiSensor 6's, and all hard wired to power and a single Zooz motion), the rest are light switches or outlet-type plug-in on/off devices.

9/26/23 is the node-zwave-js 12.0.0 release (https://github.com/zwave-js/node-zwave-js/commit/51aa4ba0b5f70edf86a070eb5c0d9bd6f84d2573) - and that's when it all hit the fan.

Thought my Aeotec Gen5+ stick might have been the issue (being 3-4 years old), so I tried upgrading (good times) to the Zooz 800 series. Same issues. Controller jammed. Commands not affecting switches. Lights on when should be off, vice-versa. Constantly.

Wrote a bunch of scripts to 'retry' turn on / turn off with do / while loops. Updated countless automations to use them vs. 'regular' switch.turn_on or light.turn_off commands. They help, but not always.
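The retry idea described above can be sketched generically: send the command, wait, check whether the device reports the wanted state, and repeat a bounded number of times. This is a language-level illustration in Python, not Home Assistant script syntax; `send` and `read_state` are hypothetical callables standing in for whatever issues the command and reads the entity state.

```python
import time
from typing import Callable


def retry_until_state(
    send: Callable[[], None],
    read_state: Callable[[], str],
    wanted: str,
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send a command up to `attempts` times until the reported state matches.

    Returns True once `read_state()` equals `wanted`, False if every
    attempt was used up without the state changing.
    """
    for _ in range(attempts):
        send()
        sleep(delay)  # give the device time to act and report back
        if read_state() == wanted:
            return True
    return False
```

A bounded attempt count matters here: with a jammed controller an unbounded loop would keep adding traffic to a network that is already struggling.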

I've tried 'streamlining' my network for the 3 sensors I do have and the like (thank you for these suggestions), no changes in stability.

Keep in mind, for ~4 years+, all has been fine and I've only added 3-4 new switches over that time - nothing major.

I do run a Zigbee (SkyConnect USB) network as well in the home and that's been ok. But Zwave, until the 12.0.0 release was rock solid, no issues, like ever. Now, it's awful. Click an entity from a glance card in Lovelace and >50% of the time it fails - quicker now to just get up and hit the thing manually. Balks thru Google Home as well. #failarmy

The old Aeotec Gen5+ hadn't had its firmware updated in sometime, actually just checked and it was on:

image

So the old USB stick / old firmware had issues, the new stick w/ new firmware (7.19.3) has issues and the only change is this node-zwave-js -> 12.x.x change.

Seems to me (as it shows in other threads as well) that this is the culprit. I'm no expert, but it's what my experience shows everything pointing towards.

None of this frustration is aimed at anyone, merely the situation. I've spent countless hours, have zero wife-approval factor, and now hundreds of $$s trying to fix the problem - all to be told to wait for Silicon Labs to update something - when (and I realize most ppl work for free on HA {yes, I do support and do subscribe to their annual subscription}) to me it's not even an SI firmware issue. Regardless, appreciate everyone's help here, but it's still over-the-moon frustrating regardless.

ember1205 commented 7 months ago

I have some of those Aeotec Smart Sensor 6 devices... they can be SUPER chatty if you aren't careful with the configuration. They can track a lot of different types of variables, but I only use one to track humidity. I have the automatic reporting disabled and I'm -only- sending selective reports using the Humidity Thresholds to notify when it's above a certain level (to turn on the smart plug for the dehumidifier) and when it's below a certain level (to turn that plug off).

I saw what appeared to be a relatively functional setup until actually BEFORE the release you mentioned. My issues started somewhere more around the 9/10-ish range and actually settled back down around 9/14 or so. I can't say definitively whether I was seeing the "Jammed" reports or not because I hadn't yet discovered that piece of information, but something was definitely misbehaving.

One of the ways I knew something was up was due to how the two plugs that operate my coffee makers were operating... I have a drip maker and a single cup maker plugged into a smart plug each and then connected to the same household circuit. This is bad in the sense that trying to operate both at once will overload the circuit (15A) since each one can easily draw 12A+ when heating. The plugs monitor the power use and a high draw from the drip maker powers off the single cup brewer to prevent circuit overloads.
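The interlock described above comes down to simple arithmetic on the circuit: two appliances at 12A each would total 24A on a 15A circuit. A rough sketch of that check follows, using the 15A and 12A figures from the comment. The function names and the rule "allow the single cup brewer only if its peak draw still fits" are illustrative assumptions, not the actual automation.

```python
CIRCUIT_LIMIT_AMPS = 15.0  # circuit rating quoted in the comment


def combined_draw_exceeds_limit(draws_amps, limit_amps=CIRCUIT_LIMIT_AMPS):
    """True if the summed draw of all appliances is over the circuit limit."""
    return sum(draws_amps) > limit_amps


def single_cup_plug_allowed(drip_draw_amps, single_cup_peak_amps=12.0,
                            limit_amps=CIRCUIT_LIMIT_AMPS):
    """Allow the single cup brewer only if its assumed peak draw fits
    alongside what the drip maker is drawing right now."""
    return drip_draw_amps + single_cup_peak_amps <= limit_amps


print(combined_draw_exceeds_limit([12.0, 12.0]))  # both heating at once
print(single_cup_plug_allowed(0.5))               # drip maker idle
print(single_cup_plug_allowed(12.0))              # drip maker heating
```

An interlock like this depends on timely power reports from the plugs, which is why it misbehaves visibly when the controller stops passing messages.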

Any time I have had issues, the single cup brewer's plug clicks almost continuously for maybe 30 seconds or so. Sometimes it repeats after a 10-20 second "break". I believed I potentially had a bad plug and swapped it out. Problem persisted. Each time I felt like I made progress with settings and such, that clicking was a dead giveaway that something was still off. I haven't heard that happen now in quite a few days.

There's some discussion about possible frequency interference between the USB controller / bus and the ZWave radio, but I'm not convinced. First, the information seems to be related explicitly to USB3.0 which operates at 2.5GHz. This WILL cause some crosstalk and interference with Zigbee, but not ZWave since it's 903MHz.

USB2.0 is a little different since it operates at 240 MHz, which at 4x frequency would be 960 MHz. This could be close enough to the 903 to generate some interference, but ONLY if it were to actually oscillate at that multiplied frequency.
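The harmonic arithmetic above can be checked with a few lines of Python. The 240 MHz base and the 903 MHz figure are the ones quoted in the comment. The 908.42 MHz value is the commonly listed US Z-Wave frequency and is added here as an assumption for comparison.

```python
def nearest_harmonic(base_mhz: float, target_mhz: float):
    """Return (n, n * base, distance) for the harmonic closest to target."""
    n = max(1, round(target_mhz / base_mhz))
    harmonic = n * base_mhz
    return n, harmonic, abs(harmonic - target_mhz)


for label, zwave_mhz in [
    ("figure quoted in the comment", 903.0),
    ("commonly listed US frequency (assumption)", 908.42),
]:
    n, harmonic, offset = nearest_harmonic(240.0, zwave_mhz)
    print(f"{label}: harmonic {n} at {harmonic:.0f} MHz, {offset:.2f} MHz away")
```

With either figure the closest multiple of 240 MHz is the fourth, and it sits more than 50 MHz from the Z-Wave frequency.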

In short, if your compute device has USB3.0 ports, and/or you have a USB3.0 SSD running it, you could be getting some odd behavior with the Zigbee. Moving that well away with an extension cable with shielding may assist there. And, if the Zigbee is being interfered with and it's clogging the communication bus to where ZWave stuff can't go out or come in, then there could be a domino effect there. This is all speculation, though... Your debug logs may be helpful - look for activity leading up to a Jammed event to see what's actually happening in those last 5-10 seconds beforehand.

otterlo commented 5 months ago

I am looking at the above discussion, but I don't think it is a setup problem. My Z-Wave setup with the 700 series had been working excellently for months in a row until sometime in December, when I started experiencing the same as above: jammed receiver, cannot switch any module on or off. I haven't changed anything. The only thing I do is the normal HA and Z-Wave JS upgrades, so these errors must be related to the recent changes from HA upgrades. I had never thought of leaving Z-Wave, but options are running out now.

Polosaz commented 5 months ago

Thanks to this thread, I realized that my problem (reported here: https://github.com/home-assistant/core/issues/106827 ) is also this one.

I hope there is a solution soon because it has dismantled the proper functioning of my home automation.

image

ember1205 commented 5 months ago

The biggest reason that this problem isn't getting resolved is because no one will acknowledge it exists. Everywhere it has been reported, the code owners for that piece are, in essence, claiming "my code is fine" and the project piece owners are not working together to understand the underlying issue.

The complete lack of any ability to even understand how to track this issue down to the core component will likely result in me leaving HA and my project is barely six months old. It has taught me a lot about ZWave, but my learnings are likely going to simply be put to use elsewhere because no one is willing to own this problem.

markus99 commented 5 months ago

@raman325 - I commented, as have numerous others on this issue - which I believe to be the same as https://github.com/home-assistant/core/issues/106827. Any thoughts here?

Tagging @MartinHjelmare and @AlCalzone as people who've had commits in the last 3-4 months as well. Appreciate the help here all.

MartinHjelmare commented 5 months ago

Please don't tag people unless they've asked you to do that.

The integration can't do anything about a jammed controller. That problem is on the device or driver side. The driver project is another issue tracker.

I recommend reading this troubleshooting section for how to improve your network health.

https://zwave-js.github.io/node-zwave-js/#/troubleshooting/network-health

AlCalzone commented 5 months ago

This is a firmware bug. The driver (node-zwave-js) tries its best to work around it. Before that workaround was added you'd randomly have nodes marked as dead, even if they weren't.

It may be fixed in the firmware based on SDK 7.21.0, but Silicon Labs are not 100% certain about it. Update: it seems like this isn't fixed yet, as of SDK 7.21.0.

ember1205 commented 5 months ago

> This is a firmware bug. The driver (node-zwave-js) tries its best to work around it. It may be fixed in the firmware based on SDK 7.21.0, but Silicon Labs are not 100% certain about it.

Is there any information provided by SL that identifies it as a firmware bug? If you have any links to anything that has been published by SL, it would be interesting to read.

7.21 is not addressing the issue as some folks are running firmware based on that SDK and the issue persists.

ember1205 commented 5 months ago

> Please don't tag people unless they've asked you to do that.
>
> The integration can't do anything about a jammed controller. That problem is on the device or driver side. The driver project is another issue tracker.
>
> I recommend reading this troubleshooting section for how to improve your network health.
>
> https://zwave-js.github.io/node-zwave-js/#/troubleshooting/network-health

Where can we learn more about the specific issue with the device or the driver? This is affecting MANY people with various chipsets (500, 700, 800, etc.) across devices from many manufacturers, using various versions of the node-zwave-js software. And if it is a problem with a device, driver, or even firmware, how will addressing details for network health remove the error?

The point here is that this issue just keeps getting deflected from one section of the code to another, people are being told "it isn't our issue", and no one has any actual details about what's really going on or how to actually dig into the weeds to find it. The easy response is to blame SL, firmware, the SDK, and the manufacturers of the devices. But there's no actual evidence to support any of that that we have seen.

AlCalzone commented 5 months ago

> 7.21 is not addressing the issue as some folks are running firmware based on that SDK and the issue persists.

Do we have driver logs of this?

And as for the evidence, we're in direct contact with Silicon Labs and they have confirmed the issue is on their side.

ember1205 commented 5 months ago

I don't have anything, but have chatted with others running 7.21 that indicate the "Jammed" status still occurs. I have stopped any additional time/effort troubleshooting for this and have my controller remaining at 7.17 until there's a known fix. I've also discontinued any updates to HA core or the components until there are fixes that have been tested and verified.

My personal next step -might- be to revert to the last known good working instance of all things HA prior to the early/mid September release that so many have pointed to as being the point when this showed up.

AlCalzone commented 5 months ago

Do you happen to know which addon/driver version the affected users are running? Can you send them here to provide driver logs?

ember1205 commented 5 months ago

Not only do I now know what software pieces they are running, I couldn't tell you what I'm running. I can't differentiate between node-zwave-js, ZwaveJS, ZwaveJS UI, etc. Honestly, it would be immensely helpful to add a button to the console to dump certain core data pieces like that in simple terms that are not only useful to the developers but would be something that the users could understand as well.

Polosaz commented 5 months ago

> Do you happen to know which addon/driver version the affected users are running? Can you send them here to provide driver logs?

I run 7.19.2 Aeotec Firmware, 0.4.3 Zwave JS addon, HA Core 2024.1.1, HA Operating System 11.3.

You have my log in https://github.com/home-assistant/core/issues/106827

otterlo commented 5 months ago

I am pleased to see that the discussion on this topic is continuing, though I'm not sure whether a fix will come. For me, Z-Wave will remain a quite important home automation protocol, and I have invested a lot in the devices and HA. When HA took up the development to integrate it into HA, it was very good news for me: I could avoid having a 3rd party Z-Wave hub.

It seems we are ultimately struggling with the Silicon Labs drivers / firmware, which I hope we can overcome soon. Thanks to all contributing to find a solution, much appreciated.

kpine commented 5 months ago

> Not only do I now know what software pieces they are running

Then could you provide a link to wherever these users are talking about 7.21.0 behavior?

> I couldn't tell you what I'm running. I can't differentiate between node-zwave-js, ZwaveJS, ZwaveJS UI, etc.

The Z-Wave integration configuration panel will tell you the versions:

image

The URL will give an indication of what you've installed, but you should be able to tell based on how you installed it.

You can also download the integration diagnostics and device diagnostics, which will report which versions are being used (and other significant info used for troubleshooting).

image
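Once a diagnostics file has been downloaded, the version entries can be pulled out without reading the whole document. The sketch below walks any parsed JSON structure and lists scalar fields whose key contains "version". It makes no assumption about the actual layout of the diagnostics file, and the keys in the sample dictionary are hypothetical.

```python
import json
from typing import Any, Iterator, Tuple


def find_version_fields(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every scalar whose key contains 'version'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            is_scalar = not isinstance(value, (dict, list))
            if "version" in str(key).lower() and is_scalar:
                yield child, value
            else:
                yield from find_version_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_version_fields(item, f"{path}[{index}]")


def versions_from_file(file_path: str):
    """Load a JSON file and return its version-like fields as a list."""
    with open(file_path, encoding="utf-8") as handle:
        return list(find_version_fields(json.load(handle)))


# Hypothetical structure, only to show the output shape.
sample = {"versionInfo": {"driverVersion": "12.0.0"},
          "nodes": [{"firmwareVersion": "7.19.2", "name": "stick"}]}
for field_path, value in find_version_fields(sample):
    print(field_path, "=", value)
```

Pasting that short list into an issue is usually easier for everyone than attaching the full diagnostics and asking readers to find the versions themselves.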

> Honestly, it would be immensely helpful to add a button to the console to dump certain core data pieces like that in simple terms that are not only useful to the developers but would be something that the users could understand as well.

Could you clarify what "the console" is? Do the diagnostics above fulfill this request, or are you looking for something else? The diagnostics are mostly for developers though, not sure what kind of "dump" information would be accessible for users.

As for debug logs, you can enable these from HA with one click in the integration panel. The instructions are listed in the integration documentation.

image

Or, if you are using Z-Wave JS UI, you can get the driver logs directly from it, see their documentation: https://zwave-js.github.io/zwave-js-ui/#/troubleshooting/generating-logs?id=driver-logs

Or, if you are using the official core add-on, you can enable logging to files in the settings and grab the log files later (default 7 days saved) using the File editor add-on. See instructions in the add-on documentation.

The last two options are best, IMO, for driver troubleshooting. The integration debug logs add a lot of extra noise, but are still better than nothing, and are the easiest to obtain.

kpine commented 5 months ago

> I run 7.19.2 Aeotec Firmware, 0.4.3 Zwave JS addon, HA Core 2024.1.1, HA Operating System 11.3.

I assume the request was to get new data from someone running the new 7.21.0 firmware. 7.19.2 is already known to have the issue. You can request the 7.21.0 firmware from Aeotec tech support if you want to give it a try.

ember1205 commented 5 months ago

@kpine - While I appreciate the details you provided, they are way beyond what is "necessary" for a discussion like this. By "console", I generally mean somewhere in the admin UI. When someone asks "what integration are you running", there should be a single displayed piece of data somewhere that any user can click on to show main components that are in operation. Having to go to six different pages in the admin UI to find 9 different pieces of data is painful and causes these conversations to derail quickly because the information is just way too hard to locate.

I've been involved with all kinds of computer technology for thirty years - it's core to my career. I have a moderately complex home network with a home lab that I use as part of my job. I've run ZWave devices in my house for close to a decade. HA just isn't intuitive or simple when it comes to being able to find basic information like "what driver are you running?" From what I have seen, the software components use different version numbers to indicate the same pieces of software which makes it that much more confusing.

kpine commented 5 months ago

> By "console", I generally mean somewhere in the admin UI.

OK thanks for the clarification. The screen shots I posted can all be navigated to from "Settings" page, which is what I would personally call the "admin UI".

> When someone asks "what integration are you running", there should be a single displayed piece of data somewhere that any user can click on to show main components that are in operation.

You can find this under Settings (admin UI) -> Devices & services. This page defaults to showing all of the installed integrations. You will see Z-Wave listed. It only takes a few clicks to get there. One extra click for "Configuration" gets you to details about the Z-Wave integration including the component versions. To clarify, since we are talking about Z-Wave JS, the only integration relevant to HA is the Z-Wave integration.

> causes these conversations to derail quickly because the information is just way too hard to locate.

Well, hopefully the information provided clarifies how to locate the information. I feel it is pretty easy to access, but I am likely biased. Improvements are welcome, of course. If my instructions were not helpful to you, hopefully they are for anyone else stumbling upon this issue who is thinking of testing firmware 7.21.0. :crossed_fingers: My apologies if I have derailed the conversation.

> From what I have seen, the software components use different version numbers to indicate the same pieces of software which makes it that much more confusing.

Agreed, it can be confusing for the average user; unfortunately this is a reality of a tech stack that involves multiple independent components. The component versioning can't be perfectly aligned, as they are independently developed projects with different release cadences. A handy website to clarify the version confusion is https://zwave-js.github.io/which-version/.

otterlo commented 5 months ago

For what it is worth: based on the above post I checked my firmware version and decided to upgrade from 7.19 to 7.20.2. I used the All frequencies version and surprisingly everything works OK now.

No jammed controller and all devices respond again except for some battery operated door switches, but maybe they first have to wake up properly.

So far I am 2 hours without problems... not sure if it will remain though.

ember1205 commented 5 months ago

> For what it is worth: based on the above post I checked my firmware version and decided to upgrade from 7.19 to 7.20.2. I used the All frequencies version and surprisingly everything works OK now.
>
> No jammed controller and all devices respond again except for some battery operated door switches, but maybe they first have to wake up properly.
>
> So far I am 2 hours without problems... not sure if it will remain though.

I can easily get 24-36 hours in between "Jammed" events. I suspect you will absolutely see the error return "soon."

AlCalzone commented 5 months ago

AFAIK 7.20.2 doesn't contain the fix.

Polosaz commented 5 months ago

I can't find the 7.21.0 firmware file.

ember1205 commented 5 months ago

> I can't find the 7.21.0 firmware file.

Contact Aeotec

tony-park commented 5 months ago

Hi all,

I've been in contact with Chris from Aeotec support, and he's provided the 7.21.0 firmware for the 700 series USB controller. The jammed issue is still happening, not very frequently currently, but that's probably down to reducing the number of devices and the messages they are passing across the network now.

I've enabled the debug logging as detailed earlier, and will try to upload log files this time tomorrow, assuming that I'm still seeing the jammed message.

Personally, prior to reducing the number of messages being passed, I was seeing the jammed message maybe 3-4 times per hour, and it would then take approx 4-5 seconds to change back to ready again.
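Figures like "3-4 times per hour" and "4-5 seconds to recover" can be measured rather than estimated if the status changes are available with timestamps. The sketch below assumes you already have a chronological list of `(timestamp, status)` pairs, for example copied from a logbook. How you obtain that list is left open, and the sample timestamps are invented.

```python
from datetime import datetime


def jam_durations(events):
    """Pair each 'Jammed' status with the next 'Ready' and return the
    time spent jammed for each pair, as a list of timedeltas.

    `events` must be (datetime, status) tuples in chronological order.
    """
    durations = []
    jam_started = None
    for when, status in events:
        state = status.strip().lower()
        if state == "jammed" and jam_started is None:
            jam_started = when
        elif state == "ready" and jam_started is not None:
            durations.append(when - jam_started)
            jam_started = None
    return durations


# Invented sample data, only to show the expected input shape.
sample = [
    (datetime(2024, 1, 10, 9, 0, 0), "Jammed"),
    (datetime(2024, 1, 10, 9, 0, 5), "Ready"),
    (datetime(2024, 1, 10, 9, 20, 0), "Jammed"),
    (datetime(2024, 1, 10, 9, 20, 4), "Ready"),
]
for duration in jam_durations(sample):
    print(duration.total_seconds(), "seconds jammed")
```

Tracking the count and duration before and after a firmware or configuration change gives a more reliable comparison than impressions from a single day.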

If there's anything else I can help with, please let me know, very eager to help get this resolved!

AlCalzone commented 5 months ago

> If there's anything else I can help with, please let me know, very eager to help get this resolved!

Please share driver logs of the issue happening, as requested earlier.

tony-park commented 5 months ago

zwave_2024-01-10.zip

Please find attached log file from today.

image

If anything else is required, please let me know.

Polosaz commented 5 months ago

This morning I updated to version 7.21.0, but I'm still having the same problem with the Zwave receiver.

captura home-assistant_zwave_js_2024-01-10T15-59-15.792Z.log home-assistant_zwave_js_2024-01-10T15-59-15.792Z.log

otterlo commented 5 months ago

Not sure if this info helps, but the other day I upgraded the firmware and got the system working again. I optimised the setup to reduce Z-Wave traffic, especially from energy-reporting devices, and monitored it. There is not much traffic now and I hardly got any jammed errors reported, until today, when I noticed that switching the nodes farthest from the Z-Wave dongle causes the controller to get jammed upon switching.

This does not happen with the nodes closer to the dongle.

AlCalzone commented 5 months ago

@tony-park I'll check with Silabs. The log is pretty clear that this still happens.

russellmoran99 commented 5 months ago

@AlCalzone

I'm having the same issues. My Z-Wave devices are unresponsive today. I have rebooted my Pi 4 and moved my ZST10-700 by Zooz (Firmware: 7.19.2) dongle. I have a Zigbee Sonoff that works great. They're both plugged into USB 2.0 on my Pi 4 and I have them about 18 inches apart from each other on extenders. I have noticed no problem with the Zigbee network, just the Z-Wave. I have the following logs.

config_entry-zwave_js-814a55062f0822e257176648de836ed0.json (1).txt zwave_js (2).log zwave_js-814a55062f0822e257176648de836ed0-700 Series USB Controller-a925a70f23189b5ccc8685a4eca26416.json (1).txt

I have emailed Aeotec support and am hoping for a 7.21.0 firmware. I am reading through the material on improving my network health, but I am not sure how to make changes to any devices as they are all "unavailable". I have 58 Z-Wave devices, primarily a combination of plugs and light switches. I do not see any place where I can change their settings. Any help is appreciated, as all my light switches are Jasco Z-Wave and this affects 75-80% of my automations. Thanks for any help.

UPDATE: I have decided to downgrade js to 4.0 (first one I had installed 2 months ago when I started my HA path). I went from 4.4, to 4.3, 4.2, 4.1 and now 4.0. I rebooted my pi4 after every downgrade. Still having the same issue. :(

ZST10-700 Admin UI

AlCalzone commented 5 months ago

@russellmoran99 in older versions, Z-Wave JS didn't react at all to a jammed controller, and instead would randomly mark devices as dead. If this behavior is preferred, you can revert to it by adding the following line to your Z-Wave addon configuration (switch to YAML mode):

disable_controller_recovery: true

PeteRager commented 5 months ago

Z-Wave JS didn't react at all to a jammed controller, and instead would randomly mark devices as dead. If this behavior is preferred, you can revert to it by adding the following line to your Z-Wave addon configuration (switch to YAML mode):

Thanks @AlCalzone, I'm starting to understand this better. Is this a fair statement? In the SW prior to this enhancement, nodes would get randomly marked dead, and that is why there is a proliferation of threads on this forum about creating automations that detect dead devices and ping them to get them back to life. And the underlying problem had nothing to do with the node; rather, it is a controller issue that was being masked.

It seems like there could be many different causes of this. If we could differentiate them in the SW, even partially, that could be helpful.

a) Faults in the VM USB pass-through layer (you have some notes posted on this), or faults in the Docker pass-through layer (though I don't think this exists, as Docker just uses security credentials to control access?)
b) Faults in the host USB driver
c) Faults in the host USB interface (including low/marginal voltage to the stick)
d) Stick RF interference due to poor placement in relation to the computer, Wi-Fi or other high-frequency electronic devices
e) Hardware issue with the stick
f) Firmware issue with the stick

russellmoran99 commented 5 months ago

@russellmoran99 in older versions, Z-Wave JS didn't react at all to a jammed controller, and instead would randomly mark devices as dead. If this behavior is preferred, you can revert to it by adding the following line to your Z-Wave addon configuration (switch to YAML mode):

disable_controller_recovery: true

At this point, getting a working Z-Wave network is all I'm trying to accomplish. I have ordered a powered USB 2.0 hub that I'm going to add to my Pi 4, and then I'll plug the Zooz dongle into it. I'm going back to the 4.3 JS and then changing to JS-UI to see if I can change more options to reduce chatter on my devices. Did you happen to look at my log files from the previous post? Anything you see I should concentrate on?

Here’s an updated one and it says node 33 is causing my controller to become unresponsive. I’m going to remove that node from my system and then see if that fixes it. If it does not, I will go to each Zwave switch and physically pull the pin on them to disable them (Jasco Dimmers/Relays) until I can find the one(s) causing too much chatter. Does this seem like a reasonable approach to solving the problem? Thanks
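Before physically disabling switches one at a time, it may help to tally how often each node shows up in captured traffic. The following is only a minimal Python sketch. It assumes the node IDs have already been extracted from the driver log into a plain list; that extraction step and the `rank_chatty_nodes` name are hypothetical and not part of Z-Wave JS.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_chatty_nodes(node_ids: Iterable[int], top: int = 3) -> List[Tuple[int, int]]:
    """Return the `top` node IDs with the most observed messages, busiest first.

    `node_ids` is assumed to hold one entry per message seen from a node.
    """
    return Counter(node_ids).most_common(top)


# Made-up sample data: node 33 appears three times, node 12 twice, node 7 once.
observed = [33, 12, 33, 7, 33, 12]
print(rank_chatty_nodes(observed, top=2))
```

A node that dominates the ranking is a reasonable first candidate to exclude or reconfigure, although a high message count alone does not prove it is what makes the controller unresponsive.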

IMG_3057

russellmoran99 commented 5 months ago

Here’s an update from ZOOZ. They told me Silicon Labs is aware of this problem. Does anyone have a contact directly with them to verify and expedite this?

-----
The firmware updates for our Z-Wave sticks are developed and provided by Silicon Labs, so any functionality issues would need to be addressed on their end. Once new firmware is available, we'll receive a file specific for our hardware and have it available with all of our other firmware. Silicon Labs has confirmed the current issues, and they have confirmed an updated firmware should be available in the next 3-4 weeks to resolve the issue. We have added you to the waitlist, and we will reach out as soon as the new firmware is released.

ember1205 commented 5 months ago

@russellmoran99 in older versions, Z-Wave JS didn't react at all to a jammed controller, and instead would randomly mark devices as dead. If this behavior is preferred, you can revert to it by adding the following line to your Z-Wave addon configuration (switch to YAML mode):

disable_controller_recovery: true

At this point, getting a working Z-Wave network is all I'm trying to accomplish. I have ordered a powered USB 2.0 hub that I'm going to add to my Pi 4, and then I'll plug the Zooz dongle into it. I'm going back to the 4.3 JS and then changing to JS-UI to see if I can change more options to reduce chatter on my devices. Did you happen to look at my log files from the previous post? Anything you see I should concentrate on?

Here’s an updated one and it says node 33 is causing my controller to become unresponsive. I’m going to remove that node from my system and then see if that fixes it. If it does not, I will go to each Zwave switch and physically pull the pin on them to disable them (Jasco Dimmers/Relays) until I can find the one(s) causing too much chatter. Does this seem like a reasonable approach to solving the problem? Thanks

IMG_3057

Powered hub, extension cable, different firmware, different zwave stick... none of it makes a difference. I've read/heard that SI is 'aware' of the issue after having read that the 7.19 SDK was 'possibly' the underlying culprit. It clearly isn't a 7.19 thing because my controller has exhibited issues with 7.17, 7.18, and 7.19 - all starting in the September timeframe after a software update to HA. Others have tried 7.20 and I believe even 7.21 and the issue persists. The commentary is that this has always been an issue but manifested differently before the September update, but I -never- saw my controller go "dead".

Further, with other hubs, I would see (every now and then) a significant lag in response to a command, but it did always eventually go through. With HA, commands seem to be silently dropped / ignored when there is a jammed condition.

AlCalzone commented 5 months ago

I would see (every now and then) a significant lag in response to a command, but it did always eventually go through

I've tried repeating the transmission until it finally goes through but for some users this was causing an infinite loop when the controller never got out of this "jammed" state. So letting the commands fail seemed to be the lesser evil.

As for the firmware versions that are broken, I'm not sure where it started and if there are any non-broken 7.x firmware versions. 7.19 seems to be the worst though.
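To illustrate the trade-off described above (retrying forever can loop indefinitely if the controller never leaves the jammed state, while giving up immediately drops the command), a host could cap the number of attempts. This is a hypothetical Python sketch, not how Z-Wave JS is implemented; `ControllerJammed`, `send_with_bounded_retry`, and all parameters are invented for illustration.

```python
import time
from typing import Callable


class ControllerJammed(Exception):
    """Hypothetical error raised when the controller reports it cannot transmit."""


def send_with_bounded_retry(
    send: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it succeeds or max_attempts is reached.

    Returns the number of attempts used. Re-raises ControllerJammed after the
    final attempt so the caller sees the failure instead of looping forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except ControllerJammed:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The cap turns a potential infinite loop into a bounded delay followed by a visible error, which the caller can log or surface to the user.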

ember1205 commented 5 months ago

I would see (every now and then) a significant lag in response to a command, but it did always eventually go through

I've tried repeating the transmission until it finally goes through but for some users this was causing an infinite loop when the controller never got out of this "jammed" state. So letting the commands fail seemed to be the lesser evil.

As for the firmware versions that are broken, I'm not sure where it started and if there are any non-broken 7.x firmware versions. 7.19 seems to be the worst though.

Understood. However, the same 500/700 series radios or chipsets are present in other full-built hubs and these issues don't seem to be present there. As I mentioned, I would very occasionally see a lag, but the command would eventually execute. The worst I ever experienced was that after many days of running, with many nodes being overly chatty for no reason, the system would stop being able to control certain nodes and I would have to restart. That network was larger than what I run with HA, and since moving to HA I have updated the configuration of my nodes to effectively turn off all unnecessary communications originating at the nodes themselves. Almost all communications are now the result of HA starting the comms out to the node(s).

Another oddity I have noticed with HA is that using a script to control multiple devices in one pass (I had 6 or 7 controllable plugs for my holiday lights that I wanted all powered on and off together) would sometimes result in the script silently failing at a particular step, so that only certain devices were controlled. For example, having it turn on six plugs, one at a time, might result in the first three executing, some sort of failure, and nothing occurring past that point.
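The partial-failure pattern described here (three plugs switch, one step fails, nothing after it runs) is what happens when a sequence aborts on its first error. A minimal Python sketch of the alternative, where every step is attempted and failures are collected, is shown below. It is not Home Assistant code; `run_all_steps` and the plug names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_all_steps(
    steps: Dict[str, Callable[[], None]],
) -> Tuple[List[str], Dict[str, str]]:
    """Run every step even if an earlier one fails.

    Returns (names that succeeded, {name: error text} for those that failed).
    """
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for name, action in steps.items():
        try:
            action()
            succeeded.append(name)
        except Exception as err:
            # Record the failure and continue so later plugs are still attempted.
            failed[name] = repr(err)
    return succeeded, failed
```

Returning the failures rather than swallowing them also addresses the "silent" part: the caller can log or retry exactly the devices that did not respond.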

When the problem first started getting reported (the jammed state), there were a couple of different posts or threads that pointedly called out a "potential" bug in 7.19's SDK which also led to a warning within HA to not update to a firmware at that level. Interestingly, I -never- encountered an issue like this prior to that September HA update and I (among others) have reported the same. We have also all reported that -all- firmware from 7.17 and up exhibit the issue. For the 700 series sticks, it seems that you need to be running 7.17 or higher or it won't work at all within HA.

There seems to be zero public-facing information directly from SiL about what may be occurring within the SDK/firmware. The various recommendations about using an extension cable and/or powered hub seem to be backed only by anecdotal evidence. The extension cable recommendation is also "general" and made to everyone, instead of being focused ONLY on those using systems with a USB3 chipset (which is where the limited info about possible interference seems to be tied).

kpine commented 5 months ago

There seems to be zero public-facing information directly from SiL about what may be occurring within the SDK/firmware.

There's an acknowledgement of an issue in this comment, which is one degree away.

There's also a bug mentioned in the latest public SDK release notes, and while I don't have any insider knowledge that can confirm this is the exact same issue, it sure sounds the same.

image

AlCalzone commented 5 months ago

Understood. However, the same 500/700 series radios or chipsets are present in other full-built hubs and these issues don't seem to be present there

Not sure what to tell you, but I've seen my fair share of logs where Z-Wave JS tells a 500 series controller to send a command to a node and, instead of responding with "success/started" within milliseconds as usual, the controller does not respond for 10+ seconds or not at all. Maybe the full-built hubs are running a more stable firmware, but there's not a lot the host software can do wrong when the entire task consists of

  1. "here, send this"
  2. ...wait
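
The "send, then wait" interaction above can be sketched as a request with a deadline on the acknowledgement. This is a hypothetical Python illustration, not the Z-Wave JS implementation; `send_and_wait_for_ack`, the `transmit` callback, and the timeout value are assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def send_and_wait_for_ack(
    transmit: Callable[[], Awaitable[str]],
    ack_timeout: float = 1.0,
) -> str:
    """Hand a command to the controller and wait for its reply.

    Returns the controller's reply, or "no response" if nothing arrives
    within ack_timeout seconds.
    """
    try:
        return await asyncio.wait_for(transmit(), timeout=ack_timeout)
    except asyncio.TimeoutError:
        # The host did its part; the controller never answered in time.
        return "no response"
```

From the host's side, the only real decision is how long to wait and what to do when the deadline passes; whether the reply arrives at all is up to the controller.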

tony-park commented 5 months ago

I used to run a veraplus, and saw lots of delays in zwave actions being performed, and even some that just never happened. I also have an ezloplus, but as the config was a pain in the rear, and it was so new to the market, and limited in scope, it was relegated to the standby box.

It sounds like SL are on the case.

For me, my setup has been working OK over the last couple of days. I've probably seen numerous jammed messages, but nothing is time critical, so it's annoying but that's life.

ember1205 commented 5 months ago

There seems to be zero public-facing information directly from SiL about what may be occurring within the SDK/firmware.

There's an acknowledgement of an issue in this comment, which is one degree away.

There's also a bug mentioned in the latest public SDK release notes, and while I don't have any insider knowledge that can confirm this is the exact same issue, it sure sounds the same.

image

The claim is that it's the 700/800 controllers, but my understanding is that this is happening with the 500 series as well. So, not sure that it is the same. Dunno.