home-assistant / addons

➕ Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0

Update Zwave-JS to 1.46 crashing HA heavily #2252

Closed canedje closed 3 years ago

canedje commented 3 years ago

The problem

The Z-Wave JS update to 1.46 crashed the system badly. Even restoring a backup did not work: updating the VM running HA gives an error that the Z-Wave USB device is no longer recognized. I had to restore a working backup in HA and restart the NUC to get up and running again.

Environment

version core-2021.11.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.9.7
os_name Linux
os_version 5.10.75
arch x86_64
timezone Europe/Amsterdam
Home Assistant Community Store
  GitHub API: ok
  GitHub API Calls Remaining: 4268
  Installed Version: 1.16.0
  Stage: running
  Available Repositories: 889
  Installed Repositories: 29
Home Assistant Cloud
  logged_in: false
  can_reach_cert_server: ok
  can_reach_cloud_auth: ok
  can_reach_cloud: failed to load: timeout
Home Assistant Supervisor
  host_os: Home Assistant OS 6.6
  update_channel: stable
  supervisor_version: supervisor-2021.10.8
  docker_version: 20.10.8
  disk_total: 48.5 GB
  disk_used: 7.6 GB
  healthy: true
  supported: true
  board: ova
  supervisor_api: ok
  version_api: ok
  installed_addons: File editor (5.3.3), NGINX Home Assistant SSL proxy (3.0.2), Node-RED (10.0.1), Samba share (9.5.1), Terminal & SSH (9.2.1), Z-Wave JS (0.1.45), Samba Backup (5.0.0), ESPHome (2021.10.3), Duck DNS (1.14.0)
Lovelace
  dashboards: 3
  resources: 21
  views: 97
  mode: storage

Problem-relevant configuration

Traceback/Error logs


   manufacturer ID:     0x86
                                    product type:        0x01
                                    product ID:          0x5a
                                    supported functions: 
                                    · GetSerialApiInitData (0x02)
                                    · FUNC_ID_SERIAL_API_APPL_NODE_INFORMATION (0x03)
                                    · ApplicationCommand (0x04)
                                    · GetControllerCapabilities (0x05)
                                    · SetSerialApiTimeouts (0x06)
                                    · GetSerialApiCapabilities (0x07)
                                    · SoftReset (0x08)
                                    · UNKNOWN_FUNC_UNKNOWN_0x09 (0x09)
                                    · SetRFReceiveMode (0x10)
                                    · UNKNOWN_FUNC_SET_SLEEP_MODE (0x11)
                                    · FUNC_ID_ZW_SEND_NODE_INFORMATION (0x12)
                                    · SendData (0x13)
                                    · SendDataMulticast (0x14)
                                    · GetControllerVersion (0x15)
                                    · SendDataAbort (0x16)
                                    · FUNC_ID_ZW_R_F_POWER_LEVEL_SET (0x17)
                                    · UNKNOWN_FUNC_SEND_DATA_META (0x18)
                                    · FUNC_ID_ZW_GET_RANDOM (0x1c)
                                    · GetControllerId (0x20)
                                    · UNKNOWN_FUNC_MEMORY_GET_BYTE (0x21)
                                    · UNKNOWN_FUNC_MEMORY_PUT_BYTE (0x22)
                                    · UNKNOWN_FUNC_MEMORY_GET_BUFFER (0x23)
                                    · UNKNOWN_FUNC_MEMORY_PUT_BUFFER (0x24)
                                    · UNKNOWN_FUNC_FlashAutoProgSet (0x27)
                                    · GetNVMId (0x29)
                                    · ExtNVMReadLongBuffer (0x2a)
                                    · ExtNVMWriteLongBuffer (0x2b)
                                    · ExtNVMReadLongByte (0x2c)
                                    · ExtExtWriteLongByte (0x2d)
                                    · GetNodeProtocolInfo (0x41)
                                    · HardReset (0x42)
                                    · FUNC_ID_ZW_REPLICATION_COMMAND_COMPLETE (0x44)
                                    · FUNC_ID_ZW_REPLICATION_SEND_DATA (0x45)
                                    · AssignReturnRoute (0x46)
                                    · DeleteReturnRoute (0x47)
                                    · RequestNodeNeighborUpdate (0x48)
                                    · ApplicationUpdateRequest (0x49)
                                    · AddNodeToNetwork (0x4a)
                                    · RemoveNodeFromNetwork (0x4b)
                                    · FUNC_ID_ZW_CREATE_NEW_PRIMARY (0x4c)
                                    · FUNC_ID_ZW_CONTROLLER_CHANGE (0x4d)
                                    · FUNC_ID_ZW_SET_LEARN_MODE (0x50)
                                    · AssignSUCReturnRoute (0x51)
                                    · FUNC_ID_ZW_REQUEST_NETWORK_UPDATE (0x53)
                                    · SetSUCNodeId (0x54)
                                    · DeleteSUCReturnRoute (0x55)
                                    · GetSUCNodeId (0x56)
                                    · UNKNOWN_FUNC_SEND_SUC_ID (0x57)
                                    · FUNC_ID_ZW_EXPLORE_REQUEST_INCLUSION (0x5e)
                                    · RequestNodeInfo (0x60)
                                    · RemoveFailedNode (0x61)
                                    · IsFailedNode (0x62)
                                    · ReplaceFailedNode (0x63)
                                    · UNKNOWN_FUNC_UNKNOWN_0x66 (0x66)
                                    · UNKNOWN_FUNC_UNKNOWN_0x67 (0x67)
                                    · GetRoutingInfo (0x80)
                                    · UNKNOWN_FUNC_LOCK_ROUTE_RESPONSE (0x90)
                                    · UNKNOWN_FUNC_GET_PRIORITY_ROUTE (0x92)
                                    · UNKNOWN_FUNC_SET_PRIORITY_ROUTE (0x93)
                                    · UNKNOWN_FUNC_UNKNOWN_0x98 (0x98)
                                    · UNKNOWN_FUNC_UNKNOWN_0xB4 (0xb4)
                                    · UNKNOWN_FUNC_WATCH_DOG_ENABLE (0xb6)
                                    · UNKNOWN_FUNC_WATCH_DOG_DISABLE (0xb7)
                                    · UNKNOWN_FUNC_WATCH_DOG_KICK (0xb8)
                                    · UNKNOWN_FUNC_UNKNOWN_0xB9 (0xb9)
                                    · UNKNOWN_FUNC_RF_POWERLEVEL_GET (0xba)
                                    · UNKNOWN_FUNC_GET_LIBRARY_TYPE (0xbd)
                                    · UNKNOWN_FUNC_SEND_TEST_FRAME (0xbe)
                                    · UNKNOWN_FUNC_GET_PROTOCOL_STATUS (0xbf)
                                    · UNKNOWN_FUNC_UNKNOWN_0xD2 (0xd2)
                                    · UNKNOWN_FUNC_UNKNOWN_0xD3 (0xd3)
                                    · UNKNOWN_FUNC_UNKNOWN_0xD4 (0xd4)
                                    · undefined (0xee)
                                    · UNKNOWN_FUNC_UNKNOWN_0xEF (0xef)
2021-11-05T20:07:50.014Z CNTRLR   Performing soft reset...
2021-11-05T20:07:50.033Z CNTRLR   Waiting for the controller to reconnect...
2021-11-05T20:07:51.535Z CNTRLR   Re-opening serial port...
2021-11-05T20:08:00.554Z DRIVER   Failed to open the serial port: Error: No such file or directory, cannot open 
                                  /dev/ttyACM0
Error in driver ZWaveError: Failed to open the serial port: Error: No such file or directory, cannot open /dev/ttyACM0 (ZW0100)
    at Driver.tryOpenSerialport (/usr/src/node_modules/zwave-js/src/lib/driver/Driver.ts:874:17)
    at Driver.ensureSerialAPI (/usr/src/node_modules/zwave-js/src/lib/driver/Driver.ts:1846:4)
    at Driver.softResetInternal (/usr/src/node_modules/zwave-js/src/lib/driver/Driver.ts:1810:9)
    at Driver.initializeControllerAndNodes (/usr/src/node_modules/zwave-js/src/lib/driver/Driver.ts:959:6)
    at Immediate.<anonymous> (/usr/src/node_modules/zwave-js/src/lib/driver/Driver.ts:817:5) {
  code: 100,
  context: undefined,
  transactionSource: undefined
}
Shutting down
2021-11-05T20:08:00.562Z CNTRLR   Waiting for the Serial API to start...
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.

Additional information

Mkay505 commented 3 years ago

Same here:

Message: Z-Wave JS: Retrying setup

[Screenshot 2021-11-05 at 21:37:57]

The USB device is unknown:

[Screenshot 2021-11-05 at 21:39:59]

Thank you in advance

canedje commented 3 years ago

I had exactly the same error.

The bad part is that even the VM was corrupted and no longer recognized the USB device!!

Mkay505 commented 3 years ago

I'm not able to do a rebuild like you did. And, as you said, restoring via an HA backup didn't work either.

Hopefully version 1.47 will fix this issue...


canedje commented 3 years ago

The only way I got it running again was to first go back to a working full restore of HA (not the partial Z-Wave JS snapshot) and then reboot my NUC (running Unraid with Docker containers and a VM).

Mkay505 commented 3 years ago

Great hint. I'll try restoring my last full backup and rebooting my ECM server running unRAID too. Thanks.


KermitOfDarkness commented 3 years ago

Same issue here. Restoring the .45 backup and rebooting fixes the issue, but upgrading to .46 is impossible.

Mkay505 commented 3 years ago

Now it's working again. But stupidly, I had reset the Z-Wave stick earlier, and now all my 64 devices and over 700 entities are gone... I thought these were stored in Home Assistant.

All my automations are dead. I might throw up. Damned.

Never touch a running system. Old saying, but wise.

kpine commented 3 years ago

0.1.46 upgraded the z-wave driver. The new version will perform a "soft-reset" of the z-wave controller upon startup. This isn't compatible with all systems, because it can result in a removal and re-insertion of the USB device. The soft-reset can be disabled, but the addon does not provide that option yet.

canedje commented 3 years ago

> I reset the Z-Wave stick earlier. Now all my 64 devices and over 700 entities are gone...

That’s my worst nightmare

canedje commented 3 years ago

> The soft-reset can be disabled, but the addon does not provide that option yet.

It's causing a lot of work and stress.

stephan890 commented 3 years ago

I'm experiencing the same: after the soft-reset, I can see the USB stick being disconnected from the virtual machine (Synology VM). If I reconnect the USB stick to the VM, Z-Wave JS starts initializing again and soft-resets the stick again, resulting in another disconnect. I restored 0.1.45 and have had no issues.

KermitOfDarkness commented 3 years ago

> I experience the same; after the soft-reset I can see the USB stick being disconnected from the virtual machine (Synology VM).

Exactly the same issue :(

hcooper commented 3 years ago

Yeah, this just screwed me over too. Same as others report: I'm also attempting to restore from backups (thankful for such a good integrated backup system). But from what @kpine says, this shouldn't be too complex a hotfix?

digitalit commented 3 years ago

Following

LucasHokerberg commented 3 years ago

A few days ago I noticed exactly the same behaviour. I'm running HassOS as a VM on Unraid and the USB device was constantly being disconnected. The weird part is that my deCONZ container running directly on Unraid (and NOT in HA) also stopped working, with an error that the USB device (ConBee II) was not reachable. After several reboots of Unraid, Docker and the HassOS VM, I got it to work (both for deCONZ and Z-Wave JS). I think it was the timing between mounting the USB device in Unraid and restarting the Z-Wave JS to MQTT add-on that did the trick.

Unraid version: 6.9.2
Home Assistant OS version: 6.6
Home Assistant Core version: 2021.10.7
Z-Wave JS to MQTT version: 0.28.0

MartinHjelmare commented 3 years ago

If you can switch to the community add-on, it offers an option to disable the soft-reset.

canedje commented 3 years ago

What do you mean by "switch to the community add-on"? I use the standard Z-Wave JS add-on in the HA Supervisor, and I don't see this option.

cogneato commented 3 years ago

@canedje The community add-on repository has an alternative Z-Wave JS add-on (Z-Wave JS to MQTT) with a few extra features, including the option to disable soft reset. MQTT is optional.

witterholt commented 3 years ago

I have the same problem. Removing and re-adding the stick works for a while, but after that it crashes again. I restored my backup of 1.45 for now... Hopefully this will be fixed in version 1.47, where the soft-reset can be disabled.

canedje commented 3 years ago

@cogneato I tried that before, but didn't get it up and running. Thanks for the answer; I hope there will be a solution for the standard Z-Wave JS add-on.

gperreault commented 3 years ago

This also affects me, so glad I had a snapshot to revert to.

The "Soft Reset" option found in ZWaveJS2MQTT definitely needs to be exposed in the official ZWaveJS add-on configuration.

tiba222 commented 3 years ago

Same issue here, I reverted back to 1.45 for now. I really hope this will be fixed as soon as possible.

LucasHokerberg commented 3 years ago

I have turned off Soft Reset and I think I can confirm it's working. I have restarted Z-Wave JS to MQTT, as well as detached and re-attached the USB stick in Unraid, and I don't see any unexpected "Hardware Removed" entries in the logs. For now, it's working like a charm!

Not sure whether and how this could have affected deCONZ as well. Perhaps it's just one hell of a coincidence, or something happens with the USB controller in Unraid.

LasseTheDude commented 3 years ago

Same here! Had a lot of trouble. I had to roll back to .45 and restart the NUC. The Z-Wave stick was glowing red after the update.

Mkay505 commented 3 years ago

When I use Z-Wave JS version 1.45 AND Z-Wave JS to MQTT version 0.28.0, I get the same error (on core-2021.10.6; core 2021.10.7 and above has an issue with the "SolarEdge API"... no fun these days).

Z-Wave JS 1.45 with Z-Wave JS to MQTT version 0.27.0 works (on core 2021.10.6).

BUT

Z-Wave JS 1.46 with Z-Wave JS to MQTT version 0.27.0 WORKS!!!

The problem isn't Z-Wave JS 1.46; it's version 0.28.0 of Z-Wave JS to MQTT.

witterholt commented 3 years ago

> The problem isn't zwave js 1.46 - it is the Version 0.28.0 from Z-wave JS to MQTT.

No, it's not: I don't use the "Z-Wave JS to MQTT" add-on at all. My setup broke when I upgraded from "Z-Wave JS 1.45" to "Z-Wave JS 1.46", and it has worked again since I rolled back to version 1.45.

kaazvaag commented 3 years ago

I have the same issue. Rolling back to version 1.45 made it work again. Can someone implement an option to disable the "soft reset" of the Z-wave stick?

tuday2 commented 3 years ago

Same issue and same fix (rolling back to 1.45).

MartinHjelmare commented 3 years ago

Please refrain from posting "same issue" and similar replies without adding new information. Every new post sends a notification to all people subscribed to the issue. The appropriate way of showing that you're affected by the issue is by reacting with 👍 to the top post.

Thanks!

korylprince commented 3 years ago

If you're running your VM on libvirt (KVM) as is pretty common, you can use udev rules to automatically reattach the USB when it gets disconnected. I wrote a small guide here.

In theory, this is better than just disabling the soft-reset, as I assume there are good reasons for the soft-reset, or they wouldn't have added that "feature".
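The linked guide is not reproduced in this thread. As an illustrative sketch only: on a libvirt host, re-attaching a USB device usually comes down to running `virsh attach-device <domain> <file.xml> --live` with a `<hostdev>` element that selects the device by USB vendor and product ID, and a udev rule can trigger a script that does this when the stick reappears. The helper names `usbHostdevXml` and `toLibvirtHex` are hypothetical, and the example IDs (0658:0200, which `lsusb` commonly reports for an Aeotec Z-Stick Gen5) are an assumption; check `lsusb` on the host for your own stick.

```typescript
// Sketch only: builds libvirt <hostdev> XML for a USB device so a script
// (e.g. one triggered by a udev rule) could re-attach it to a VM with
// `virsh attach-device <domain> <file.xml> --live`. Names are hypothetical.

/** Normalizes a USB ID such as "0658" or "0x0658" to libvirt's "0x0658" form. */
function toLibvirtHex(id: string): string {
  const hex = id.toLowerCase().replace(/^0x/, "");
  if (!/^[0-9a-f]{4}$/.test(hex)) {
    throw new Error(`Invalid USB ID: ${id}`);
  }
  return `0x${hex}`;
}

/** Returns a <hostdev> element that selects a USB device by vendor and product ID. */
function usbHostdevXml(vendorId: string, productId: string): string {
  return [
    "<hostdev mode='subsystem' type='usb'>",
    "  <source>",
    `    <vendor id='${toLibvirtHex(vendorId)}'/>`,
    `    <product id='${toLibvirtHex(productId)}'/>`,
    "  </source>",
    "</hostdev>",
  ].join("\n");
}

// Example IDs (assumed): 0658:0200 as shown by `lsusb` for an Aeotec Z-Stick Gen5.
console.log(usbHostdevXml("0658", "0200"));
```

Writing the XML to a file and calling `virsh` is left to the surrounding script. Matching by vendor/product ID rather than by USB bus/device number keeps the XML valid after the stick re-enumerates with a new device number, which is exactly what happens after a soft-reset.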

kekonn commented 3 years ago

I'm also on unRaid running HassOS in a VM, and downgrading to 1.45 and rebooting the VM didn't solve the issue.

korylprince commented 3 years ago

@kekonn you might need to remove the USB passthrough and add it back.

kekonn commented 3 years ago

I have. I also physically unplugged it, checked the VM, then added it back in. I switched to an external zwavejs2mqtt this morning as a workaround.

djjoakim commented 3 years ago

> I am also on unRaid running HASSOS in a vm and downgrading to 1.45 and rebooting the vm didn't solve the issue.

You need to restart Unraid for it to work; just restarting the VM won't do the trick.

aximusnl commented 3 years ago

Strangely, when I downgrade (I can only downgrade to 1.4.0), I immediately get upgraded to 1.4.6 again automatically, even though auto-update is turned off. So I don't really have a workaround (using a Synology virtual machine).

gperreault commented 3 years ago

> In theory, this is better than just disabling the soft-reset, as I assume there are good reasons for doing so or they wouldn't have added that "feature".

To be fair, ZWaveJS has implemented soft reset knowing that a) it breaks certain configurations, but more importantly b) it provides a mechanism to override/revert to the previous behaviour. The next version of ZWaveJS actually tries to auto detect problematic ZWave controllers to prevent the soft reset from occurring.

It’s the “official” HA ZWaveJS add-on that does not appear to have taken either point into account.

kpine commented 3 years ago

> The next version of ZWaveJS actually tries to auto detect problematic ZWave controllers to prevent the soft reset from occurring.

This already exists in the version of node-zwave-js that the add-on has upgraded to. In addition to detecting specific controllers that are known to be problematic, the driver is also supposed to remember if a soft-reset fails and avoid it on future startups, but that isn't working as intended. If it were, at worst this would have been a one-time failure (perhaps requiring a host restart), correctable without changing any settings. Unfortunately, there is no manual fallback to work around it.
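To make the "remember a failed soft-reset" idea concrete, here is a minimal sketch. It is not node-zwave-js's actual implementation; the names `SoftResetMemory`, `ControllerIdentity` and `controllerKey`, and the JSON storage format, are hypothetical. Controllers are keyed by the manufacturer ID, product type and product ID that the driver logs at startup (0x86 / 0x01 / 0x5a in the log in the first post).

```typescript
// Hypothetical sketch: persist "soft-reset failed" per controller so that the
// next startup skips it. Not the node-zwave-js implementation.

interface ControllerIdentity {
  manufacturerId: number;
  productType: number;
  productId: number;
}

/** Formats a number as a zero-padded 4-digit hex string, e.g. 0x86 -> "0x0086". */
function hex4(n: number): string {
  const s = n.toString(16);
  return "0x" + "0000".slice(s.length) + s;
}

/** Stable key for one controller model, e.g. "0x0086:0x0001:0x005a". */
function controllerKey(id: ControllerIdentity): string {
  return `${hex4(id.manufacturerId)}:${hex4(id.productType)}:${hex4(id.productId)}`;
}

class SoftResetMemory {
  private readonly failed: { [key: string]: boolean } = {};

  /** Restores previously recorded failures, e.g. read from a cache file. */
  constructor(serialized?: string) {
    if (serialized) {
      const keys: unknown = JSON.parse(serialized);
      if (Array.isArray(keys)) {
        for (const k of keys) this.failed[String(k)] = true;
      }
    }
  }

  /** True unless a soft-reset has previously failed for this controller. */
  shouldAttempt(id: ControllerIdentity): boolean {
    return !this.failed[controllerKey(id)];
  }

  /** Call when the serial port could not be re-opened after a soft-reset. */
  recordFailure(id: ControllerIdentity): void {
    this.failed[controllerKey(id)] = true;
  }

  /** Serialized form to write back to persistent storage. */
  serialize(): string {
    return JSON.stringify(Object.keys(this.failed));
  }
}
```

On the next startup, a driver following this scheme would rebuild `SoftResetMemory` from the stored string and skip the soft-reset whenever `shouldAttempt` returns false, so a failure like the `ZW0100` one in the first post would happen at most once.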

aximusnl commented 3 years ago

I've opened a PR that allows the user to disable the soft reset from within Home Assistant: https://github.com/home-assistant/addons/pull/2260

MartinHjelmare commented 3 years ago

In https://github.com/home-assistant/addons/pull/2261 we've added detection of whether a VM is in use, and soft-reset is then disabled automatically. We want users to get up and running as easily as possible, without having to configure options wherever possible.

An option for advanced users who can configure their VM and want to use soft-reset is OK to add, but it must be compatible with the automatic handling. 👍

Please test version 0.1.47 when it shows up and report the success/failure. Thanks!
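The combination described above (soft-reset disabled automatically in a VM, plus an optional override for advanced users that stays compatible with the automatic handling) can be sketched as a small decision function. This is a hypothetical illustration, not the code from PR #2261: `SoftResetSetting`, `resolveSoftReset` and the `"automatic"` value are names chosen here, and how the VM is detected is left to the caller. The resulting boolean would then be handed to the Z-Wave JS driver's soft-reset setting (to our knowledge the `enableSoftReset` option in zwave-js's `ZWaveOptions`; verify against the driver documentation for your version).

```typescript
// Hypothetical sketch of combining automatic VM detection with a user override.
// Not the add-on's actual implementation.

type SoftResetSetting = "automatic" | "enabled" | "disabled";

/**
 * Decides whether the Z-Wave driver should soft-reset the controller on startup.
 * An explicit "enabled"/"disabled" from an advanced user always wins;
 * "automatic" (the default) avoids soft-reset inside a VM, where the USB stick
 * can be detached from the guest when it re-enumerates after the reset.
 */
function resolveSoftReset(setting: SoftResetSetting, runningInVm: boolean): boolean {
  if (setting === "enabled") return true;
  if (setting === "disabled") return false;
  // "automatic": soft-reset only on bare metal.
  return !runningInVm;
}

console.log(resolveSoftReset("automatic", true)); // false
```

Keeping `"automatic"` as the default means users who never touch the option get the VM detection, while the explicit values give advanced users who have configured their VM a way to opt back in.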

LasseTheDude commented 3 years ago

> Please test version 0.1.47 when it shows up and report the success/failure. Thanks!

Works perfectly, but I had to restart the HA server after updating to .47. Thank you a lot 🤝

gperreault commented 3 years ago

0.1.47 works for me as well (Synology VMM). Much appreciate the fix.

MartinHjelmare commented 3 years ago

Closing here since 0.1.47 seems to resolve the issue. We'll continue trying to improve how soft-reset is handled, and we welcome help with that.

kimfrellsen commented 3 years ago

0.1.47 tested with hass 2021.11.2 on ESXi 6.7. Issue resolved. Thank you very much for the fix!

witterholt commented 3 years ago

0.1.47 Tested with hass 2021.11.2 on ESXi 7.0U2a. Issue resolved. Thank you very much for the fix!

canedje commented 3 years ago

Yes 1.47 did the trick. Thanks!!!

aximusnl commented 3 years ago

Yup, same here. If anyone still needs an option to turn it on or off manually, let me know.

digitalit commented 3 years ago

Hi guys, maybe I'm not supposed to yet, but I can't see the 0.1.47 update?

MartinHjelmare commented 3 years ago

Try refreshing the add-on store.

digitalit commented 3 years ago

Thank you so much guys,

I can confirm that 0.1.47 is working fine.

UnRaid 6.9.2 Hassos 6.6 VM

kekonn commented 3 years ago

Same here, the update seems to have fixed the issue.