home-assistant / operating-system

Home Assistant Operating System
Apache License 2.0

Upgrade from 7.1 to 7.2 Breaks ConBee 2/ZHA #1729

Closed: VACIndustries closed this issue 2 years ago

VACIndustries commented 2 years ago

Describe the issue you are experiencing

Upgrading from 7.1 to 7.2 breaks the ConBee 2/ZHA integration. Others report the same issue in the forum: https://community.home-assistant.io/t/deconz-official-thread/100504/2960

This is on a qcow2 VM (Unraid), but no errors show up anywhere, so I am not sure whether that matters in the end.

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

7.2

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

  1. Working ConBee 2/ZHA
  2. Update VM to 7.2 via the built in update mechanism
  3. ZHA no longer works

Anything in the Supervisor logs that might be useful for us?

Nothing seen in basic supervisor logs.  Device is verified still present/mounted.

Anything in the Host logs that might be useful for us?

Nothing seen in basic supervisor logs.  Device is verified still present/mounted.

System Health information

System Health

version core-2021.12.9
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.9.7
os_name Linux
os_version 5.10.91
arch x86_64
timezone America/New_York
Home Assistant Community Store
GitHub API | ok
Github API Calls Remaining | 4238
Installed Version | 1.19.1
Stage | running
Available Repositories | 948
Downloaded Repositories | 21

AccuWeather
can_reach_server | ok
remaining_requests | 44

Home Assistant Cloud
logged_in | true
subscription_expiration | February 21, 2022, 7:00 PM
relayer_connected | true
remote_enabled | true
remote_connected | true
alexa_enabled | false
google_enabled | true
remote_server | us-east-1-0.ui.nabu.casa
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | failed to load: timeout

Home Assistant Supervisor
host_os | Home Assistant OS 7.2
update_channel | stable
supervisor_version | supervisor-2021.12.2
docker_version | 20.10.9
disk_total | 31.3 GB
disk_used | 9.3 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Mosquitto broker (6.0.1), Samba share (9.5.1), Node-RED (10.3.4), Check Home Assistant configuration (3.9.0), File editor (5.3.3), Duck DNS (1.14.0), Let's Encrypt (4.12.0), Terminal & SSH (9.3.0), Home Assistant Google Drive Backup (0.105.2), MariaDB (2.4.0), NGINX Home Assistant SSL proxy (3.1.0), Z-Wave JS (0.1.52), DOODS (2), room-assistant (2.19.0)

Lovelace
dashboards | 5
resources | 12
views | 4
mode | storage

Additional information

No response

agners commented 2 years ago

Can you try to downgrade the OS to see if it's really related to the OS?

Use the following command in the terminal:

ha os update --version 7.1

Taraman17 commented 2 years ago

I'm also having issues on a Raspi with HASS-OS 7.2 and a TI_CC2531 USB stick. So it may be a more general Zigbee problem.

The Zigbee network seems to be generally working, but some devices have strange behaviour. Yesterday, one switch would not accept set messages (timeout), but status messages were received. After poweroff and restart, it was working again. One movement sensor does not send off-events most of the time, although link-quality is ok.

So for now, all I can definitely say is that it started after the update to 7.2.

Is there an easy way to downgrade on Raspi with HASS-OS?

agners commented 2 years ago

@Taraman17 yes, you can downgrade HAOS using the command in the comment above.

agners commented 2 years ago

Actually, there was absolutely no change for Raspberry Pi users between 7.1 and 7.2. It is the very same kernel version etc. So this must have been caused by the restart itself, or any other component update.

Taraman17 commented 2 years ago

Surprisingly, after downgrading, everything looks normal again for me though...

agners commented 2 years ago

@Taraman17 it could be an intermittent issue. Ideally, reboot a couple of times on 7.1 and see if it stays stable, then upgrade to 7.2 again and do the same to compare.

Just checked again, there is really no change which should affect Raspberry Pi in the latest release (see https://github.com/home-assistant/operating-system/releases/tag/7.2).

pergolafabio commented 2 years ago

I did the test too: running HassOS 7.1 on ESXi, upgraded to HassOS 7.2. I have a ConBee 1 stick with USB passthrough.

No issues here, all working fine after reboot/upgrade.

Taraman17 commented 2 years ago

@agners since I will be away for the next 2 days, I'd like to keep it running as is, now that it is working for my family. I will have time at the weekend to test.

Taraman17 commented 2 years ago

After running for some hours, the Zigbee network started to show dropouts again with 7.1. So after all, it is just a coincidence that the problems started after the update...

thiscantbeserious commented 2 years ago

After running for some hours, the Zigbee network started to show dropouts again with 7.1. So after all, it is just a coincidence that the problems started after the update...

OT:

Oh sweet Zigbee at it again.

I'd bet my ass it's the underlying firmware from Texas Instruments causing some sort of long-term data corruption on all those coordinators, leading to a broken routing table or something similar. I've been following the coordinator/router firmware development for some time now, and the base firmware has known "bugs" that nobody besides TI can fix.

By that I mean even the newest generation of TI-based coordinators. The most stable one so far seems to be the stick from Slaesh.

Really, I've been through a few of these TI-powered sticks, with PCBs and assemblies of varying quality, with and without a power amplifier, while following coordinator and router developments, and there are so many firmware revisions with deeply embedded bugs originating from the base firmware ... It all seems like they don't give a crap and it's supposed to be "experimental". If these chips are also used in the ConBee and others, then I wouldn't need a second guess.

That only goes for Texas Instruments.

Funnily, you don't get any of that with things like the official Tuya gateway: rock stable, much, much better despite everything ... too bad you can't really flash it with Tasmota or anything. It seems like they are using a Realtek Wi-Fi module for everything (makes sense), operating only in the upper bands, but it's so much more stable ... RTL8711AM

https://developer.tuya.com/en/docs/iot/wrg1-datasheet

Or maybe it's the Zigbee standard that's broken, and that's why they're all pushing so aggressively over to Thread, also updating some legacy gear too ... meh. Good luck finding the needle in the haystack though ...

For now I'd try buying a stick from Slaesh and pray it stays stable. So far it's been the most stable and solid one in a long journey for me.

agners commented 2 years ago

@thiscantbeserious ConBee 2 uses their own controller based on an Atmel microcontroller, see: https://www.phoscon.de/en/conbee2/techspec

Fleshi1981 commented 2 years ago

After running for some hours, the Zigbee network started to show dropouts again with 7.1. So after all, it is just a coincidence that the problems started after the update...

Same problem here on a Raspberry Pi 4 with a ZHA dongle connected. I went back to HassOS 7.1 and now everything is okay again.

LarsMichelsen commented 2 years ago

For now I'd try buying a Stick from Slaesh and pray it stays stable - so far its been the most stable and solid one in a long journey for me.

After the update from 7.1 to 7.2 I also experienced issues with the Slaesh stick connected to an Intel NUC. The setup was stable for more than 3 weeks. Then, after the 7.2 reboot, the problems started. I already tried to unplug/replug it, then reflash it and so on. No luck with 7.2. With zigbee_herdsman_debug: true, zigbee2mqtt continuously produces logs like this:

> node index.js
Zigbee2MQTT:info  2022-01-29 13:27:12: Logging to console and directory: '/share/zigbee2mqtt/log/2022-01-29.13-27-11' filename: log.txt
Zigbee2MQTT:info  2022-01-29 13:27:12: Starting Zigbee2MQTT version 1.22.2 (commit #unknown)
Zigbee2MQTT:info  2022-01-29 13:27:12: Starting zigbee-herdsman (0.13.188)
2022-01-29T12:27:12.726Z zigbee-herdsman:adapter Failed to validate path: 'Error: spawn udevadm ENOENT'
2022-01-29T12:27:12.727Z zigbee-herdsman:controller:log Starting with options '{"network":{"networkKeyDistribute":false,"networkKey":[4,3,5,7,9,1,13,15,0,2,4,6,8,1,9,3],"panID":6752,"extendedPanID":[221,221,221,221,221,221,221,221],"channelList":[11]},"serialPort":{"path":"/dev/serial/by-id/usb-Silicon_Labs_slae.sh_cc2652rb_stick_-_slaesh_s_iot_stuff_00_12_4B_00_23_93_36_45-if00-port0"},"databasePath":"/share/zigbee2mqtt/database.db","databaseBackupPath":"/share/zigbee2mqtt/database.db.backup","backupPath":"/share/zigbee2mqtt/coordinator_backup.json","adapter":{"disableLED":false,"concurrent":null,"delay":null}}'
2022-01-29T12:27:12.727Z zigbee-herdsman:adapter:zStack:znp:log Opening SerialPort with /dev/serial/by-id/usb-Silicon_Labs_slae.sh_cc2652rb_stick_-_slaesh_s_iot_stuff_00_12_4B_00_23_93_36_45-if00-port0 and {"baudRate":115200,"rtscts":false,"autoOpen":false}
2022-01-29T12:27:12.730Z zigbee-herdsman:adapter:zStack:znp:log Serialport opened
2022-01-29T12:27:12.731Z zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - ping - {"capabilities":1}
2022-01-29T12:27:12.732Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,33,1,32]
2022-01-29T12:27:12.985Z zigbee-herdsman:adapter:zStack:znp:log Writing CC2530/CC2531 skip bootloader payload
2022-01-29T12:27:12.986Z zigbee-herdsman:adapter:zStack:unpi:writer --> buffer [239]
2022-01-29T12:27:13.988Z zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - ping - {"capabilities":1}
2022-01-29T12:27:13.990Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,33,1,32]
2022-01-29T12:27:14.241Z zigbee-herdsman:adapter:zStack:znp:log Skip bootloader for CC2652/CC1352
2022-01-29T12:27:14.701Z zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - ping - {"capabilities":1}
2022-01-29T12:27:14.702Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,33,1,32]
2022-01-29T12:27:20.703Z zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - ping - {"capabilities":1}
2022-01-29T12:27:20.704Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,33,1,32]
2022-01-29T12:27:26.711Z zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - ping - {"capabilities":1}
2022-01-29T12:27:26.712Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,33,1,32]

Seems there is something fundamentally broken with the communication.
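For readers trying to interpret the repeated `--> frame [254,0,33,1,32]` lines: the sketch below decodes that byte sequence under the assumption that it follows TI's Z-Stack serial framing (start byte 0xFE, length, two command bytes, payload, XOR checksum). The field layout and the helper names `fcs` and `decode_frame` are illustrative assumptions, not taken from zigbee-herdsman's code. The log labels these frames as a SYS ping request, and it shows no incoming frames at all, which is consistent with the stick never answering.

```python
from functools import reduce

SOF = 0xFE  # assumed start-of-frame byte of the TI Z-Stack serial framing


def fcs(body):
    """XOR of the length, command and payload bytes (assumed checksum rule)."""
    return reduce(lambda a, b: a ^ b, body, 0)


def decode_frame(frame):
    """Split a raw frame into its fields and verify the checksum."""
    if len(frame) < 5 or frame[0] != SOF:
        raise ValueError("not a frame in the assumed format")
    length, cmd0, cmd1 = frame[1], frame[2], frame[3]
    if len(frame) != 5 + length:
        raise ValueError("length field does not match frame size")
    return {
        "type": cmd0 >> 5,          # 1 = synchronous request (SREQ)
        "subsystem": cmd0 & 0x1F,   # 1 = SYS
        "command": cmd1,            # 1 = ping within SYS
        "payload": list(frame[4:4 + length]),
        "fcs_ok": fcs(frame[1:-1]) == frame[-1],
    }


# The frame that repeats in the log above.
ping = decode_frame([254, 0, 33, 1, 32])
print(ping)
```

Under these assumptions the frame is a well-formed ping request with a valid checksum, so the request side looks fine and the missing piece is the response.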

Luckily I found this thread. So at least I am not alone with my pain ;-).

I downgraded to 7.1, which did not make it work again. Next, I uninstalled zigbee2mqtt; it still did not work. Next, I stopped zigbee2mqtt, reflashed the stick and started zigbee2mqtt; it still did not work. Next, I stopped zigbee2mqtt, removed /share/zigbee2mqtt and started zigbee2mqtt; it still did not work.

Slowly I am running out of ideas :-(.

LarsMichelsen commented 2 years ago

I solved my problem.

I have a second USB serial device for ebusd connected to my NUC. Both devices are appearing as /dev/ttyUSB* and the ebusd container was configured to use /dev/ttyUSB0 instead of the /dev/serial/by-id/[name]. Since the order of the /dev/tty* devices is not deterministic it was accidentally accessing the Slaesh serial device which broke access for zigbee2mqtt. So it just looks like a coincidence that the problem occurred during this reboot.

Now that I've fixed that configuration, the zigbee2mqtt communication is working fine again. I then applied all updates, and it still works.

Bottom line: My issue was not related to the 7.2 update.
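To see which /dev/ttyUSB* name each adapter currently has, one option is to resolve the persistent symlinks under /dev/serial/by-id. The following is a minimal sketch, assuming a Linux host where udev populates that directory; the function name `stable_serial_paths` is made up for illustration.

```python
import os


def stable_serial_paths(by_id_dir="/dev/serial/by-id"):
    """Map each persistent by-id symlink to the kernel device it points to now."""
    if not os.path.isdir(by_id_dir):
        return {}  # no serial adapters present, or not a udev-based system
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        mapping[link] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for link, target in stable_serial_paths().items():
        print(f"{link} -> {target}")
```

Running this before and after a reboot would show whether the kernel names swapped while the by-id names stayed the same.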

agners commented 2 years ago

@LarsMichelsen Thanks for the update!

@VACIndustries @Fleshi1981 any update from your side? Did you test downgrading? Could it be that another Add-on is not using by-id and using the wrong device similar to what @LarsMichelsen is seeing?

In general, it is not recommended to reference /dev/ttyUSBX directly, since Linux does not guarantee the order of those devices. A simple reboot, unplugging/re-plugging a device or a new kernel version can change that name. This gets problematic as soon as two /dev/ttyUSBX devices are used on the same system.
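As a rough way to audit add-on configurations for this, the sketch below scans configuration text for kernel-assigned serial names. It is only an illustration: the function name, the sample configuration and the inclusion of /dev/ttyACM* alongside /dev/ttyUSB* are assumptions, not part of any Home Assistant tooling.

```python
import re

# Kernel-assigned names whose numbering depends on enumeration order.
# Matching ttyACM as well as ttyUSB is an assumption made for this sketch.
UNSTABLE = re.compile(r"/dev/tty(?:USB|ACM)\d+")


def find_unstable_device_paths(config_text):
    """Return (line_number, path) pairs for each kernel-assigned serial path."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for match in UNSTABLE.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical configuration snippet, for illustration only.
example = """\
serial:
  port: /dev/ttyUSB0
zigbee:
  port: /dev/serial/by-id/usb-example-if00-port0
"""
print(find_unstable_device_paths(example))
```

Any hit is a candidate for replacement with the matching /dev/serial/by-id/ path.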

github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.