home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ZHA - Styrbar and Symfonisk don't successfully interview consistently #124634

Open Bo-The-Lab opened 2 months ago

Bo-The-Lab commented 2 months ago

The problem

Very quick rundown of events:

  1. Paired Styrbar to ZHA; everything showed as successful. However, I noticed the device would not bind to other devices.
     a. Observed that the FW version was 1.0.24 and listed as up-to-date. Observed that the quirk was listed as zigpy.quirks.v2.CustomDeviceV2, not a Styrbar-specific quirk.
     b. Discovered the Styrbar FW issue (a number mismatch when the version is written in hex on the distro server).
     c. Attempted some of the recommended solutions for ZHA (a local firmware repo with a fixed firmware file, or ZigPy tweaks to force a pseudo-downgrade because of the number mismatch); none worked.
     d. Installed z2m and successfully updated the firmware on one Styrbar.
     e. Attempted to pair that Styrbar with ZHA; the UI remained at "Configuring", while debug logging showed ZigPy crashing.
  2. Attempted to pair a different Styrbar that still has the 1.0.24 firmware and observed the same hanging behaviour.
     a. Neither Styrbar would appear in the devices list until HA was restarted or ZHA reloaded on its own.
     b. After a restart or reload they would appear.
  3. ZHA can't receive commands from the New FW Styrbar, but the New FW Styrbar will respond to pressing Identify in ZHA.
  4. ZHA is now displaying the following data for the Old FW Styrbar; this data was previously present when the first Styrbar was paired and running the Old FW:
     a. FW Version: Unknown
     b. Battery: Unknown
     c. Power source: Battery or Unknown
     d. ZHA is receiving button presses fine; however, the Styrbar is not lighting up when Identify is pressed in ZHA.
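
To illustrate the kind of version-number mismatch mentioned in step 1b, here is a minimal Python sketch. It is only an illustration under stated assumptions: the version digits, the function names, and the "strictly higher wins" rule are hypothetical and are not taken from IKEA's update server or from zigpy. It shows how the same digits produce different integers depending on whether they are read as decimal or as hex, which can make a newer image look older than the installed one.

```python
def parse_version(digits: str, base: int) -> int:
    """Interpret a version field as an integer in the given base."""
    return int(digits, base)


def update_available(installed: int, offered: int) -> bool:
    """Toy update check: offer an image only if its version is strictly higher."""
    return offered > installed


# Hypothetical values, chosen only to demonstrate the effect.
installed = parse_version("01000024", 16)      # device reports a hex-encoded version

offered_as_decimal = parse_version("02040005", 10)  # server digits misread as decimal
offered_as_hex = parse_version("02040005", 16)      # same digits read as hex

print(update_available(installed, offered_as_decimal))
print(update_available(installed, offered_as_hex))
```

With these made-up numbers the decimal reading compares as lower than the installed version, so the toy check offers no update, while the hex reading compares as higher.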

I don't think the version of HA Core is particularly relevant to this issue at the moment, but for reference I have updated HA while working on this issue over a 2-3 week period.

What version of Home Assistant Core has the issue?

core-2024.8.3

What was the last working version of Home Assistant Core?

core-2024.7.4

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

zha-01J49B82HM96PTSCE4FJRPAZA8-IKEA of Sweden Remote Control N2-a6eff746f694c8e4fc644ca78a846ab6.json
home-assistant_zha_2024-08-26T10-34-33.369Z.log

Old FW Styrbar current state in ZHA: image

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 2 months ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `zha` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

puddly commented 2 months ago

> e. Attempted to pair Styrbar with ZHA, UI remains at configuring however debug logging showed ZigPy crashing.

The tracebacks in your log aren't really crashes, they're just tracebacks for debug logging that explain why a request failed to send. The device just isn't responding and initialization fails.

Is your coordinator on a USB extension cable and away from 2.4GHz interference sources such as USB 3.0 ports, SSDs, WiFi routers, and so on?

Bo-The-Lab commented 2 months ago

> > e. Attempted to pair Styrbar with ZHA, UI remains at configuring however debug logging showed ZigPy crashing.
>
> The tracebacks in your log aren't really crashes, they're just tracebacks for debug logging that explain why a request failed to send. The device just isn't responding and initialization fails.
>
> Is your coordinator on a USB extension cable and away from 2.4GHz interference sources such as USB 3.0 ports, SSDs, WiFi routers, and so on?

Thanks for the reply. I'm using a UZG-01 as my coordinator, connected over LAN. It's well away from any 2.4GHz source. I have a number of other IKEA devices that pair without issue. I've restarted HA between pairing attempts and then reloaded ZHA, as I've found that solves the missing Styrbar after pairing issue on its own.

I will try to pair the New FW Styrbar with logging on later today to see if the tracebacks are the same as the ones the Old FW Styrbar produced in the original debug log.

adambeck7 commented 2 months ago

Unfortunately I forgot to grab the error logs before I reverted, but updating to 2024.8.3 killed all my ZHA connections and I couldn't pair any new ones. I reverted to 2024.8.2 and was able to see all the old devices and pair new ones. I'm using a SkyConnect on a USB dongle away from the server. When I have some time I'll try to upgrade again and grab some logs.

Bo-The-Lab commented 2 months ago

> Unfortunately I forgot to grab the error logs before I reverted, but updating to 2024.8.3 killed all my ZHA connections and I couldn't pair any new ones.

Thanks for helping out with information @adambeck7.

Here is some more troubleshooting I did tonight when I had time.

I've just tried pairing a SYMFONISK Sound Controller that I had to hand and I can confirm the same pairing behaviour as the STYRBAR. The steps I took were:

  1. Remove device from ZHA
  2. Reload ZHA (not restart HA)
  3. Pair the device in ZHA.

The pairing process didn't finish, it sat on 'Configuring'. I left it a long time, even after the new device scan had timed out. Then I reloaded ZHA and the SYMFONISK appeared, but didn't log any events. I do notice it is loading a dedicated quirk and this time the SYMFONISK isn't flashing its LED when the 'Identify' button is pressed in HA. Screenshot and logs.

image

home-assistant_zha_2024-08-27T12-37-10.257Z.log

Next I deleted and paired a TRETAKT smart plug. That was paired and marked as ready to use in seconds without trouble. I repeated the steps and restarted HA after deleting the TRETAKT just to be thorough, and again had no issues.

With the TRETAKT now plugged in, I tried pairing the SYMFONISK with the smart plug at the other end of the desk to act as a router. The UZG coordinator is in the same room, but I'm being thorough. The SYMFONISK paired successfully, after the same steps as above. Here's the log: home-assistant_zha_2024-08-27T12-58-29.274Z.log (I haven't looked at it myself, it's too late at night for that)

After that success, I tried pairing the New FW STYRBAR and it worked straight away. Again I forgot to turn on the debug log. But now that I know restarting HA can get a device to successfully pair, I can recreate the process and log it if required for the New FW STYRBAR.

Next I tried the Old FW STYRBAR and ZHA locked up like before. I forgot to turn logging on but the same behaviours of 'Unknown' entities and no event logging happened again after a reload. I'll restart HA and try once more.

After an HA restart, the Old FW STYRBAR paired with no problems. Here are the resulting screenshot and logs: image

home-assistant_zha_2024-08-27T13-13-14.932Z.log

The screenshots above show the firmware is marked as 'up-to-date' on both devices. However, they are reporting different firmware versions:

| New FW | Old FW |
| --- | --- |
| image | image |

I've also noticed the New FW STYRBAR isn't reporting its battery level. The entity is 'Unknown'.

puddly commented 2 months ago

> I've restarted HA between pairing attempts and then reloaded ZHA, as I've found that solves the missing Styrbar after pairing issue on its own.

This is likely because the device never actually finishes initializing so it realistically shouldn't be added to HA (restarting reloads it from the database, which pretends it joined): reporting is not fully set up for all attributes, as you can see by battery status being Unknown.

> The screenshots above show the firmware is marked as 'up-to-date' on both devices. However, they are reporting different firmware versions:

Unfortunately, this is just a limitation of Home Assistant's update entity. It can either be "on", "off", or "unavailable". This behavior will change in the next major release but this is due to us not enabling OTA updates for IKEA devices by default due to stability issues. You can see how to do that here: https://github.com/zigpy/zigpy/wiki/OTA-Configuration

In short, I think it sounds like older-firmware remotes are unreliable when joining the network. Reloading ZHA "tricks" it into treating the device as if it joined, but the device in fact is not usable at that point.
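
As a rough illustration of that explanation, here is a toy Python model. It is not ZHA's or zigpy's actual code; the class names, fields, and the battery value are all made up for the sketch. The assumption it encodes is the one described above: a device is written to the database as soon as it joins, it is only shown once initialization finishes, and a reload lists everything in the database as if it had joined properly.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    ieee: str
    initialized: bool = False   # did interview and reporting setup finish?
    battery: str = "Unknown"    # stays "Unknown" until reporting is configured


@dataclass
class ToyGateway:
    database: dict = field(default_factory=dict)  # persists across reloads
    visible: dict = field(default_factory=dict)   # what the device list shows

    def join(self, ieee: str, responds: bool) -> None:
        dev = ToyDevice(ieee)
        self.database[ieee] = dev          # stored as soon as the device joins
        if responds:
            dev.initialized = True
            dev.battery = "87%"            # hypothetical reported value
            self.visible[ieee] = dev       # only shown once fully initialized

    def reload(self) -> None:
        # A reload lists every stored device, initialized or not.
        self.visible = dict(self.database)


gw = ToyGateway()
gw.join("remote-1", responds=False)        # the remote stops answering mid-setup
shown_before_reload = "remote-1" in gw.visible
gw.reload()
shown_after_reload = "remote-1" in gw.visible
```

In this model the remote is absent before the reload and present afterwards, but it is still uninitialized and its battery still reads "Unknown".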

What happens if you join your remote not through the coordinator but through a router by clicking the three dot menu and selecting "Add devices via this device"?

Bo-The-Lab commented 1 month ago

Sorry, I haven't been able to put any time into my HA setup lately, so everything has been paused since my last comments. I'm still running HAOS 2024.8.3. I plan to back up, let the update run overnight and try some of this again tomorrow.

> What happens if you join your remote not through the coordinator but through a router by clicking the three dot menu and selecting "Add devices via this device"?

I just tried this by pairing the two STYRBAR switches via a (new style) TRETAKT smart plug. Both paired and worked. Thank you @puddly for this advice, I at least have a second workflow for testing. 🙏

Both STYRBARs are using the zigpy.quirks.v2.CustomDeviceV2 quirk. Is this correct? Or is a dedicated quirk available? IIRC, while I was looking into the STYRBAR issues, I thought I saw mention of a dedicated quirk for version 1 of the STYRBAR hardware.

Would the quirk affect the ability to bind groups or devices to the STYRBAR? Is the quirk file used in that process? Forgive me, I haven't had time to look through a quirk lately; I know they are lightweight files describing functionality. I've been unable to bind devices or groups to the STYRBAR remotes, and this is a major part of my HA plans. I do note that group binding may be a problem with the IKEA firmware. There is a discussion of this over on the Z2M GitHub. But I can't get devices to bind to any STYRBAR regardless.

> ...this is due to us not enabling OTA updates for IKEA devices by default due to stability issues.

I arrived at posting this issue based on a massive mistake I made reading the ZHA and Zigpy discussions. I do apologise for that. However, I'm happy with my workflow of using a dedicated Z2M environment just for performing the firmware updates for now. ZHA seamlessly takes back control of my LAN-based coordinator afterwards. With much more exposure to ZigBee etc., I'm of the opinion that I could configure ZHA and Z2M to access the coordinator simultaneously, purely for experimentation.

Finally, forgive my rambling, sometimes overly detailed writing style.