home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
74.16k stars 31.13k forks source link

Phillips Hue Dimmer V2 doesn't update HA state correctly when used in automations #131694

Open j4n-e4t opened 6 days ago

j4n-e4t commented 6 days ago

The problem

This problem was originally described in #130259. The supplied fix dis solve my initial problem, but many users, including me, still have an issue with the Hue Dimmer V2. It doesn't seem to be a problem with ZHA itself but I'm not exactly sure.

When using the light.toggle action in an automation with the Hue Dimmer V2 (over ZHA), the action runs aka. the device turns on but HA doesn't get that state change and still displays the device to be off. That way, the next toggle action fails.

According to @outrun0506, the issue was fixed and then re-appeared in 2024.11.1

What version of Home Assistant Core has the issue?

core-2024.11.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 6 days ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `zha` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

TheJulianJES commented 6 days ago

I believe this might be a regression in this PR:

@mguaylam mentioned something like this to me over Discord. It seems like Home Assistant Core errors out silently somewhere. Is there anything in the debug logs?

Hoping to check on this very soon myself. (cc @fholzer in case you have any ideas)

EDIT: It's caused by the two events firing one after another in less than a millisecond. A restart automation gets restarted and apparently kills the first ongoing service call, so we end up with incorrect state. It's not completely "caused" by the PR above, but the changed implementation (two events) breaks together with HA when using restart automations. (more info in comments below)

mguaylam commented 6 days ago

Can confirm the issue. HA automation fails silently when we use the simulated events (the old ones). Failing with just a : unknown error.

Whats particular is it seem to be able to at least go all the way to send the command to the light because it will turn on, but HA will never know the new state of the light and still show it off.

Then I'm guessing since it's a toggle command, HA will keep sending a turn on command to the light that is already turned on and we get stuck there until we send a turn on from the UI which don't invoke the breaking code and HA will happily display the light on.

I did search for everything in the logs but it fails silently even at debug.

My temporary fix was to modify my helper automation to the new natives events and it fixed it.

TheJulianJES commented 6 days ago

To confirm, the same issue happens both when using device triggers and also when just listening to zha_event (and not using device triggers)?

j4n-e4t commented 6 days ago

Whats particular is it seem to be able to at least go all the way to send the command to the light because it will turn on, but HA will never know the new state of the light and still show it off.

I initially thought it was an issue with the state not being updated by the device being controlled. However, any other remote I tried, works in that matter (1 from Xiaomi, 2 from IKEA).

Then I'm guessing since it's a toggle command, HA will keep sending a turn on command to the light that is already turned on and we get stuck there until we send a turn on from the UI which don't invoke the breaking code and HA will happily display the light on.

Exactly, but when the light.toggle is called anywhere else like through the Streamdeck integration or another automation, the issue doesn't appear.

@TheJulianJES, I will look into the logs later today when I'm home.

TheJulianJES commented 6 days ago

Please also post a YAML snippet of the trigger(s) from your automations causing the issue.

mguaylam commented 6 days ago

If my understanding is correct, there was a refactoring made (https://github.com/zigpy/zha-device-handlers/pull/2340) but to keep the backward compatibility, old events are now being simulated when we produce the new events :

on_press changed for on_short_release up_press changed for up_short_release down_press changed for down_short_release

And what seem to break are those 3 simulated events on_press up_press and down_press

outrun0506 commented 6 days ago

Just to clarify things. Issue started with 2024.11.0 and was not fixed with #130259 so my logs form 2024.10.4 and 2024.11.1 should still be valid.

mguaylam commented 6 days ago

@outrun0506 I'm pretty sure it's https://github.com/zigpy/zha-device-handlers/pull/2340 since it landed in 2024.11.0. Can you post your YAML or let us know what Blueprint you are using?

I'm using EPMatt philips_929002398602.

outrun0506 commented 6 days ago

@mguaylam Sure. I'm using hue_dimmer_switchv3.yaml. I had a quick look at the blueprint, and indeed it uses the on_press event

TheJulianJES commented 6 days ago

Interesting. So it seems like the device triggers in the automations UI do not have this issue and it only occurs when listening (to some) zha_events?

mguaylam commented 6 days ago

Yes. I remember testing that. Device triggers works fine. It's the events that cause the issue.

TheJulianJES commented 6 days ago

Can you listen to zha_event in the developer tools and post the output of one affected and one unaffected event here? Maybe we're just including some weird event args.. that cause something to break spectacularly.

Edit: Hmm, this doesn't look very likely.

j4n-e4t commented 5 days ago

UPDATE: I tried capturing any kind of debug log bud as @mguaylam pointed out earlier, HA just errors silently without any messages. At least I was not able to find them.

Also, my current workaround, to use the light.turn_on and light.turn_off actions accordingly, now resulted in the same error without me changing anything. It's a bit strange, since I can't really identify the actual scope of the problem. :disappointed:

TheJulianJES commented 5 days ago

I'm only able to reproduce this with blueprints so far. Changing mode from restart to single "fixes" the issue. I believe this might be a bug in the Home Assistant automation engine where the service call is somehow killed when a second ZHA event comes through? I'll keep digging..

TheJulianJES commented 5 days ago

This is the shortest automation I could come up with that reproduces the issue. If the on_press trigger condition is in the conditions, it won't fail, as the automation is only started once then.

However, with the following automation, it's triggered twice. Once by the on_press event and once by the on_short_release event. The automation will be restarted when the second ZHA event comes in. By then, the light.toggle service call already has been triggered, but the automation engine seems to kill it while not finished. The second zha_event will fail the if condition, so the second automation run will instantly end.

alias: Test Dimmer Automation
description: ""
triggers:
  - trigger: event
    event_type: zha_event
    event_data:
      device_id: 4a39122dc839f1fee24aff9dd32a1a91
      cluster_id: 64512
conditions: []
actions:
  - if:
      - condition: template
        value_template: "{{ trigger.event.data.command == 'on_press' }}"
    then:
      - action: light.toggle
        target:
          entity_id: light.zha_ikea_light
        data: {}
mode: restart

So it seems that when the restart mode kills an automation run, it kills ongoing service calls, even when they have been started already. I'd expect them to finish successfully still, but further automation actions should not run.

This is also why you're only seeing it for the _press events. They are emitted first, then the _short_release events. There isn't another event after _short_release, so the automation won't restart then.

TheJulianJES commented 5 days ago

While I believe there's an underlying issue with the restart mode in HA, we should still handle this differently and somewhat revert to the behavior we had before.

I think we already had both on_press and on_short_release before (when doing a single short press). However, now we queue up the events, so they fire "together" (delay is less than a millisecond). IIRC, we had on_press fire instantly before (for everything, so even if there's a double press in the end)?

The automations/blueprints also aren't ideal though, since they restart on all zha_events fired by the remote, even if they don't care about a particular one..

j4n-e4t commented 5 days ago

I can confirm that I'm also using the Controller Blueprint by EPMatt and that the mode set in this blueprint is indeed restart.

EDIT: I've now "taken control" over the blueprint and manually set the mode to single. At first glance, this might have fixed it. This is however only a temporary solution, since it makes the underlying automation so complex that editing it is a nightmare. 😫

TheJulianJES commented 5 days ago

Ok, after bdraco enlightened me that restart automations cancel asyncio tasks (so our started light.toggle service call) and don't wait for it to finish (that was a wrong assumption on my part, but I guess that makes sense), it looks like it's an issue in the HA integration/ZHA lib where we're not expecting the cancellation. We need to write HA state either way (and likely clean up some more things in the lib when cancelled, especially for lights).

CHEESECROOK commented 2 days ago

I can confirm that I'm also using the Controller Blueprint by EPMatt and that the mode set in this blueprint is indeed restart.

EDIT: I've now "taken control" over the blueprint and manually set the mode to single. At first glance, this might have fixed it. This is however only a temporary solution, since it makes the underlying automation so complex that editing it is a nightmare. 😫

Ive done the same, changed 5 automations to single and for short press it does work, however my hold loop press seems to be relying on restart? Correct me if im wrong.

If there are any better workarounds i would be happy to test them out, so would my wife :)

mguaylam commented 1 day ago

If there are any better workarounds i would be happy to test them out, so would my wife :)

Instead of changing the mode you can replace the events for the new ones as describe here : https://github.com/home-assistant/core/issues/131694#issuecomment-2503538245