Open j4n-e4t opened 6 days ago
Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
zha documentation zha source (message by IssueLinks)
I believe this might be a regression in this PR:
@mguaylam mentioned something like this to me over Discord. It seems like Home Assistant Core errors out silently somewhere. Is there anything in the debug logs?
Hoping to check on this very soon myself. (cc @fholzer in case you have any ideas)
EDIT: It's caused by the two events firing one after another in less than a millisecond. A restart
automation gets restarted and apparently kills the first ongoing service call, so we end up with incorrect state.
It's not completely "caused" by the PR above, but the changed implementation (two events) breaks together with HA when using restart
automations. (more info in comments below)
Can confirm the issue. HA automation fails silently when we use the simulated events (the old ones). Failing with just a : unknown error.
Whats particular is it seem to be able to at least go all the way to send the command to the light because it will turn on, but HA will never know the new state of the light and still show it off.
Then I'm guessing since it's a toggle command, HA will keep sending a turn on command to the light that is already turned on and we get stuck there until we send a turn on from the UI which don't invoke the breaking code and HA will happily display the light on.
I did search for everything in the logs but it fails silently even at debug.
My temporary fix was to modify my helper automation to the new natives events and it fixed it.
To confirm, the same issue happens both when using device triggers and also when just listening to zha_event
(and not using device triggers)?
Whats particular is it seem to be able to at least go all the way to send the command to the light because it will turn on, but HA will never know the new state of the light and still show it off.
I initially thought it was an issue with the state not being updated by the device being controlled. However, any other remote I tried, works in that matter (1 from Xiaomi, 2 from IKEA).
Then I'm guessing since it's a toggle command, HA will keep sending a turn on command to the light that is already turned on and we get stuck there until we send a turn on from the UI which don't invoke the breaking code and HA will happily display the light on.
Exactly, but when the light.toggle
is called anywhere else like through the Streamdeck integration or another automation, the issue doesn't appear.
@TheJulianJES, I will look into the logs later today when I'm home.
Please also post a YAML snippet of the trigger(s) from your automations causing the issue.
If my understanding is correct, there was a refactoring made (https://github.com/zigpy/zha-device-handlers/pull/2340) but to keep the backward compatibility, old events are now being simulated when we produce the new events :
on_press
changed for on_short_release
up_press
changed for up_short_release
down_press
changed for down_short_release
And what seem to break are those 3 simulated events on_press
up_press
and down_press
Just to clarify things. Issue started with 2024.11.0 and was not fixed with #130259 so my logs form 2024.10.4 and 2024.11.1 should still be valid.
@outrun0506 I'm pretty sure it's https://github.com/zigpy/zha-device-handlers/pull/2340 since it landed in 2024.11.0. Can you post your YAML or let us know what Blueprint you are using?
I'm using EPMatt philips_929002398602.
@mguaylam Sure. I'm using hue_dimmer_switchv3.yaml. I had a quick look at the blueprint, and indeed it uses the on_press event
Interesting. So it seems like the device triggers in the automations UI do not have this issue and it only occurs when listening (to some) zha_event
s?
Yes. I remember testing that. Device triggers works fine. It's the events that cause the issue.
Can you listen to zha_event
in the developer tools and post the output of one affected and one unaffected event here?
Maybe we're just including some weird event args.. that cause something to break spectacularly.
Edit: Hmm, this doesn't look very likely.
UPDATE: I tried capturing any kind of debug log bud as @mguaylam pointed out earlier, HA just errors silently without any messages. At least I was not able to find them.
Also, my current workaround, to use the light.turn_on
and light.turn_off
actions accordingly, now resulted in the same error without me changing anything. It's a bit strange, since I can't really identify the actual scope of the problem. :disappointed:
I'm only able to reproduce this with blueprints so far. Changing mode
from restart
to single
"fixes" the issue.
I believe this might be a bug in the Home Assistant automation engine where the service call is somehow killed when a second ZHA event comes through?
I'll keep digging..
This is the shortest automation I could come up with that reproduces the issue. If the on_press
trigger condition is in the conditions
, it won't fail, as the automation is only started once then.
However, with the following automation, it's triggered twice. Once by the on_press
event and once by the on_short_release
event.
The automation will be restarted when the second ZHA event comes in. By then, the light.toggle
service call already has been triggered, but the automation engine seems to kill it while not finished. The second zha_event
will fail the if
condition, so the second automation run will instantly end.
alias: Test Dimmer Automation
description: ""
triggers:
- trigger: event
event_type: zha_event
event_data:
device_id: 4a39122dc839f1fee24aff9dd32a1a91
cluster_id: 64512
conditions: []
actions:
- if:
- condition: template
value_template: "{{ trigger.event.data.command == 'on_press' }}"
then:
- action: light.toggle
target:
entity_id: light.zha_ikea_light
data: {}
mode: restart
So it seems that when the restart
mode kills an automation run, it kills ongoing service calls, even when they have been started already. I'd expect them to finish successfully still, but further automation actions should not run.
This is also why you're only seeing it for the _press
events. They are emitted first, then the _short_release
events. There isn't another event after _short_release
, so the automation won't restart then.
While I believe there's an underlying issue with the restart
mode in HA, we should still handle this differently and somewhat revert to the behavior we had before.
I think we already had both on_press
and on_short_release
before (when doing a single short press). However, now we queue up the events, so they fire "together" (delay is less than a millisecond).
IIRC, we had on_press
fire instantly before (for everything, so even if there's a double press in the end)?
The automations/blueprints also aren't ideal though, since they restart on all zha_event
s fired by the remote, even if they don't care about a particular one..
I can confirm that I'm also using the Controller Blueprint by EPMatt and that the mode
set in this blueprint is indeed restart
.
EDIT: I've now "taken control" over the blueprint and manually set the mode
to single
. At first glance, this might have fixed it. This is however only a temporary solution, since it makes the underlying automation so complex that editing it is a nightmare. 😫
Ok, after bdraco enlightened me that restart
automations cancel asyncio tasks (so our started light.toggle
service call) and don't wait for it to finish (that was a wrong assumption on my part, but I guess that makes sense), it looks like it's an issue in the HA integration/ZHA lib where we're not expecting the cancellation. We need to write HA state either way (and likely clean up some more things in the lib when cancelled, especially for lights).
I can confirm that I'm also using the Controller Blueprint by EPMatt and that the
mode
set in this blueprint is indeedrestart
.EDIT: I've now "taken control" over the blueprint and manually set the
mode
tosingle
. At first glance, this might have fixed it. This is however only a temporary solution, since it makes the underlying automation so complex that editing it is a nightmare. 😫
Ive done the same, changed 5 automations to single and for short press it does work, however my hold loop press seems to be relying on restart? Correct me if im wrong.
If there are any better workarounds i would be happy to test them out, so would my wife :)
If there are any better workarounds i would be happy to test them out, so would my wife :)
Instead of changing the mode you can replace the events for the new ones as describe here : https://github.com/home-assistant/core/issues/131694#issuecomment-2503538245
The problem
This problem was originally described in #130259. The supplied fix dis solve my initial problem, but many users, including me, still have an issue with the Hue Dimmer V2. It doesn't seem to be a problem with ZHA itself but I'm not exactly sure.
When using the
light.toggle
action in an automation with the Hue Dimmer V2 (over ZHA), the action runs aka. the device turns on but HA doesn't get that state change and still displays the device to be off. That way, the nexttoggle
action fails.According to @outrun0506, the issue was fixed and then re-appeared in
2024.11.1
What version of Home Assistant Core has the issue?
core-2024.11.1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
ZHA
Link to integration documentation on our website
https://www.home-assistant.io/integrations/zha/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response