home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
70.46k stars 29.39k forks source link

Platform time and time_pattern triggered automations are executed twice #39012

Closed casakampa closed 3 years ago

casakampa commented 4 years ago

The problem

Any automation that is triggered with a hardcoded time trigger or time_pattern is executed twice. The automation triggers at the right time and one second before the right time. As a result, reports of duplicate automations appear in the log.

This happens when an automation is triggered with

- platform: time

or

- platform: time_trigger

Automations that are time triggered but use this:

  - platform: template
    value_template: >-
      {{ state_attr('input_datetime.waste_notification', 'timestamp') | timestamp_custom('%H:%M', false) == states('sensor.time') }}

do not have this problem.

Environment

Problem-relevant configuration.yaml

- id: update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc
  alias: 'Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC' # OTC is Outside Temperature Control
  trigger:
    platform: time_pattern
    minutes: "/15"
  mode: single
  action:
    service: opentherm_gw.set_outside_temperature
    data_template:
      gateway_id: otgw
      temperature: >-
        {{ state_attr( 'weather.thuis', 'temperature') }}

and

 - id: enable_pump_at_1_00
   alias: 'Schakel pomp vloerverwarming in om 1:00'
   trigger:
   - platform: time
     at: "01:00:00"
   condition: []
   action:
     - service: light.turn_on # Plug is presented as a light, not as a switch
       entity_id: light.stekker_vloerverwarming # Plug is used to enable the floor heating pump
     - delay:
         hours: 1
     - service: light.turn_off # Plug is presented as a light, not as a switch
       entity_id: light.stekker_vloerverwarming # Plug is used to disable the floor heating pump

Traceback/Error logs

2020-08-18 06:15:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 06:30:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 06:45:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 07:00:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 07:15:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 07:45:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 08:00:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 08:15:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-18 08:30:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running

Additional information

Two fragments from the logbook.

First one is an automation that uses time_pattern

8:45:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
8:44:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
8:30:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
8:29:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
8:15:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
8:14:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
8:00:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:59:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:45:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:44:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:30:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:15:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:14:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
7:00:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:59:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:45:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:44:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:30:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:29:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:15:00 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered
6:14:59 Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC has been triggered

Second one is from an automation that uses a hardcoded timestamp as a trigger

1:00:00 - Schakel pomp vloerverwarming in om 1:00 has been triggered
0:59:59 - Schakel pomp vloerverwarming in om 1:00 has been triggered
probot-home-assistant[bot] commented 4 years ago

automation documentation automation source (message by IssueLinks)

probot-home-assistant[bot] commented 4 years ago

Hey there @home-assistant/core, mind taking a look at this issue as its been labeled with an integration (automation) you are listed as a codeowner for? Thanks! (message by CodeOwnersMention)

pnbruckner commented 3 years ago

@mvandek I just tested this with 0.114.0 and I do not see the same behavior.

Please add the following to your configuration:

logger:
  default: info
  logs:
    homeassistant.core: debug

Then restart HA. Now, when it happens, look in home-assistant.log. There will be more details as to what's happening. Look closely at the trigger that happens the second before the automation was supposed to trigger. E.g., is there something calling the automation.trigger service causing these automations to run when they shouldn't?

casakampa commented 3 years ago

I've captured the debug output for 12 hours and filtered it to show the sensor.time, and the two mentioned automations. You can download the log here, because it's 1372 lines.

Since opening this issue i've updated to the newest Home Assistant version (0.114.3) and Supervisor (235).

balloob commented 3 years ago

What was the automation configuration that you used for your log? Is this the time pattern with /15 ?

I see indeed that your automation triggers twice.

2020-08-25 20:14:59 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event automation_triggered[L]: name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC, entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc>

2020-08-25 20:15:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event automation_triggered[L]: name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC, entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc>

The first trigger is happening at 20:14:59, that's actually a second too early! And so I think that what happens is that it finishes within a second, and then schedules itself, getting correctly scheduled for the 20:15:00 slot.

I see that you're running under a VM, so I wonder if that could impact timing issues.

CC @bdraco who has recently made some changes to time scheduling.

bdraco commented 3 years ago

Can you verify there is only one homeassistant instance being started in the VM/container?

We had some problems a while back with some containers starting two instances and I'd like to rule that out first.

balloob commented 3 years ago

I don't think it's 2 HAs because the automation state changes are intertwining.

Here are some formatted ones with less relevant info removed.

2020-08-25 19:59:59 DEBUG (MainThread) [homeassistant.core] Bus:Handling
<Event state_changed[L]:
  entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc,

  old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on;
    last_triggered=2020-08-25T19:45:00.001077+02:00, mode=single, current=0 @ 2020-08-25T19:54:37.928527+02:00>,

  new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T19:59:59.999420+02:00, mode=single, current=0 @ 2020-08-25T19:54:37.928527+02:00>>

2020-08-25 19:59:59 DEBUG (MainThread) [homeassistant.core] Bus:Handling
<Event state_changed[L]: 
  entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, 

  old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T19:59:59.999420+02:00, mode=single, current=0 @ 2020-08-25T19:54:37.928527+02:00>, 

  new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T19:59:59.999420+02:00, mode=single, current=1 @ 2020-08-25T19:54:37.928527+02:00>>

2020-08-25 20:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling
<Event state_changed[L]: 
  entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, 

  old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T19:59:59.999420+02:00, mode=single, current=1 @ 2020-08-25T19:54:37.928527+02:00>,

  new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T20:00:00.007029+02:00, mode=single, current=1 @ 2020-08-25T19:54:37.928527+02:00>>

2020-08-25 20:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling
<Event state_changed[L]: 
  entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, 

  old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T20:00:00.007029+02:00, mode=single, current=1 @ 2020-08-25T19:54:37.928527+02:00>,

  new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; 
    last_triggered=2020-08-25T20:00:00.007029+02:00, mode=single, current=0 @ 2020-08-25T19:54:37.928527+02:00>>

Each triggering causes 2 events. The 3rd event shows the 2nd trigger

bdraco commented 3 years ago

I think we need to reverse the order in

    cancel_callback = hass.loop.call_at(
        hass.loop.time() + next_time.timestamp() - time.time(),
        pattern_time_change_listener,
    )

and get time.time() first because time can tick forward a few microseconds between when we check both timestamps which results in scheduling a few microseconds early.

It may also be that this case fetching time on the hardware/VM is a lot more expensive for some reason. I've seen similar behavior on aws

balloob commented 3 years ago

Do you mean like this?

diff --git a/homeassistant/helpers/event.py b/homeassistant/helpers/event.py
index ba0b07a207..334ad3c676 100644
--- a/homeassistant/helpers/event.py
+++ b/homeassistant/helpers/event.py
@@ -996,14 +996,15 @@ def async_track_utc_time_change(

         calculate_next(now + timedelta(seconds=1))

+        now = time.time()
         cancel_callback = hass.loop.call_at(
-            hass.loop.time() + next_time.timestamp() - time.time(),
+            hass.loop.time() + next_time.timestamp() - now,
             pattern_time_change_listener,
         )

+    now = time.time()
     cancel_callback = hass.loop.call_at(
-        hass.loop.time() + next_time.timestamp() - time.time(),
-        pattern_time_change_listener,
+        hass.loop.time() + next_time.timestamp() - now, pattern_time_change_listener,
     )

     @callback

@mvandek could you try that?

pnbruckner commented 3 years ago

So looking at those four state changes...

The first is a result of the first call to AutomationEntity.async_trigger() from async_track_time_change (@ 2020-08-25T19:59:59.999420+02:00) and comes from the automation integration when it updates its last_triggered attribute. The second is a result of the same call, and comes from the script helper updating the current attribute.

The third is a result of the second call to AutomationEntity.async_trigger() from async_track_time_change (@ 2020-08-25T20:00:00.007029+02:00) and, again, comes from the automation integration when it updates its last_triggered attribute. But this one results in the "Already running" message from the script helper and, hence, there is not another state change from the script helper like there is from the first call.

The fourth is a result of the first call completing.

So it does seem that async_track_time_change is calling action (i.e., AutomationEntity.async_trigger() in this case) twice, once 580 µsec too soon, and again 7,029 µsec too late.

My guess is because the first call is coming too soon, that calculate_next() is resulting in the same time and causing the second call.

As a side note, the automation integration is updating its last_triggered attribute directly instead of using the script helper's last_triggered attribute (like the script integration does.) This is causing two problems: 1) there are two state change events when an automation is triggered instead of just one, and 2) the last_triggered attribute is getting updated, even if the actions don't actually run (e.g., "Already running".) I knew this was an issue before but I didn't change it because I wasn't sure if people were potentially counting on the automation's last_triggered attribute updating when a trigger fires (and the conditions are met), even if the actions don't end up running (due to the mode limitation.) Maybe this should be changed after all.

casakampa commented 3 years ago

Immediately after version 0.114.4 became available, I installed this version. Then I rebooted the virtual machine, but the issue still exists.

I've read some of your suggestions to what I could do or check on my installation. Since I'm running HassOS 4.12, I'm not sure how to check if there are two instances of HA running at the same time. I'm running this VM on a fully updated VMware ESXi 7.0b installation with the latest firmware version.

So I added the debug code again to configuration.yaml and this is a snippet from the log, with corresponds with the automation that I provided when opening this issue, which is triggered every 15 minutes via a time_pattern. In the mean time this log is filling itself again with debug information, and tomorrow I can post a new link to the new debug log containing more information.

2020-08-26 21:59:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.time, old_state=<state sensor.time=21:58; friendly_name=Time, icon=mdi:clock @ 2020-08-26T21:58:00.596303+02:00>, new_state=<state sensor.time=21:59; friendly_name=Time, icon=mdi:clock @ 2020-08-26T21:59:00.596811+02:00>>

2020-08-26 21:59:59 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T21:45:00.004762+02:00, mode=single, current=0, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>, new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T21:59:59.999171+02:00, mode=single, current=0, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>>

2020-08-26 21:59:59 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event automation_triggered[L]: name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC, entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc>
2020-08-26 21:59:59 INFO (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Executing Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC
2020-08-26 21:59:59 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T21:59:59.999171+02:00, mode=single, current=0, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>, new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T21:59:59.999171+02:00, mode=single, current=1, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>>

2020-08-26 22:00:00 INFO (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Running script
2020-08-26 22:00:00 INFO (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Executing step call service
2020-08-26 22:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event call_service[L]: domain=opentherm_gw, service=set_outside_temperature, service_data=gateway_id=otgw, temperature=15.0>
2020-08-26 22:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T21:59:59.999171+02:00, mode=single, current=1, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>, new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T22:00:00.007802+02:00, mode=single, current=1, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>>

2020-08-26 22:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event automation_triggered[L]: name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC, entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc>
2020-08-26 22:00:00 INFO (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Executing Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC
2020-08-26 22:00:00 WARNING (MainThread) [homeassistant.components.automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc] Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC: Already running
2020-08-26 22:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc, old_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T22:00:00.007802+02:00, mode=single, current=1, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>, new_state=<state automation.update_otgw_elk_kwartier_met_buitentemperatuur_van_accuweather_of_met_no_voor_otc=on; last_triggered=2020-08-26T22:00:00.007802+02:00, mode=single, current=0, id=update_otgw_hourly_with_outside_temp_from_weather_integration_for_otc, friendly_name=Update OTGW elk kwartier met buitentemperatuur van AccuWeather of MET.no voor OTC @ 2020-08-26T21:52:39.288582+02:00>>

2020-08-26 22:00:00 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.time, old_state=<state sensor.time=21:59; friendly_name=Time, icon=mdi:clock @ 2020-08-26T21:59:00.596811+02:00>, new_state=<state sensor.time=22:00; friendly_name=Time, icon=mdi:clock @ 2020-08-26T22:00:00.597341+02:00>>

What is interesting is that this problem did not occur with version 0.113 or earlier. In the meantime, I've updated the supervisor and operating system and did not change the settings of the VM.

Another thing I can try is to restore a recently taken snapshot in another virtual machine (with HassOS) to a physically different computer and see if the problem occurs in that environment as well.

If you need more information, please let me know.

bdraco commented 3 years ago

This can still fire early since hass.loop.time() will always give us a time in the past. That really shouldn't be a problem since we account for this by doing calculate_next(now + timedelta(seconds=1)) when we fire pattern_time_change_listener.

find_next_time_expression_time is dropped the microseconds though... I wonder if that is enough to cause us not to move forward to the next time.

    result = now.replace(microsecond=0)
bdraco commented 3 years ago

Let's assume we have a time pattern listener that matches at 03:00:00 and fire the automation at

02:59:59.999999

We add 1 second to that so we end up with

03:00:00.999999

find_next_time_expression_time drops us to exactly

03:00:00

If find_next_time_expression_time returns 03:00:00 then there is our issue

pnbruckner commented 3 years ago

Also, does Python guarantee in an expression like this:

    cancel_callback = hass.loop.call_at(
        -time.time() + hass.loop.time() + next_time.timestamp(),
        pattern_time_change_listener,
    )

the order that the terms will be evaluated? It might be better to do what was suggested above:

    now = time.time()
    cancel_callback = hass.loop.call_at(
        hass.loop.time() + next_time.timestamp() - now,
        pattern_time_change_listener,
    )
bdraco commented 3 years ago

That is a good question. It does appear to evaluate left to right as long as the operators have the same precedence.

>>> -time.time() + time.time()
2.1457672119140625e-06
>>> time.time() + -time.time()
-7.152557373046875e-07
pnbruckner commented 3 years ago

Hmm, I wasn't aware, but I guess it does guarantee:

https://docs.python.org/3/reference/expressions.html#evaluation-order

pnbruckner commented 3 years ago

I wonder, would it be better to use:

    delay = next_time.timestamp() - time.time()
    cancel_callback = hass.loop.call_later(delay, pattern_time_change_listener)
bdraco commented 3 years ago

Back to https://github.com/home-assistant/core/issues/39012#issuecomment-681120841

Assumptions:

We still fire early since hass.loop.time() is always in the past since it take more than 1 microsecond to fetch it

We add time delta of 1 second in the future to make sure we don't hit the same timestamp. Since we fired a few microseconds early, that isn't enough to get us to the next time expression because find_next_time_expression_time is going to ignore the microseconds via now.replace(microsecond=0)

Since find_next_time_expression_time will give us back a time in the past because it drops microseconds

    now = datetime(2018, 10, 7, 10, 30, 0, 999999)
    assert find_next_time_expression_time(now, "*", "/30", 0) == datetime(2018, 10, 7, 10, 30, 0)

...we end up firing again almost right away

bdraco commented 3 years ago

@mvandek Can you try this change? https://github.com/bdraco/home-assistant/commit/2e01965fd313940692b290b5a9a742c7b67fc25f

diff --git a/homeassistant/helpers/event.py b/homeassistant/helpers/event.py
index cd43691999..dc0643a7f1 100644
--- a/homeassistant/helpers/event.py
+++ b/homeassistant/helpers/event.py
@@ -34,6 +34,8 @@ from homeassistant.loader import bind_hass
 from homeassistant.util import dt as dt_util
 from homeassistant.util.async_ import run_callback_threadsafe

+MAX_MICROSECONDS_LOST_FETCHING_LOOP_TIME = 50000
+
 TRACK_STATE_CHANGE_CALLBACKS = "track_state_change_callbacks"
 TRACK_STATE_CHANGE_LISTENER = "track_state_change_listener"

@@ -994,7 +996,23 @@ def async_track_utc_time_change(
         now = pattern_utc_now()
         hass.async_run_job(action, dt_util.as_local(now) if local else now)

-        calculate_next(now + timedelta(seconds=1))
+        # calculate_next calls find_next_time_expression_time
+        # which will drop the microseconds by replacing them
+        # to 0.
+        #
+        # Since fetching `hass.loop.time` takes at least more
+        # than 1 microsecond, we add 1 second and
+        # MAX_MICROSECONDS_LOST_FETCHING_LOOP_TIME microseconds
+        # to the next time calculation to ensure that
+        # when find_next_time_expression_time drops the microseconds
+        # the timedelta we add below is actually at least
+        # one full second later then when we just fired.
+        calculate_next(
+            now
+            + timedelta(
+                seconds=1, microseconds=MAX_MICROSECONDS_LOST_FETCHING_LOOP_TIME
+            )
+        )

         # We always get time.time() first to avoid time.time()
         # ticking forward after fetching hass.loop.time()
bdraco commented 3 years ago

Since we don't round time to the nearest second in logbook.....

Another solution that might be more human readable would be:

Assume MAX_MICROSECONDS_LOST_FETCHING_LOOP_TIME/MICROSECOND_TO_SECONDS == 0.05

    cancel_callback = hass.loop.call_at(
        -time.time() + hass.loop.time() + next_time.timestamp() + 0.05,
        pattern_time_change_listener,
    )

This means the automation always fires a few microseconds late (https://github.com/home-assistant/core/pull/39302)

03:00:00.000xxx. We see 03:00:00 in the log

instead of a few microseconds early (https://github.com/home-assistant/core/pull/39301)

02:59:59.999xxx. We see 02:59:59 in the log

I think https://github.com/home-assistant/core/pull/39302 makes more sense for human readability. Unfortunately many tests might fail with this approach.

bdraco commented 3 years ago

Well both of them passed the CI so it seems we don't have anything checking microseconds.

bdraco commented 3 years ago

@mvandek https://github.com/home-assistant/core/pull/39302 is a more human friendly fix. Could you try that one instead?

bdraco commented 3 years ago

I'm going to run https://github.com/home-assistant/core/pull/39302 overnight on my primary instance

bdraco commented 3 years ago
- id: '1598489065877'
  alias: Time Pattern Test
  description: Time Pattern Test
  trigger:
  - platform: time_pattern
    hours: '*'
    minutes: '*'
    seconds: '*'
  condition: []
  action:
  - service: system_log.write
    data_template:
      level: error
      logger: homeassistant.components.mylogger
      message: 'The time is: {{ now() }}'
  mode: single

Test automation to verify

bdraco commented 3 years ago

I think https://github.com/home-assistant/core/pull/39302 is probably the better solution as its probably a mistake to aim for perfect precision as its not possible here.

casakampa commented 3 years ago

@bdraco I have added your test automation to verify but I need your help to apply the diff you ask me to try.

I can access the server with SSH through the Terminal and SSH add-on installed through the Supervisor. Which commands do I need to enter into the cli to apply the diff?

It is also possible to create an administrator account for you so that you can research and test for yourself what effect the possible solutions have. You will then be able to access the SSH terminal via the web interface.

pnbruckner commented 3 years ago

FWIW, whatever you do to fix this, I think it's very important that the action should not be called any earlier than the requested time. A little late is ok, but early is bad (e.g., because of assumptions automation actions are likely to make about what the current time is, etc.)

Also, as I mentioned earlier, you might be better off using call_later instead of call_at. Not only is it simpler and closer to what is actually being done, it will add additional delay between the time time.time() is called and the loop's time is retrieved.

bdraco commented 3 years ago

FWIW, whatever you do to fix this, I think it's very important that the action should not be called any earlier than the requested time. A little late is ok, but early is bad (e.g., because of assumptions automation actions are likely to make about what the current time is, etc.)

Also, as I mentioned earlier, you might be better off using call_later instead of call_at. Not only is it simpler and closer to what is actually being done, it will add additional delay between the time time.time() is called and the loop's time is retrieved.

Thanks. Good idea. Switching to call_later seems safer since the time will continue to tick forward and we should always end up calling microseconds late instead of early... unless call_later has to fetch loop time under the hood, then we probably still need to add the safety time

pnbruckner commented 3 years ago

unless call_later has to fetch loop time under the hood, then we probably still need to add the safety time

Well, of course, it does. The point is, it will fetch it later, so it will add the pre-calculated delay (based on the difference between the requested time and time.time()) to a later loop time, so the when value it calculates will be later, and it will then call the action later.

Just like we discussed when trying to fix the mock version, the ultimate problem is the use of two different clocks - time.time() and whatever clock the loop uses. They may ultimately be based on the same physical clock, or not. So there's probably really no guarantees about how accurate each is, their rate of increase, their resolutions, etc. (other than their average rate of increase, i.e., on a macro level.)

So, in the end, there may be no other way to guarantee the action will not be called any earlier than the requested time other than by adding some fudge factor to the calculation of when (or delay if using call_later instead of call_at), especially if we want to allow fractional seconds.

And there's a thought. Do we want async_track_point_in_utc_time (and all the others based on it) to allow and honor fractional seconds? If not, I think it makes things simpler. E.g., I found this in the aiohttp library:

def call_later(cb, timeout, loop):  # type: ignore
    if timeout is not None and timeout > 0:
        when = ceil(loop.time() + timeout)
        return loop.call_at(when, cb)

Makes you wonder if they were dealing with the same issue.

bdraco commented 3 years ago

in asyncio, call_later is also a wrapper around call_at

    def call_later(self, delay, callback, *args, context=None):
        """Arrange for a callback to be called at a given time.

        Return a Handle: an opaque object with a cancel() method that
        can be used to cancel the call.

        The delay can be an int or float, expressed in seconds.  It is
        always relative to the current time.

        Each callback will be called exactly once.  If two callbacks
        are scheduled for exactly the same time, it undefined which
        will be called first.

        Any positional arguments after the callback will be passed to
        the callback when it is called.
        """
        timer = self.call_at(self.time() + delay, callback, *args,
                             context=context)
        if timer._source_traceback:
            del timer._source_traceback[-1]
        return timer
bdraco commented 3 years ago

unless call_later has to fetch loop time under the hood, then we probably still need to add the safety time

Well, of course, it does. The point is, it will fetch it later, so it will add the pre-calculated delay (based on the difference between the requested time and time.time()) to a later loop time, so the when value it calculates will be later, and it will then call the action later.

Just like we discussed when trying to fix the mock version, the ultimate problem is the use of two different clocks - time.time() and whatever clock the loop uses. They may ultimately be based on the same physical clock, or not. So there's probably really no guarantees about how accurate each is, their rate of increase, their resolutions, etc. (other than their average rate of increase, i.e., on a macro level.)

👍

So, in the end, there may be no other way to guarantee the action will not be called any earlier than the requested time other than by adding some fudge factor to the calculation of when (or delay if using call_later instead of call_at), especially if we want to allow fractional seconds.

And there's a thought. Do we want async_track_point_in_utc_time (and all the others based on it) to allow and honor fractional seconds? If not, I think it makes things simpler. E.g., I found this in the aiohttp library:

I think it is probably ok to not honor factional seconds for the time pattern listeners. I'm sure someone will want to have something fire every 0.1 seconds, but its not something we can do reliably so I don't think we should pretend otherwise.

def call_later(cb, timeout, loop):  # type: ignore
    if timeout is not None and timeout > 0:
        when = ceil(loop.time() + timeout)
        return loop.call_at(when, cb)

Makes you wonder if they were dealing with the same issue.

Definitely seems like it. I think this is a more elegant solution to this problem.

bdraco commented 3 years ago

So, in the end, there may be no other way to guarantee the action will not be called any earlier than the requested time other than by adding some fudge factor to the calculation of when (or delay if using call_later instead of call_at), especially if we want to allow fractional seconds.

I gave the aiohttp option a shot, but since we end up calling everything too late in the future now the tests fail. I think the aiohttp case it trying to avoid timing out too early so its ok if its a little too late.

bdraco commented 3 years ago

So it looks like we are back to adding the small bit of time back in to account for the tracking error (aka fudge factor)

That seems to work pretty well even under heavy load (overloaded the cpu at ~2020-08-27 08:50:15)

2020-08-27 13:50:13 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:13.564044-05:00
2020-08-27 13:50:14 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:14.294342-05:00
2020-08-27 13:50:18 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:16.411321-05:00
2020-08-27 13:50:18 WARNING (MainThread) [homeassistant.components.automation.time_pattern_test] Time Pattern Test: Already running
2020-08-27 13:50:21 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:21.139462-05:00
2020-08-27 13:50:22 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:22.267702-05:00
2020-08-27 13:50:23 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:23.003554-05:00
2020-08-27 13:50:24 ERROR (MainThread) [homeassistant.components.mylogger] The time is: 2020-08-27 08:50:24.003455-05:00
bdraco commented 3 years ago

We could wrap that in floor() so we always start the automation actual at the top of the second.

The downside are at least:

balloob commented 3 years ago

Just an FYI, when we relied on time changed events for our pattern listeners, we had a precision of 1 second. Anything more precise is a bonus but not a requirement

bdraco commented 3 years ago

find_next_time_expression_time is dropping the fractional seconds anyways...

bdraco commented 3 years ago

In testing, we don't want to use floor() since the microseconds aren't going to be in sync between human readable clock and the monotonic clock.

bdraco commented 3 years ago

So far in testing #39302 seems to be the more reliable at firing at .00xxxx when there isn't heavy load.

bdraco commented 3 years ago

FWIW, whatever you do to fix this, I think it's very important that the action should not be called any earlier than the requested time. A little late is ok, but early is bad (e.g., because of assumptions automation actions are likely to make about what the current time is, etc.)

39301 is definitely the wrong solution based on this. I'm going to close that out.

bdraco commented 3 years ago

@bdraco I have added your test automation to verify but I need your help to apply the diff you ask me to try.

I can access the server with SSH through the Terminal and SSH add-on installed through the Supervisor. Which commands do I need to enter into the cli to apply the diff?

It is also possible to create an administrator account for you so that you can research and test for yourself what effect the possible solutions have. You will then be able to access the SSH terminal via the web interface.

@mvandek

I've put together an event.py with the changes in #39302 that I have been running on top of 0.114.4 here https://raw.githubusercontent.com/bdraco/home-assistant/0.114.4_with_39302/homeassistant/helpers/event.py

To get access to the code, its a bit more involved on Home Assistant Operating System. You'll have to make a usb drive call CONFIG and put your key on it. https://developers.home-assistant.io/docs/operating-system/debugging/#home-assistant-operating-system

Before you do this, please make a backup in case something goes wrong.

Once you have that up, replace /usr/src/homeassistant/homeassistant/helpers/event.py with https://raw.githubusercontent.com/bdraco/home-assistant/0.114.4_with_39302/homeassistant/helpers/event.py and restart.

balloob commented 3 years ago

You should be able to use the terminal & SSH add-on and disable protection mode. You then get access to the Docker runtime and should be able to access the running HA container and swap out the file.

pnbruckner commented 3 years ago

As a side note, the automation integration is updating its last_triggered attribute directly instead of using the script helper's last_triggered attribute (like the script integration does.) This is causing two problems: 1) there are two state change events when an automation is triggered instead of just one, and 2) the last_triggered attribute is getting updated, even if the actions don't actually run (e.g., "Already running".) I knew this was an issue before but I didn't change it because I wasn't sure if people were potentially counting on the automation's last_triggered attribute updating when a trigger fires (and the conditions are met), even if the actions don't end up running (due to the mode limitation.) Maybe this should be changed after all.

Just an FYI ... I went ahead and did this in #39323.

casakampa commented 3 years ago

@balloob I am unable to replace the file via SSH - I disabled protection mode but I cannot find the /usr/src/homeassistant/ directory, or even /usr/src/. Instead, I made a copy of the original file and modified the event.py file into an overlay / diff folder using an Ubuntu Live CD.

But I'm not sure if the modified file is used by Home Assistant when triggering an automation using time or time_pattern. How can I verify - what commands do I need to put in - that the file I modified using the Ubuntu Live CD is used by Home Assistant?