Closed chneau closed 3 years ago
Reboot did not help, until I removed the UniFi integration, which had filled the log with errors.
Now back at 2% CPU load.
Same problem HA 2021.9.7 RPI 4 Reboot solved problem
I saw the same issues as listed above with my core-2021.10.4 install on Linux, which needed a restart to clear.
In your config directory, you'll find a
home-assistant.log.1
file - it is the previous log when HA restarted.This log.1 is only created during graceful reboot/shutdown, but not after a hard reset/reboot
This needs to be revised - the rotation of the log file is always done during the start of HA
It is not just the log file, both the Corse and Supervised should have the same persistent logs available (for HA OS).
Hmm, would be useful to have it in ungraceful situations as well - actually, it would make most sense to have it especial during ungraceful shutdown events, but i guess that's a different issue/request ;-)
Actually, the .1 log is there for me, but stopped as HA ceased to be responsive.
Log entries, seems the initial DST worked well (there are 2 lots of warnings for 1am) but as it hits 2am for the second time, it all goes wrong. There is an error at the second 1am stating a problem with recorder.
Note this is UK system so DST occurs at 2am and goes back to 1am.
2021-10-31 01:02:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:07:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:12:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:17:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:22:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:27:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:32:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:37:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:42:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:47:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:52:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:57:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:02:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:05:46 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31 01:07:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:12:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:17:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:22:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:27:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:32:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:37:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:39:20 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.vaccinations is taking over 10 seconds
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.rest.data] Error fetching data: https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=overview&structure={%22date%22:%22date%22,%22newPeopleVaccinatedFirstDoseByPublishDate%22:%22newPeopleVaccinatedFirstDoseByPublishDate%22,%22newPeopleVaccinatedSecondDoseByPublishDate%22:%22newPeopleVaccinatedSecondDoseByPublishDate%22,%22cumPeopleVaccinatedFirstDoseByPublishDate%22:%22cumPeopleVaccinatedFirstDoseByPublishDate%22,%22cumPeopleVaccinatedSecondDoseByPublishDate%22:%22cumPeopleVaccinatedSecondDoseByPublishDate%22} failed with
2021-10-31 01:39:21 WARNING (MainThread) [homeassistant.components.rest.sensor] Empty reply found when expecting JSON data
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'cumPeopleVaccinatedFirstDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.firstdose'
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'cumPeopleVaccinatedSecondDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.seconddose'
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'newPeopleVaccinatedFirstDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.newfirstdose'
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'newPeopleVaccinatedSecondDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.newseconddose'
2021-10-31 01:42:26 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:47:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:52:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:54:33 WARNING (MainThread) [homeassistant.helpers.entity] Update of calendar.petemary_pjmeor_co_uk is taking over 10 seconds
2021-10-31 01:54:33 WARNING (MainThread) [homeassistant.components.media_player] Updating samsungtv media_player took longer than the scheduled update interval 0:00:10
2021-10-31 01:54:33 WARNING (MainThread) [homeassistant.helpers.entity] Update of media_player.samsung_ue40h5500 is taking over 10 seconds
Same issue here. Got a notification about high CPU temp on HA app, did a "ha core restart" when I got up this morning, and things are running again. Just noticed something amusing on a temp sensor (looking at the RPi running my HA instance). Notice that 2 AM is duplicated ;)
Same staff. Right on DST change no more data stored in recorder:
No data recorder from Z2M & ZWaveJS2MQTT add-ons, Sonoff integration, gismeteo, any other data source.
At a same time - simple automations: like turn on switch when door is opened - working fine.
journalctl:
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.526 INFO ZWAVE: Node 4: metadata updated: 49-13-Air temperature
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.530 INFO ZWAVE: Node 4: value updated: 49-13-Air temperature 18.4 => 18.4
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.617 INFO ZWAVE: Node 4: metadata updated: 49-13-Air temperature
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.620 INFO ZWAVE: Node 4: value updated: 49-13-Air temperature 18.4 => 18.4
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.108 INFO ZWAVE: Node 6: value updated: 50-0-value-65537 -21474792.46 => -21474792.46
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.173 INFO ZWAVE: Node 6: value updated: 50-0-value-66049 0 => 0
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.236 INFO ZWAVE: Node 6: value updated: 50-0-value-66561 0 => 0
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.308 INFO ZWAVE: Node 6: value updated: 50-0-value-66817 0 => 0
Oct 31 00:59:43 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:43.665 INFO ZWAVE: Node 4: metadata updated: 49-10-Air temperature
Oct 31 00:59:43 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:43.669 INFO ZWAVE: Node 4: value updated: 49-10-Air temperature 20 => 20
Oct 31 01:00:00 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:00:00 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
Oct 31 01:00:00 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:00:00 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
Oct 31 01:00:00 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:00:00 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
... ... ...
Oct 31 01:59:59 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:59:58 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
Oct 31 01:59:59 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:59:58 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
Oct 31 01:59:59 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:59:59 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
Oct 31 01:59:59 ha c4990dec4d0e[472]: [0;1;31m[0;1;39m[0;1;31m2021-10-31 03:59:59 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running[0m
Oct 31 02:00:04 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:04.986 INFO ZWAVE: Node 2: value updated: 50-0-value-65537 137.53 => 137.53
Oct 31 02:00:05 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:05.051 INFO ZWAVE: Node 2: value updated: 50-0-value-66049 0 => 0
Oct 31 02:00:05 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:05.141 INFO ZWAVE: Node 2: value updated: 50-0-value-66561 0 => 0
Oct 31 02:00:05 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:05.210 INFO ZWAVE: Node 2: value updated: 50-0-value-66817 0 => 0
Oct 31 02:00:08 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:08.311 DEBUG APP: [session-file-store] Deleting expired sessions
Warning about same automation appeared in log 567,952 times and them stopped.
For those posting about missing data in history graphs: The data are there, at least my InfluxDB with Grafana behind has all data. On my side, the data from ESPHome and mqtt sensors were missung in HA. It is just HA Lovelace that is not showing the data in their charts. If this hint is not helpful, please delete without further notice, no problem.
For those posting about missing data in history graphs: The data are there, at least my InfluxDB with Grafana behind has all data. On my side, the data from ESPHome and mqtt sensors were missung in HA. It is just HA Lovelace that is not showing the data in their charts. If this hint is not helpful, please delete without further notice, no problem.
No, the data is really missing, it is not recorded to the Home Assistant db. Your InfluxDB is a different, separate database.
For those posting about missing data in history graphs: The data are there, at least my InfluxDB with Grafana behind has all data. On my side, the data from ESPHome and mqtt sensors were missung in HA. It is just HA Lovelace that is not showing the data in their charts. If this hint is not helpful, please delete without further notice, no problem.
No, once the recorder reached a queue of 30,000 events to be written to the database, It gave up. Closed the connection and never bothered trying to reconnect later.
Regarding the long term statistics: Frontend does not combine data from the two hours with same local time, but all data will be recorded in the database.
To make some sense out of this, we need try to see if there's some pattern with regards to what made the recorder queue overflow.
@asjmcguire mentioned he had a lot of warnings about automations being restarted, and that all those automation were using the time pattern. Is this true for everyone?
Same here. Yes, I had a lot of warnings about an automation that was started multiple times. It is triggered at a specific time during night. In the end something caused the database connection was dropped. Nothing was recorded until I restarted HA.
Please contact me on discord, emontnemery#6618, to share logs etc. in case they're big and/or sensitive and cant be attached here.
For those of you who had automations which were restarting, please attach your automations.yaml
and mention which automations were restarting, or just paste the configuration of a restarting automation.
@emontnemery one of automation that misbehaved was set to start at 02:01 every day. Because of this bug it attempted almost 800 000 times between 02:00 and 03:00 to start it (generating a big load and also forcing the plugs to communicate each time with the merros plug that made merross ban my plugs.. thankefully it looks after my mail they unbanned them
This is the automation in question:
alias: Activation Prises HC Nuit
description: ''
trigger:
- platform: time
at: '02:01'
condition: []
action:
- type: turn_on
device_id: b1867e82ca62022c732d71f1f4de2694
entity_id: switch.prise_salle_de_bain_local
domain: switch
- type: turn_on
device_id: 5d91122272ed2e9e59d9f96ec6a5fdfd
entity_id: switch.prise_aspirateur_outlet
domain: switch
- type: turn_on
device_id: e6ca43072edd525ad2aef294551ae63e
entity_id: switch.prise_chargeurs_salon_outlet
domain: switch
- type: turn_on
device_id: b0d86a41d0772211f9e286221450623c
entity_id: switch.prise_murale_chambre_outlet
domain: switch
- type: turn_on
device_id: a40ffe8d51f0674308033b85ba95c252
entity_id: switch.chargeur_trotinette_outlet
domain: switch
- type: turn_on
device_id: 23fd33bc4594cd6b0cf1ed6c6aab416c
entity_id: switch.chargeurs_chambre_outlet
domain: switch
mode: single
This was what was in the logs:
2021-10-31 02:01:00 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:00 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:00 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:01 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:01 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:01 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
.... skipping almost 800 000 lines like this...
2021-10-31 02:59:58 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:59:58 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:59:59 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 03:00:03 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
For those of you who had automations which were restarting, please attach your
automations.yaml
and mention which automations were restarting, or just paste the configuration of a restarting automation.
Hi Erik,
I've already put mine in my original post, but just to be sure, you can found it below. As can be seen, this automation has multiple triggers. The trigger that triggered the automation, was the time_pattern.
alias: Hobbykamer ventilator automatisch
description: ''
trigger:
- platform: device
type: turned_on
device_id: e23552b835e274e0f918377ae2059a0c
entity_id: light.hobbykamer_lamp_plafond
domain: light
for:
hours: 0
minutes: 2
seconds: 0
milliseconds: 0
id: licht aan
- platform: device
type: turned_off
device_id: e23552b835e274e0f918377ae2059a0c
entity_id: light.hobbykamer_lamp_plafond
domain: light
id: licht uit
- platform: time_pattern
minutes: /15
id: Tijdspatroon
- type: no_motion
platform: device
device_id: e778003098458a439c326f70937a426c
entity_id: binary_sensor.hobbykamer_motion_sensor_hoek_beweging
domain: binary_sensor
for:
hours: 2
minutes: 0
seconds: 0
milliseconds: 0
id: geen beweging
- platform: state
entity_id: input_boolean.algemeen_input_boolean_slapen
id: slapen
from: 'off'
to: 'on'
- platform: state
entity_id: input_boolean.algemeen_input_boolean_thuis
id: weg van huis
from: 'on'
to: 'off'
- platform: time
at: '06:00'
id: handmatige stand uitzetten
condition:
- condition: state
entity_id: input_boolean.hobbykamer_input_boolean_gastenstand
state: 'off'
action:
- choose:
- conditions:
- condition: trigger
id: licht aan
- condition: state
entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
state: 'off'
sequence:
- service: script.hobbykamer_script_ventilator
- conditions:
- condition: trigger
id: licht uit
- condition: device
type: is_off
device_id: f314e82799e2cf6c11e7e084b337a469
entity_id: switch.hobbykamer_smart_plug_dj_booth
domain: switch
- condition: state
entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
state: 'off'
sequence:
- service: fan.turn_off
target:
entity_id: fan.hobbykamer_ventilator
- conditions:
- condition: trigger
id: Tijdspatroon
- condition: device
type: is_on
device_id: e23552b835e274e0f918377ae2059a0c
entity_id: light.hobbykamer_lamp_plafond
domain: light
for:
hours: 0
minutes: 8
seconds: 0
milliseconds: 0
- condition: state
entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
state: 'off'
sequence:
- service: script.hobbykamer_script_ventilator
- conditions:
- condition: trigger
id: geen beweging
sequence:
- service: fan.turn_off
target:
entity_id: fan.hobbykamer_ventilator
- conditions:
- condition: trigger
id: slapen
sequence:
- service: fan.turn_off
target:
entity_id: fan.hobbykamer_ventilator
- conditions:
- condition: trigger
id: weg van huis
sequence:
- service: fan.turn_off
target:
entity_id: fan.hobbykamer_ventilator
- conditions:
- condition: trigger
id: handmatige stand uitzetten
sequence:
- service: input_boolean.turn_off
target:
entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
default: []
mode: restart
@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?
@emontnemery i dont use any automations in home-assistant. Everything is handled by node-red. But after time change HA starts consuming resources (and finally were killed by OOM). Im using sqlite recorder backend.
Logs:
2021-10-31 02:04:18 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31 02:41:01 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream:
2021-10-31 02:45:04 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream:
2021-10-31 02:47:02 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream:
2021-10-31 02:59:45 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.ping_google_dns is taking over 10 seconds
2021-10-31 02:59:45 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.ping_push_home_assistant is taking over 10 seconds
See memory usage from monitoring. I saw planty of network related errors (unable to ping, unable to reach ffmpeg stream, unable to reach weather api), both to local and to remote endpoints. I would expect that those networking issues were caused by resource/queues starvation. And all of those were caused by recorder integration.
EDIT: All problems started after moving time back.
EDIT2: I created some node-red automation to check HA health, and it started reporting HA failure right after 2:00. I just fire custom event from NR and receive this event from NR (if it was received, it means that HA is working). So i would expect that after 2:00 everything in HA stoped working (not only recorded, but also event bus or network integrations).
In my proxmox logs I noticed the issue started after 02:59:59, so "at the jump back" to the new 02:00:00 (noticeable due to spike in memory, CPU, disk I/o)(nothing runs at that time normally)
@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?
The looping started when local time moved back from 02:59:59 -> 02:00:00. Edit: my comment seems to be inccorect.
@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?
same as @pimw1 it looks to have started when it went back 1h
@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?
The looping started when local time moved back from 02:59:59 -> 02:00:00.
One moment, i am not so sure about my own statement anymore. Let me double check this.
@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?
The looping started when local time moved back from 02:59:59 -> 02:00:00.
One moment, i am not so sure about my own statement anymore. Let me double check this.
I checked it, and i am not sure: does the gui of home assistant sort history based on UTC time, or on local time?
Sorry,i was wrong again. I have now full prove that the issue started when local time moved back from 02:59:59 -> 02:00:00. I overlooked something in the lengthy list of logbook events. Check this:
What you can see is:
-normal events in the initial 02:00-03:00 hour -events going crashy when local time moved back from 02:59:59 -> 02:00:00: what you don't see in the picture but i do, is that above this, the automation is triggered many many many times until it all crashes down.
In short: the issue started when local time moved back from 02:59:59 -> 02:00:00. This is my validated answer ;) Sorry for the confusion.
@pimw1 for me it starts clearly after time change see below
For me too 👍 i was confused, but i see the same behaviour as you do.
I start the database purge service at 2:00. It started at 2:00 and when time changed from 2:59:59 to 2:00:00 it was triggered again. From then on it was triggered in an endless loop until I restared HA.
- id: database_purge
alias: Datenbank bereinigen
trigger:
- platform: time
at: 04:12 //changed from 2:00 to 04:12
action:
- service: recorder.purge
Can reproduce with the following testcase
async def test_async_track_time_change_dst(hass):
"""Test tracking time change."""
wildcard_runs = []
timezone = dt_util.get_time_zone("Europe/Vienna")
dt_util.set_default_time_zone(timezone)
dst_1_30 = dt_util.as_utc(datetime(2021, 10, 31, 1, 30, tzinfo=timezone))
dst_2_15 = dt_util.as_utc(datetime(2021, 10, 31, 2, 15, fold=0, tzinfo=timezone))
dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=0, tzinfo=timezone))
dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=0, tzinfo=timezone))
non_dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=1, tzinfo=timezone))
non_dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=1, tzinfo=timezone))
non_dst_3_30 = dt_util.as_utc(datetime(2021, 10, 31, 3, 30, tzinfo=timezone))
now = dst_1_30
def mocked_utcnow():
return now
with patch(
"homeassistant.util.dt.utcnow", side_effect=mocked_utcnow
), patch(
"homeassistant.helpers.event.time_tracker_utcnow", side_effect=mocked_utcnow
):
unsub = async_track_time_change(
hass, callback(lambda x: wildcard_runs.append(x)),
hour=2, minute=30, second=0
)
assert len(wildcard_runs) == 0
now = dst_2_15
async_fire_time_changed(hass, dst_2_15)
await hass.async_block_till_done()
assert len(wildcard_runs) == 0
now = dst_2_30
async_fire_time_changed(hass, dst_2_30)
await hass.async_block_till_done()
assert len(wildcard_runs) == 1
now = dst_2_45
async_fire_time_changed(hass, dst_2_45)
await hass.async_block_till_done()
assert len(wildcard_runs) == 1
now = non_dst_2_30
async_fire_time_changed(hass, non_dst_2_30)
await hass.async_block_till_done()
assert len(wildcard_runs) == 2
now = non_dst_2_45
async_fire_time_changed(hass, non_dst_2_45)
await hass.async_block_till_done()
assert len(wildcard_runs) == 2
now = non_dst_3_30
async_fire_time_changed(hass, non_dst_3_30)
await hass.async_block_till_done()
assert len(wildcard_runs) == 2
unsub()
At least the time_pattern
bug was probably introduced by https://github.com/home-assistant/core/pull/49643. One bug is this line: https://github.com/home-assistant/core/pull/49643/files#diff-ebdd5e43e71f51f15950749ccba6701248317f7217663167de537455941e8d7dR319 the fold is inverted (is_dst is true before dst change, where fold is 0 before dst change)
There appears to be another issue in the final _datetime_ambiguous(now)
check too, but haven't found the exact reason yet (see testcase)
Regarding the long term statistics: Frontend does not combine data from the two hours with same local time, but all data will be recorded in the database.
@emontnemery this is incorrect. I have no statistics entries in the Home Assistant database from the period between the end of summertime until 6 hours later, when I restarted HA.
For what it's worth. I fixed this permanently in pilight a while back by using the localtime_l
and mktime_l
functions from this library: https://github.com/NuxiNL/cloudlibc
It uses this python tool to parse the tzdata files to a C header which is very friendly way to use in your programming: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/locale/timezone/gen_tzdata.py https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/locale/timezone/tzdata.h
Which can be parsed with the earlier mentioned localtime_l
and mktime_l
. Since it uses tzdata, it is the most reliable source for timezone determination. Although the library is not maintained anymore, it can still be usefull to make this functionality into a independent shared library for C / module for python. The conversion tool is already in python, so it shouldn't be difficult to port.
I observed the same, I completely stop and wait 1 hour before restart, then everything was fine
Can reproduce with the following testcase
async def test_async_track_time_change_dst(hass): """Test tracking time change.""" wildcard_runs = [] timezone = dt_util.get_time_zone("Europe/Vienna") dt_util.set_default_time_zone(timezone) dst_1_30 = dt_util.as_utc(datetime(2021, 10, 31, 1, 30, tzinfo=timezone)) dst_2_15 = dt_util.as_utc(datetime(2021, 10, 31, 2, 15, fold=0, tzinfo=timezone)) dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=0, tzinfo=timezone)) dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=0, tzinfo=timezone)) non_dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=1, tzinfo=timezone)) non_dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=1, tzinfo=timezone)) non_dst_3_30 = dt_util.as_utc(datetime(2021, 10, 31, 3, 30, tzinfo=timezone)) now = dst_1_30 def mocked_utcnow(): return now with patch( "homeassistant.util.dt.utcnow", side_effect=mocked_utcnow ), patch( "homeassistant.helpers.event.time_tracker_utcnow", side_effect=mocked_utcnow ): unsub = async_track_time_change( hass, callback(lambda x: wildcard_runs.append(x)), hour=2, minute=30, second=0 ) assert len(wildcard_runs) == 0 now = dst_2_15 async_fire_time_changed(hass, dst_2_15) await hass.async_block_till_done() assert len(wildcard_runs) == 0 now = dst_2_30 async_fire_time_changed(hass, dst_2_30) await hass.async_block_till_done() assert len(wildcard_runs) == 1 now = dst_2_45 async_fire_time_changed(hass, dst_2_45) await hass.async_block_till_done() assert len(wildcard_runs) == 1 now = non_dst_2_30 async_fire_time_changed(hass, non_dst_2_30) await hass.async_block_till_done() assert len(wildcard_runs) == 2 now = non_dst_2_45 async_fire_time_changed(hass, non_dst_2_45) await hass.async_block_till_done() assert len(wildcard_runs) == 2 now = non_dst_3_30 async_fire_time_changed(hass, non_dst_3_30) await hass.async_block_till_done() assert len(wildcard_runs) == 2 unsub()
At least the
time_pattern
bug was probably introduced by #49643. One bug is this line: #49643 (files) the fold is inverted (is_dst is true before dst change, where fold is 0 before dst change)There appears to be another issue in the final
_datetime_ambiguous(now)
check too, but haven't found the exact reason yet (see testcase)
Update: the reason the tests still passed after https://github.com/home-assistant/core/pull/49643 is a weird behavior of python. The datetime equality method doesn't compare fold
when testing for equality of two datetime objects (effectively means you should never test equality with localized dt objects) :face_with_head_bandage:
>>> from datetime import datetime, timezone
>>> from zoneinfo import ZoneInfo
>>> tz=ZoneInfo("Europe/Vienna")
>>> dt1=datetime(2021, 10, 31, 2, 30, 0, fold=0, tzinfo=tz)
>>> dt2=datetime(2021, 10, 31, 2, 30, 0, fold=1, tzinfo=tz)
>>> dt1 == dt2
True
>>> dt1.astimezone(timezone.utc) == dt2.astimezone(timezone.utc)
False
Awesome work, @OttoWinter 👍
Help with testing the change in #58894 makes Home Assistant handle the end of DST correctly would be highly appreciated.
For most of us - the effect of the time change, was that resource usage went up and automations that relied on time pattern were triggered 20+ times a second for the entire hour. Home Assistant otherwise continued to run and automations based on motions sensors etc still triggered. However there are people who definitely don't have any time pattern triggers in their automations who still had excessive resource use - and every single one of us time pattern or not, suffered from the recorder giving up.
So there are clearly other issues that need to be addressed too - one of which is why did the recorder queue 30,0000 events and why did it not just drop them and stay connected to the database, instead of just giving up and never trying to reconnect to the database later on.
Every single person affected by the time change bug, lost connection to whatever database the recorder was using, and it was not reconnected until core was restarted - no data was recorded after a few minutes in to the repeated hour.
What caused the higher resource use for people not using the time pattern? Was is something to do with the new statistics calculations? Whatever it was it clearly involved storing data in the recorder.
@OttoWinter @emontnemery
(One of my automations and the log I recovered is posted near the top of the thread - unfortunately there is basically nothing in the log that gives a clue)
EDIT: I understand why the recorder would want to store so many events for those of us with time pattern triggers, because obviously the recorder would have been storing automation runs and presumably traces? But people not using time patterns, or indeed any automations at all - still suffered the problem, so there is something other than just automations that was wanting to update data a lot. Now we know from people using InfluxDB as well as the recorder, that InfluxDB managed to stay connected - though, it looks like from https://github.com/home-assistant/core/issues/58783#issuecomment-955699097 that InfluxDB was being asked to store a LOT of data too. I assume InfluxDB doesn't get asked to store statistics - so maybe we can rule that out?
The change has been released in 2021.10.7
It would be highly advisable to upgrade to 2021.10.7 or 2021.11.0 (when it comes out) for anyone with a time zone that has an upcoming DST change in near future.
In the process Erik and Otto also identified a long standing problem with going the other direction which has been fixed in https://github.com/home-assistant/core/pull/58914
I also had this issue. I do have the history stored with my shelly devices. Is it possible to manually add these states to the database, so that the values continue being correct? ATM I have a big hole in the graphs:
Same problem, no automation, the curves are flat after 02:01 winter time.
There's unfortunately not much to do about the missing data since Home Assistant was effectively not working during 02:00~02:59.
@asjmcguire
What caused the higher resource use for people not using the time pattern? Was is something to do with the new statistics calculations?
The recorder itself uses a time pattern to schedule statistics compilation every 5 minutes, that's why everyone was hit.
Closing this since #58894 has been merged
There's unfortunately not much to do about the missing data since Home Assistant was effectively not working during 02:00~02:59.
I did find this addon for home assistant. Isn't it possible to just force data onto the db file?
Sure, you can manually insert data in the database.
I was asked to keep all info regarding #58799 in this thread which is now closed while only the hang-up is presumably fixed .
Is the data gap between 1AM and 2AM been fixed?
This is most likely a side effect of the recorder being overrun; the hourly statistics for 01:00~02:00 is compiled after 02:00 when things had already gone south.
Improving the supervisor to detect this kind of problem makes sense. Please open an issue here: https://github.com/home-assistant/supervisor/issues
@mdeweerd just to be clear, you write that "I have an unreachable HA system". Did it not start working again after a manual restart?
Just one note to this comment:
the Supervisor would restart the core after a while
Which will hide the problem. Most of users will not even notice that. Thus the root cause will have no chance to be fixed.
@michalk-k that's exactly the point of a watchdog though; minimize the impact to the end-user instead of getting stuck in a failed state.
I got it. But at the same time other needs should be full-filled, in order to back-trace the issue. Currently even HA logs are reset by each restart. Keep in mind that HA devs are not prone to develop anything without detailed issue description. I could imagine the DST issue would never get traction if all impacted instances have silently restarted itself.
The problem
In UK 2021/10/31, at 01:59:59, time got back to 01:00:00 (summer to winter, Daylight saving), since then (it's 01:08) home-assistant has a high CPU usage, using a core at 100%.
Edit: memory usage seems to increase quickly:
at 01:14:00
Edit2: Switching lights work fine but it does not appear on the state history of the light.
What version of Home Assistant Core has the issue?
core-2021.10.6
I could not find the exact image id on docker hub, but here is the label section of
docker inspect
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Container
Integration causing the issue
No response
Link to integration documentation on our website
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Interesting
The recorder queue reached the maximum size of 30000
at
2021-10-31T01:03:30.660640416Z
I restarted the container to see if it could fix the issue, it did not.