home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.08k stars 29.74k forks source link

Issue after DST time change [update to 2021.10.7+ or 2021.11.0b4+ recommended] #58783

Closed chneau closed 2 years ago

chneau commented 2 years ago

The problem

In UK 2021/10/31, at 01:59:59, time got back to 01:00:00 (summer to winter, Daylight saving), since then (it's 01:08) home-assistant has a high CPU usage, using a core at 100%.

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O   BLOCK I/O         PIDS
42985e0497d4   hass      104.53%   251.7MiB / 7.658GiB   3.21%     0B / 0B   103MB / 1.77MB    15

Edit: memory usage seems to increase quickly:

at 01:14:00

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O   BLOCK I/O         PIDS
42985e0497d4   hass      104.93%   703.1MiB / 7.658GiB   8.97%     0B / 0B   112MB / 1.98MB    16

Edit2: Switching lights work fine but it does not appear on the state history of the light.

What version of Home Assistant Core has the issue?

core-2021.10.6

REPOSITORY                     TAG       IMAGE ID       CREATED         SIZE
homeassistant/home-assistant   stable    e0a45773808a   12 days ago     1.14GB

I could not find the exact image id on docker hub, but here is the label section of docker inspect

"io.hass.arch": "amd64",
"io.hass.base.arch": "amd64",
"io.hass.base.image": "homeassistant/amd64-base:3.14",
"io.hass.base.name": "python",
"io.hass.base.version": "2021.09.1",
"io.hass.type": "core",
"io.hass.version": "2021.10.6",
"org.opencontainers.image.authors": "The Home Assistant Authors",
"org.opencontainers.image.created": "2021-10-18 06:34:53+00:00",
"org.opencontainers.image.description": "Open-source home automation platform running on Python 3",
"org.opencontainers.image.documentation": "https://www.home-assistant.io/docs/",
"org.opencontainers.image.licenses": "Apache License 2.0",
"org.opencontainers.image.source": "https://github.com/home-assistant/core",
"org.opencontainers.image.title": "Home Assistant",
"org.opencontainers.image.url": "https://www.home-assistant.io/",
"org.opencontainers.image.version": "2021.10.6"

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

No response

Link to integration documentation on our website

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Interesting The recorder queue reached the maximum size of 30000

2021-10-30T09:40:10.884156228Z 2021-10-30 10:40:10 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [139778345277952] Disconnected: Did not receive auth message within 10 seconds
2021-10-30T09:40:22.323961416Z 2021-10-30 10:40:22 WARNING (MainThread) [homeassistant.components.webhook] Received message for unregistered webhook c9fa7b5955dcce6df0ec16e14a28b23623563b96373bc5a66c0413c418093008 from 192.168.1.117
2021-10-31T01:03:30.660640416Z 2021-10-31 01:03:30 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31T01:04:57.487128770Z [cont-finish.d] executing container finish scripts...
2021-10-31T01:04:57.489476430Z [cont-finish.d] done.

at 2021-10-31T01:03:30.660640416Z I restarted the container to see if it could fix the issue, it did not.



### Additional information

Maybe after 02:00:00 it will stop?
Everything is working properly: light switches, the mobile phone app is working properly, the website served by the container (server:8123) is working properly.
Restarting the container or restarting the PC does not solve the high CPU usage
PA1RB commented 2 years ago

image

Reboot did not help, until I removed the UniFi integration, which had filled the log with errors.

Now back at 2% CPU load.

image

perseus177 commented 2 years ago

Same problem HA 2021.9.7 RPI 4 Reboot solved problem

rit001 commented 2 years ago

I saw the same issues as listed above with my core-2021.10.4 install on Linux, which needed a restart to clear.

borpin commented 2 years ago

In your config directory, you'll find a home-assistant.log.1 file - it is the previous log when HA restarted.

This log.1 is only created during graceful reboot/shutdown, but not after a hard reset/reboot

This needs to be revised - the rotation of the log file is always done during the start of HA

It is not just the log file, both the Corse and Supervised should have the same persistent logs available (for HA OS).

borpin commented 2 years ago

Hmm, would be useful to have it in ungraceful situations as well - actually, it would make most sense to have it especial during ungraceful shutdown events, but i guess that's a different issue/request ;-)

Actually, the .1 log is there for me, but stopped as HA ceased to be responsive.

Log entries, seems the initial DST worked well (there are 2 lots of warnings for 1am) but as it hits 2am for the second time, it all goes wrong. There is an error at the second 1am stating a problem with recorder.

Note this is UK system so DST occurs at 2am and goes back to 1am.

2021-10-31 01:02:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:07:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:12:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:17:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:22:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:27:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:32:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:37:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:42:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:47:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:52:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:57:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:02:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:05:46 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31 01:07:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:12:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:17:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:22:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:27:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:32:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:37:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:39:20 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.vaccinations is taking over 10 seconds
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.rest.data] Error fetching data: https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=overview&structure={%22date%22:%22date%22,%22newPeopleVaccinatedFirstDoseByPublishDate%22:%22newPeopleVaccinatedFirstDoseByPublishDate%22,%22newPeopleVaccinatedSecondDoseByPublishDate%22:%22newPeopleVaccinatedSecondDoseByPublishDate%22,%22cumPeopleVaccinatedFirstDoseByPublishDate%22:%22cumPeopleVaccinatedFirstDoseByPublishDate%22,%22cumPeopleVaccinatedSecondDoseByPublishDate%22:%22cumPeopleVaccinatedSecondDoseByPublishDate%22} failed with 
2021-10-31 01:39:21 WARNING (MainThread) [homeassistant.components.rest.sensor] Empty reply found when expecting JSON data
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'cumPeopleVaccinatedFirstDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.firstdose'
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'cumPeopleVaccinatedSecondDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.seconddose'
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'newPeopleVaccinatedFirstDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.newfirstdose'
2021-10-31 01:39:21 ERROR (MainThread) [homeassistant.components.template.template_entity] TemplateError('TypeError: unsupported format string passed to NoneType.__format__') while processing template 'Template("{{ '{0:,.0f}'.format(state_attr('sensor.vaccinations', 'newPeopleVaccinatedSecondDoseByPublishDate')) }}")' for attribute '_attr_native_value' in entity 'sensor.newseconddose'
2021-10-31 01:42:26 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:47:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:52:25 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'power_on_behavior' when rendering '{{ value_json.power_on_behavior }}'
2021-10-31 01:54:33 WARNING (MainThread) [homeassistant.helpers.entity] Update of calendar.petemary_pjmeor_co_uk is taking over 10 seconds
2021-10-31 01:54:33 WARNING (MainThread) [homeassistant.components.media_player] Updating samsungtv media_player took longer than the scheduled update interval 0:00:10
2021-10-31 01:54:33 WARNING (MainThread) [homeassistant.helpers.entity] Update of media_player.samsung_ue40h5500 is taking over 10 seconds
ksga commented 2 years ago

Same issue here. Got a notification about high CPU temp on HA app, did a "ha core restart" when I got up this morning, and things are running again. Just noticed something amusing on a temp sensor (looking at the RPi running my HA instance). Notice that 2 AM is duplicated ;) image

Masterz69 commented 2 years ago

Same staff. Right on DST change no more data stored in recorder: image

No data recorder from Z2M & ZWaveJS2MQTT add-ons, Sonoff integration, gismeteo, any other data source.

At a same time - simple automations: like turn on switch when door is opened - working fine.

Masterz69 commented 2 years ago

journalctl:

Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.526 INFO ZWAVE: Node 4: metadata updated: 49-13-Air temperature
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.530 INFO ZWAVE: Node 4: value updated: 49-13-Air temperature 18.4 => 18.4
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.617 INFO ZWAVE: Node 4: metadata updated: 49-13-Air temperature
Oct 31 00:59:13 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:13.620 INFO ZWAVE: Node 4: value updated: 49-13-Air temperature 18.4 => 18.4
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.108 INFO ZWAVE: Node 6: value updated: 50-0-value-65537 -21474792.46 => -21474792.46
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.173 INFO ZWAVE: Node 6: value updated: 50-0-value-66049 0 => 0
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.236 INFO ZWAVE: Node 6: value updated: 50-0-value-66561 0 => 0
Oct 31 00:59:21 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:21.308 INFO ZWAVE: Node 6: value updated: 50-0-value-66817 0 => 0
Oct 31 00:59:43 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:43.665 INFO ZWAVE: Node 4: metadata updated: 49-10-Air temperature
Oct 31 00:59:43 ha 42ffe8f5ff34[472]: 2021-10-31 03:59:43.669 INFO ZWAVE: Node 4: value updated: 49-10-Air temperature 20 => 20
Oct 31 01:00:00 ha c4990dec4d0e[472]: 2021-10-31 03:00:00 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
Oct 31 01:00:00 ha c4990dec4d0e[472]: 2021-10-31 03:00:00 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
Oct 31 01:00:00 ha c4990dec4d0e[472]: 2021-10-31 03:00:00 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
... ... ...
Oct 31 01:59:59 ha c4990dec4d0e[472]: 2021-10-31 03:59:58 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
Oct 31 01:59:59 ha c4990dec4d0e[472]: 2021-10-31 03:59:58 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
Oct 31 01:59:59 ha c4990dec4d0e[472]: 2021-10-31 03:59:59 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
Oct 31 01:59:59 ha c4990dec4d0e[472]: 2021-10-31 03:59:59 WARNING (MainThread) [homeassistant.components.automation.septic_compressor_timer_start_stop_2] Septic Compressor Timer Bounce: Already running
Oct 31 02:00:04 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:04.986 INFO ZWAVE: Node 2: value updated: 50-0-value-65537 137.53 => 137.53
Oct 31 02:00:05 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:05.051 INFO ZWAVE: Node 2: value updated: 50-0-value-66049 0 => 0
Oct 31 02:00:05 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:05.141 INFO ZWAVE: Node 2: value updated: 50-0-value-66561 0 => 0
Oct 31 02:00:05 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:05.210 INFO ZWAVE: Node 2: value updated: 50-0-value-66817 0 => 0
Oct 31 02:00:08 ha 42ffe8f5ff34[472]: 2021-10-31 04:00:08.311 DEBUG APP: [session-file-store] Deleting expired sessions

Warning about same automation appeared in log 567,952 times and them stopped.

pstimpel commented 2 years ago

For those posting about missing data in history graphs: The data are there, at least my InfluxDB with Grafana behind has all data. On my side, the data from ESPHome and mqtt sensors were missung in HA. It is just HA Lovelace that is not showing the data in their charts. If this hint is not helpful, please delete without further notice, no problem.

bousch commented 2 years ago

For those posting about missing data in history graphs: The data are there, at least my InfluxDB with Grafana behind has all data. On my side, the data from ESPHome and mqtt sensors were missung in HA. It is just HA Lovelace that is not showing the data in their charts. If this hint is not helpful, please delete without further notice, no problem.

No, the data is really missing, it is not recorded to the Home Assistant db. Your InfluxDB is a different, separate database.

asjmcguire commented 2 years ago

For those posting about missing data in history graphs: The data are there, at least my InfluxDB with Grafana behind has all data. On my side, the data from ESPHome and mqtt sensors were missung in HA. It is just HA Lovelace that is not showing the data in their charts. If this hint is not helpful, please delete without further notice, no problem.

No, once the recorder reached a queue of 30,000 events to be written to the database, It gave up. Closed the connection and never bothered trying to reconnect later.

emontnemery commented 2 years ago

Regarding the long term statistics: Frontend does not combine data from the two hours with same local time, but all data will be recorded in the database.

To make some sense out of this, we need try to see if there's some pattern with regards to what made the recorder queue overflow.

@asjmcguire mentioned he had a lot of warnings about automations being restarted, and that all those automation were using the time pattern. Is this true for everyone?

DirkMu commented 2 years ago

Same here. Yes, I had a lot of warnings about an automation that was started multiple times. It is triggered at a specific time during night. In the end something caused the database connection was dropped. Nothing was recorded until I restarted HA.

emontnemery commented 2 years ago

Please contact me on discord, emontnemery#6618, to share logs etc. in case they're big and/or sensitive and cant be attached here.

emontnemery commented 2 years ago

For those of you who had automations which were restarting, please attach your automations.yaml and mention which automations were restarting, or just paste the configuration of a restarting automation.

ioannispelelis commented 2 years ago

@emontnemery one of automation that misbehaved was set to start at 02:01 every day. Because of this bug it attempted almost 800 000 times between 02:00 and 03:00 to start it (generating a big load and also forcing the plugs to communicate each time with the merros plug that made merross ban my plugs.. thankefully it looks after my mail they unbanned them

This is the automation in question:

alias: Activation Prises HC Nuit
description: ''
trigger:
  - platform: time
    at: '02:01'
condition: []
action:
  - type: turn_on
    device_id: b1867e82ca62022c732d71f1f4de2694
    entity_id: switch.prise_salle_de_bain_local
    domain: switch
  - type: turn_on
    device_id: 5d91122272ed2e9e59d9f96ec6a5fdfd
    entity_id: switch.prise_aspirateur_outlet
    domain: switch
  - type: turn_on
    device_id: e6ca43072edd525ad2aef294551ae63e
    entity_id: switch.prise_chargeurs_salon_outlet
    domain: switch
  - type: turn_on
    device_id: b0d86a41d0772211f9e286221450623c
    entity_id: switch.prise_murale_chambre_outlet
    domain: switch
  - type: turn_on
    device_id: a40ffe8d51f0674308033b85ba95c252
    entity_id: switch.chargeur_trotinette_outlet
    domain: switch
  - type: turn_on
    device_id: 23fd33bc4594cd6b0cf1ed6c6aab416c
    entity_id: switch.chargeurs_chambre_outlet
    domain: switch
mode: single

This was what was in the logs:

2021-10-31 02:01:00 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:00 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:00 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:01 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:01 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:01:01 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running

.... skipping almost 800 000 lines like this...

2021-10-31 02:59:58 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:59:58 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:59:59 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 03:00:03 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
pimw1 commented 2 years ago

For those of you who had automations which were restarting, please attach your automations.yaml and mention which automations were restarting, or just paste the configuration of a restarting automation.

Hi Erik,

I've already put mine in my original post, but just to be sure, you can found it below. As can be seen, this automation has multiple triggers. The trigger that triggered the automation, was the time_pattern.

alias: Hobbykamer ventilator automatisch
description: ''
trigger:
  - platform: device
    type: turned_on
    device_id: e23552b835e274e0f918377ae2059a0c
    entity_id: light.hobbykamer_lamp_plafond
    domain: light
    for:
      hours: 0
      minutes: 2
      seconds: 0
      milliseconds: 0
    id: licht aan
  - platform: device
    type: turned_off
    device_id: e23552b835e274e0f918377ae2059a0c
    entity_id: light.hobbykamer_lamp_plafond
    domain: light
    id: licht uit
  - platform: time_pattern
    minutes: /15
    id: Tijdspatroon
  - type: no_motion
    platform: device
    device_id: e778003098458a439c326f70937a426c
    entity_id: binary_sensor.hobbykamer_motion_sensor_hoek_beweging
    domain: binary_sensor
    for:
      hours: 2
      minutes: 0
      seconds: 0
      milliseconds: 0
    id: geen beweging
  - platform: state
    entity_id: input_boolean.algemeen_input_boolean_slapen
    id: slapen
    from: 'off'
    to: 'on'
  - platform: state
    entity_id: input_boolean.algemeen_input_boolean_thuis
    id: weg van huis
    from: 'on'
    to: 'off'
  - platform: time
    at: '06:00'
    id: handmatige stand uitzetten
condition:
  - condition: state
    entity_id: input_boolean.hobbykamer_input_boolean_gastenstand
    state: 'off'
action:
  - choose:
      - conditions:
          - condition: trigger
            id: licht aan
          - condition: state
            entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
            state: 'off'
        sequence:
          - service: script.hobbykamer_script_ventilator
      - conditions:
          - condition: trigger
            id: licht uit
          - condition: device
            type: is_off
            device_id: f314e82799e2cf6c11e7e084b337a469
            entity_id: switch.hobbykamer_smart_plug_dj_booth
            domain: switch
          - condition: state
            entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
            state: 'off'
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: Tijdspatroon
          - condition: device
            type: is_on
            device_id: e23552b835e274e0f918377ae2059a0c
            entity_id: light.hobbykamer_lamp_plafond
            domain: light
            for:
              hours: 0
              minutes: 8
              seconds: 0
              milliseconds: 0
          - condition: state
            entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
            state: 'off'
        sequence:
          - service: script.hobbykamer_script_ventilator
      - conditions:
          - condition: trigger
            id: geen beweging
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: slapen
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: weg van huis
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: handmatige stand uitzetten
        sequence:
          - service: input_boolean.turn_off
            target:
              entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
    default: []
mode: restart
emontnemery commented 2 years ago

@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?

bkupidura commented 2 years ago

@emontnemery i dont use any automations in home-assistant. Everything is handled by node-red. But after time change HA starts consuming resources (and finally were killed by OOM). Im using sqlite recorder backend.

Logs:

2021-10-31 02:04:18 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31 02:41:01 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream:
2021-10-31 02:45:04 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream:
2021-10-31 02:47:02 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream:
2021-10-31 02:59:45 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.ping_google_dns is taking over 10 seconds
2021-10-31 02:59:45 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.ping_push_home_assistant is taking over 10 seconds

See memory usage from monitoring. I saw planty of network related errors (unable to ping, unable to reach ffmpeg stream, unable to reach weather api), both to local and to remote endpoints. I would expect that those networking issues were caused by resource/queues starvation. And all of those were caused by recorder integration.

Screenshot 2021-11-01 at 09 43 31

EDIT: All problems started after moving time back.

EDIT2: I created some node-red automation to check HA health, and it started reporting HA failure right after 2:00. I just fire custom event from NR and receive this event from NR (if it was received, it means that HA is working). So i would expect that after 2:00 everything in HA stoped working (not only recorded, but also event bus or network integrations).

hmmbob commented 2 years ago

In my proxmox logs I noticed the issue started after 02:59:59, so "at the jump back" to the new 02:00:00 (noticeable due to spike in memory, CPU, disk I/o)(nothing runs at that time normally)

pimw1 commented 2 years ago

@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?

The looping started when local time moved back from 02:59:59 -> 02:00:00. Edit: my comment seems to be inccorect.

ioannispelelis commented 2 years ago

@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?

same as @pimw1 it looks to have started when it went back 1h

pimw1 commented 2 years ago

@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?

The looping started when local time moved back from 02:59:59 -> 02:00:00.

One moment, i am not so sure about my own statement anymore. Let me double check this.

pimw1 commented 2 years ago

@ioannispelelis, @pimw1: Did the looping start the first time local time was 02:00, or did it start when local time moved back from 02:59:59 -> 02:00:00?

The looping started when local time moved back from 02:59:59 -> 02:00:00.

One moment, i am not so sure about my own statement anymore. Let me double check this.

I checked it, and i am not sure: does the gui of home assistant sort history based on UTC time, or on local time?

pimw1 commented 2 years ago

Sorry,i was wrong again. I have now full prove that the issue started when local time moved back from 02:59:59 -> 02:00:00. I overlooked something in the lengthy list of logbook events. Check this:

image

What you can see is:

-normal events in the initial 02:00-03:00 hour -events going crashy when local time moved back from 02:59:59 -> 02:00:00: what you don't see in the picture but i do, is that above this, the automation is triggered many many many times until it all crashes down.

In short: the issue started when local time moved back from 02:59:59 -> 02:00:00. This is my validated answer ;) Sorry for the confusion.

ioannispelelis commented 2 years ago

@pimw1 for me it starts clearly after time change see below image

pimw1 commented 2 years ago

For me too 👍 i was confused, but i see the same behaviour as you do.

DirkMu commented 2 years ago

I start the database purge service at 2:00. It started at 2:00 and when time changed from 2:59:59 to 2:00:00 it was triggered again. From then on it was triggered in an endless loop until I restared HA.

- id: database_purge
  alias: Datenbank bereinigen
  trigger:
  - platform: time
    at: 04:12 //changed from 2:00 to 04:12
  action:
  - service: recorder.purge

image

OttoWinter commented 2 years ago

Can reproduce with the following testcase


async def test_async_track_time_change_dst(hass):
    """Test tracking time change."""
    wildcard_runs = []
    timezone = dt_util.get_time_zone("Europe/Vienna")
    dt_util.set_default_time_zone(timezone)

    dst_1_30 = dt_util.as_utc(datetime(2021, 10, 31, 1, 30, tzinfo=timezone))
    dst_2_15 = dt_util.as_utc(datetime(2021, 10, 31, 2, 15, fold=0, tzinfo=timezone))
    dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=0, tzinfo=timezone))
    dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=0, tzinfo=timezone))
    non_dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=1, tzinfo=timezone))
    non_dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=1, tzinfo=timezone))
    non_dst_3_30 = dt_util.as_utc(datetime(2021, 10, 31, 3, 30, tzinfo=timezone))

    now = dst_1_30

    def mocked_utcnow():
        return now

    with patch(
        "homeassistant.util.dt.utcnow", side_effect=mocked_utcnow
    ), patch(
        "homeassistant.helpers.event.time_tracker_utcnow", side_effect=mocked_utcnow
    ):
        unsub = async_track_time_change(
            hass, callback(lambda x: wildcard_runs.append(x)),
            hour=2, minute=30, second=0
        )

        assert len(wildcard_runs) == 0

        now = dst_2_15
        async_fire_time_changed(hass, dst_2_15)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 0

        now = dst_2_30
        async_fire_time_changed(hass, dst_2_30)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 1

        now = dst_2_45
        async_fire_time_changed(hass, dst_2_45)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 1

        now = non_dst_2_30
        async_fire_time_changed(hass, non_dst_2_30)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 2

        now = non_dst_2_45
        async_fire_time_changed(hass, non_dst_2_45)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 2

        now = non_dst_3_30
        async_fire_time_changed(hass, non_dst_3_30)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 2

    unsub()

At least the time_pattern bug was probably introduced by https://github.com/home-assistant/core/pull/49643. One bug is this line: https://github.com/home-assistant/core/pull/49643/files#diff-ebdd5e43e71f51f15950749ccba6701248317f7217663167de537455941e8d7dR319 the fold is inverted (is_dst is true before dst change, where fold is 0 before dst change)

There appears to be another issue in the final _datetime_ambiguous(now) check too, but haven't found the exact reason yet (see testcase)

bousch commented 2 years ago

Regarding the long term statistics: Frontend does not combine data from the two hours with same local time, but all data will be recorded in the database.

@emontnemery this is incorrect. I have no statistics entries in the Home Assistant database from the period between the end of summertime until 6 hours later, when I restarted HA.

CurlyMoo commented 2 years ago

For what it's worth. I fixed this permanently in pilight a while back by using the localtime_l and mktime_l functions from this library: https://github.com/NuxiNL/cloudlibc

It uses this python tool to parse the tzdata files to a C header which is very friendly way to use in your programming: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/locale/timezone/gen_tzdata.py https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/locale/timezone/tzdata.h

Which can be parsed with the earlier mentioned localtime_l and mktime_l. Since it uses tzdata, it is the most reliable source for timezone determination. Although the library is not maintained anymore, it can still be usefull to make this functionality into a independent shared library for C / module for python. The conversion tool is already in python, so it shouldn't be difficult to port.

zigomatichub commented 2 years ago

I observed the same, I completely stop and wait 1 hour before restart, then everything was fine

OttoWinter commented 2 years ago

Can reproduce with the following testcase

async def test_async_track_time_change_dst(hass):
    """Test tracking time change."""
    wildcard_runs = []
    timezone = dt_util.get_time_zone("Europe/Vienna")
    dt_util.set_default_time_zone(timezone)

    dst_1_30 = dt_util.as_utc(datetime(2021, 10, 31, 1, 30, tzinfo=timezone))
    dst_2_15 = dt_util.as_utc(datetime(2021, 10, 31, 2, 15, fold=0, tzinfo=timezone))
    dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=0, tzinfo=timezone))
    dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=0, tzinfo=timezone))
    non_dst_2_30 = dt_util.as_utc(datetime(2021, 10, 31, 2, 30, fold=1, tzinfo=timezone))
    non_dst_2_45 = dt_util.as_utc(datetime(2021, 10, 31, 2, 45, fold=1, tzinfo=timezone))
    non_dst_3_30 = dt_util.as_utc(datetime(2021, 10, 31, 3, 30, tzinfo=timezone))

    now = dst_1_30

    def mocked_utcnow():
        return now

    with patch(
        "homeassistant.util.dt.utcnow", side_effect=mocked_utcnow
    ), patch(
        "homeassistant.helpers.event.time_tracker_utcnow", side_effect=mocked_utcnow
    ):
        unsub = async_track_time_change(
            hass, callback(lambda x: wildcard_runs.append(x)),
            hour=2, minute=30, second=0
        )

        assert len(wildcard_runs) == 0

        now = dst_2_15
        async_fire_time_changed(hass, dst_2_15)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 0

        now = dst_2_30
        async_fire_time_changed(hass, dst_2_30)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 1

        now = dst_2_45
        async_fire_time_changed(hass, dst_2_45)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 1

        now = non_dst_2_30
        async_fire_time_changed(hass, non_dst_2_30)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 2

        now = non_dst_2_45
        async_fire_time_changed(hass, non_dst_2_45)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 2

        now = non_dst_3_30
        async_fire_time_changed(hass, non_dst_3_30)
        await hass.async_block_till_done()
        assert len(wildcard_runs) == 2

    unsub()

At least the time_pattern bug was probably introduced by #49643. One bug is this line: #49643 (files) the fold is inverted (is_dst is true before dst change, where fold is 0 before dst change)

There appears to be another issue in the final _datetime_ambiguous(now) check too, but haven't found the exact reason yet (see testcase)

Update: the reason the tests still passed after https://github.com/home-assistant/core/pull/49643 is a weird behavior of python. The datetime equality method doesn't compare fold when testing for equality of two datetime objects (effectively means you should never test equality with localized dt objects) :face_with_head_bandage:

>>> from datetime import datetime, timezone
>>> from zoneinfo import ZoneInfo
>>> tz=ZoneInfo("Europe/Vienna")
>>> dt1=datetime(2021, 10, 31, 2, 30, 0, fold=0, tzinfo=tz)
>>> dt2=datetime(2021, 10, 31, 2, 30, 0, fold=1, tzinfo=tz)
>>> dt1 == dt2
True
>>> dt1.astimezone(timezone.utc) == dt2.astimezone(timezone.utc)
False
emontnemery commented 2 years ago

Awesome work, @OttoWinter 👍

Help with testing the change in #58894 makes Home Assistant handle the end of DST correctly would be highly appreciated.

asjmcguire commented 2 years ago

For most of us - the effect of the time change, was that resource usage went up and automations that relied on time pattern were triggered 20+ times a second for the entire hour. Home Assistant otherwise continued to run and automations based on motions sensors etc still triggered. However there are people who definitely don't have any time pattern triggers in their automations who still had excessive resource use - and every single one of us time pattern or not, suffered from the recorder giving up.

So there are clearly other issues that need to be addressed too - one of which is why did the recorder queue 30,0000 events and why did it not just drop them and stay connected to the database, instead of just giving up and never trying to reconnect to the database later on.

Every single person affected by the time change bug, lost connection to whatever database the recorder was using, and it was not reconnected until core was restarted - no data was recorded after a few minutes in to the repeated hour.

What caused the higher resource use for people not using the time pattern? Was is something to do with the new statistics calculations? Whatever it was it clearly involved storing data in the recorder.

@OttoWinter @emontnemery

(One of my automations and the log I recovered is posted near the top of the thread - unfortunately there is basically nothing in the log that gives a clue)

EDIT: I understand why the recorder would want to store so many events for those of us with time pattern triggers, because obviously the recorder would have been storing automation runs and presumably traces? But people not using time patterns, or indeed any automations at all - still suffered the problem, so there is something other than just automations that was wanting to update data a lot. Now we know from people using InfluxDB as well as the recorder, that InfluxDB managed to stay connected - though, it looks like from https://github.com/home-assistant/core/issues/58783#issuecomment-955699097 that InfluxDB was being asked to store a LOT of data too. I assume InfluxDB doesn't get asked to store statistics - so maybe we can rule that out?

bdraco commented 2 years ago

The change has been released in 2021.10.7

bdraco commented 2 years ago

It would be highly advisable to upgrade to 2021.10.7 or 2021.11.0 (when it comes out) for anyone with a time zone that has an upcoming DST change in near future.

In the process Erik and Otto also identified a long standing problem with going the other direction which has been fixed in https://github.com/home-assistant/core/pull/58914

flixlix commented 2 years ago

I also had this issue. I do have the history stored with my shelly devices. Is it possible to manually add these states to the database, so that the values continue being correct? ATM I have a big hole in the graphs: image

antibarbie commented 2 years ago

Same problem, no automation, the curves are flat after 02:01 winter time. image

emontnemery commented 2 years ago

There's unfortunately not much to do about the missing data since Home Assistant was effectively not working during 02:00~02:59.

emontnemery commented 2 years ago

@asjmcguire

What caused the higher resource use for people not using the time pattern? Was is something to do with the new statistics calculations?

The recorder itself uses a time pattern to schedule statistics compilation every 5 minutes, that's why everyone was hit.

emontnemery commented 2 years ago

Closing this since #58894 has been merged

flixlix commented 2 years ago

There's unfortunately not much to do about the missing data since Home Assistant was effectively not working during 02:00~02:59.

I did find this addon for home assistant. Isn't it possible to just force data onto the db file?

emontnemery commented 2 years ago

Sure, you can manually insert data in the database.

mdeweerd commented 2 years ago

I was asked to keep all info regarding #58799 in this thread which is now closed while only the hang-up is presumably fixed .

emontnemery commented 2 years ago

Is the data gap between 1AM and 2AM been fixed?

This is most likely a side effect of the recorder being overrun; the hourly statistics for 01:00~02:00 is compiled after 02:00 when things had already gone south.

Improving the supervisor to detect this kind of problem makes sense. Please open an issue here: https://github.com/home-assistant/supervisor/issues

emontnemery commented 2 years ago

@mdeweerd just to be clear, you write that "I have an unreachable HA system". Did it not start working again after a manual restart?

michalk-k commented 2 years ago

Just one note to this comment:

the Supervisor would restart the core after a while

Which will hide the problem. Most of users will not even notice that. Thus the root cause will have no chance to be fixed.

emontnemery commented 2 years ago

@michalk-k that's exactly the point of a watchdog though; minimize the impact to the end-user instead of getting stuck in a failed state.

michalk-k commented 2 years ago

I got it. But at the same time other needs should be full-filled, in order to back-trace the issue. Currently even HA logs are reset by each restart. Keep in mind that HA devs are not prone to develop anything without detailed issue description. I could imagine the DST issue would never get traction if all impacted instances have silently restarted itself.