home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Issue after DST time change [update to 2021.10.7+ or 2021.11.0b4+ recommended] #58783

Closed chneau closed 3 years ago

chneau commented 3 years ago

The problem

In the UK on 2021-10-31, at 01:59:59 the clock went back to 01:00:00 (summer to winter time, daylight saving). Since then (it is now 01:08) Home Assistant has had high CPU usage, using one core at 100%.

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O   BLOCK I/O         PIDS
42985e0497d4   hass      104.53%   251.7MiB / 7.658GiB   3.21%     0B / 0B   103MB / 1.77MB    15

Edit: memory usage seems to increase quickly:

at 01:14:00

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O   BLOCK I/O         PIDS
42985e0497d4   hass      104.93%   703.1MiB / 7.658GiB   8.97%     0B / 0B   112MB / 1.98MB    16

Edit 2: Switching lights works fine, but the changes do not appear in the light's state history.

What version of Home Assistant Core has the issue?

core-2021.10.6

REPOSITORY                     TAG       IMAGE ID       CREATED         SIZE
homeassistant/home-assistant   stable    e0a45773808a   12 days ago     1.14GB

I could not find the exact image id on docker hub, but here is the label section of docker inspect

"io.hass.arch": "amd64",
"io.hass.base.arch": "amd64",
"io.hass.base.image": "homeassistant/amd64-base:3.14",
"io.hass.base.name": "python",
"io.hass.base.version": "2021.09.1",
"io.hass.type": "core",
"io.hass.version": "2021.10.6",
"org.opencontainers.image.authors": "The Home Assistant Authors",
"org.opencontainers.image.created": "2021-10-18 06:34:53+00:00",
"org.opencontainers.image.description": "Open-source home automation platform running on Python 3",
"org.opencontainers.image.documentation": "https://www.home-assistant.io/docs/",
"org.opencontainers.image.licenses": "Apache License 2.0",
"org.opencontainers.image.source": "https://github.com/home-assistant/core",
"org.opencontainers.image.title": "Home Assistant",
"org.opencontainers.image.url": "https://www.home-assistant.io/",
"org.opencontainers.image.version": "2021.10.6"

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

No response

Link to integration documentation on our website

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Interesting: "The recorder queue reached the maximum size of 30000"

2021-10-30T09:40:10.884156228Z 2021-10-30 10:40:10 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [139778345277952] Disconnected: Did not receive auth message within 10 seconds
2021-10-30T09:40:22.323961416Z 2021-10-30 10:40:22 WARNING (MainThread) [homeassistant.components.webhook] Received message for unregistered webhook c9fa7b5955dcce6df0ec16e14a28b23623563b96373bc5a66c0413c418093008 from 192.168.1.117
2021-10-31T01:03:30.660640416Z 2021-10-31 01:03:30 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31T01:04:57.487128770Z [cont-finish.d] executing container finish scripts...
2021-10-31T01:04:57.489476430Z [cont-finish.d] done.

At 2021-10-31T01:03:30.660640416Z I restarted the container to see whether that would fix the issue; it did not.
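The error above suggests that the recorder keeps a bounded backlog and stops recording once the backlog is full. Below is a minimal Python sketch of that behaviour, assuming a simple in-memory queue. The `BoundedRecorder` class and its method names are hypothetical and are not Home Assistant's actual recorder code; only the limit of 30000 is taken from the log line.

```python
import queue

# Limit quoted in the log message above.
MAX_BACKLOG = 30000


class BoundedRecorder:
    """Hypothetical stand-in for a recorder with a bounded backlog."""

    def __init__(self, max_backlog=MAX_BACKLOG):
        self._queue = queue.SimpleQueue()
        self._max_backlog = max_backlog
        self.recording = True
        self.dropped = 0

    def record(self, event):
        """Queue an event, or drop it once the backlog limit has been hit."""
        if not self.recording:
            self.dropped += 1
            return
        if self._queue.qsize() >= self._max_backlog:
            # Assumption: once the limit is reached, recording stops
            # for good (until a restart), matching the symptom reported here.
            self.recording = False
            self.dropped += 1
            return
        self._queue.put(event)

    def backlog(self):
        """Return the number of events waiting to be written."""
        return self._queue.qsize()
```

If something produces events much faster than they can be written (for example an automation firing many times per second), a sketch like this fills up quickly and every later event is dropped, which would match the missing history.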



Additional information

Maybe after 02:00:00 it will stop?
Everything is working properly: the light switches, the mobile phone app, and the website served by the container (server:8123).
Restarting the container or restarting the PC does not solve the high CPU usage.

chneau commented 3 years ago

After 02:00 it was still using high CPU, but restarting the container fixed the CPU usage.

Light switch history is working again, but nothing was recorded during the "second" 01:00 to 02:00 hour.
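To illustrate the "second" 01:00 to 02:00 hour: on this date every wall-clock time in that hour exists twice in Europe/London. A small Python sketch, assuming Python 3.9+ with the standard `zoneinfo` module and a tz database available on the system; it only demonstrates the ambiguity and is not taken from Home Assistant's code.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")

# The same wall-clock time, first occurrence (BST) and second occurrence (GMT).
first = datetime(2021, 10, 31, 1, 30, tzinfo=LONDON)           # fold=0
second = datetime(2021, 10, 31, 1, 30, fold=1, tzinfo=LONDON)  # fold=1

# Converted to UTC, the two instants are one hour apart.
first_utc = first.astimezone(timezone.utc)    # 2021-10-31 00:30 UTC
second_utc = second.astimezone(timezone.utc)  # 2021-10-31 01:30 UTC

# Within the same zone, comparison ignores `fold` (PEP 495), so code that
# compares local times cannot tell the two occurrences apart.
same_when_compared_locally = first == second
```

Any logic that works on local wall-clock values without tracking `fold` can therefore see time "repeat" for an hour.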

WarC0zes commented 3 years ago

I have an identical problem.

My config:
RPi 3B
Core: core-2021.11.0b2
Supervisor: supervisor-2021.10.8
OS: Home Assistant OS 6.6

error:

2021-10-31 02:07:32 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded

After reboot:

Logger: homeassistant.components.recorder
Source: components/recorder/__init__.py:456
Integration: Recorder (documentation, issues)
First occurred: 03:04:44 (1 occurrences)
Last logged: 03:04:44

The recorder queue reached the maximum size of 30000; Events are no longer being recorded

Logger: homeassistant.components.hassio.handler
Source: components/hassio/handler.py:237
Integration: Home Assistant Supervisor (documentation, issues)
First occurred: 02:55:11 (5 occurrences)
Last logged: 02:57:50

Timeout on /os/info request
Timeout on /addons request
Timeout on /store request
Timeout on /core/stats request
Timeout on /supervisor/stats request

Logger: homeassistant.components.hassio
Source: components/hassio/websocket_api.py:109
Integration: Home Assistant Supervisor (documentation, issues)
First occurred: 02:55:11 (5 occurrences)
Last logged: 02:57:50

Failed to to call /os/info -
Failed to to call /addons -
Failed to to call /store -
Failed to to call /core/stats -
Failed to to call /supervisor/stats -

My processor runs at 30% all the time; same for the memory.

Edit: CPU returned to normal (4% use)

asjmcguire commented 3 years ago

Same problem here: multiple automations are complaining that they are already running, starting at 1am and ending at 1:59am. The number of occurrences logged is 262724. As a result, the recorder died after reaching the maximum queue size of 30000, and no history is available. The Home Assistant log won't load: it complains that there was an error loading the log, and then after about 30 seconds Chrome is basically unusable with a massive scrollbar, so I guess the log does eventually load and is HUGE.

I first became aware of this problem when my brother in Aberdeen noticed his Home Assistant was running slowly; the server hosting MariaDB was showing a LOT of activity and the hard disk was clearly working very hard. Supervisor said Core was using 96%. I moved him on to the beta, which upgraded the database, and that seems to have calmed things down.

I then checked my install to find I have the same problem. I can't ask for help because I am "unsupported" on Ubuntu, but the install I did for my brother is the OVA.

I can grab the log file for my installation, even though it is unsupported, if it will help. It's 123 MB!!

EDIT: The log isn't helpful:

2021-10-31 01:59:51 WARNING (MainThread) [homeassistant.helpers.template] Template variable warning: 'dict object' has no attribute 'ha_status' when rendering '{{ value_json['obs']['ha_status'] }}'
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.daylight] Daylight: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.sensor_on_thermostat] Sensor - Thermostat: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.camera_snapshots] Camera Snapshots: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.switch_on_house_boiler] Sensor - Boiler Running: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.daylight] Daylight: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.camera_snapshots] Camera Snapshots: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.switch_on_house_boiler] Sensor - Boiler Running: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.sensor_on_thermostat] Sensor - Thermostat: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.daylight] Daylight: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.sensor_on_thermostat] Sensor - Thermostat: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.camera_snapshots] Camera Snapshots: Already running
2021-10-31 01:00:00 WARNING (MainThread) [homeassistant.components.automation.switch_on_house_boiler] Sensor - Boiler Running: Already running

repeated multiple times every second

and then:

2021-10-31 01:12:38 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded

the complaints about automations already running are repeated constantly until:

2021-10-31 01:59:58 WARNING (MainThread) [homeassistant.components.automation.sensor_on_thermostat] Sensor - Thermostat: Already running
2021-10-31 01:59:59 WARNING (MainThread) [homeassistant.components.automation.switch_on_house_boiler] Sensor - Boiler Running: Already running
2021-10-31 01:59:59 WARNING (MainThread) [homeassistant.components.automation.sensor_on_thermostat] Sensor - Thermostat: Already running
2021-10-31 01:59:59 WARNING (MainThread) [homeassistant.components.automation.daylight] Daylight: Already running
2021-10-31 01:59:59 WARNING (MainThread) [homeassistant.components.automation.camera_snapshots] Camera Snapshots: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.camera_snapshots] Camera Snapshots: Already running

and then they stop. All these automations use the time pattern - no other automations have been affected.

Hope this helps.

Here is the simplest automation - camera snapshots, just so you can see it's not a configuration error:

alias: Camera Snapshots
description: ''
trigger:
  - platform: time_pattern
    minutes: /5
condition: []
action:
  - service: camera.snapshot
    target:
      entity_id: camera.livingroom_mainstream
    data:
      filename: /config/www/cameras/rtsp_livingroom_main.jpg
  - service: camera.snapshot
    target:
      entity_id: camera.bedroom_mainstream
    data:
      filename: /config/www/cameras/rtsp_bedroom_main.jpg
mode: single
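The time pattern observation above can be illustrated with a small Python sketch. It is an assumption about the kind of calculation that could misbehave, not Home Assistant's actual scheduling code: the helper names are hypothetical, and Europe/London with a 5-minute pattern is used only to match this comment. It assumes Python 3.9+ with `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def next_mark_fold_unaware(now_utc):
    """Hypothetical helper: next 5-minute mark, computed in local wall time.

    It drops `fold`, so during the repeated hour it resolves the result to
    the FIRST occurrence of that wall-clock time.
    """
    local = now_utc.astimezone(LONDON)
    minutes = (local.minute // 5 + 1) * 5
    base = local.replace(minute=0, second=0, microsecond=0, fold=0)
    return (base + timedelta(minutes=minutes)).astimezone(timezone.utc)


def next_mark_fold_aware(now_utc):
    """Hypothetical helper: same idea, but the arithmetic is done in UTC."""
    local = now_utc.astimezone(LONDON)
    minutes = (local.minute // 5 + 1) * 5
    # replace() keeps the fold value, so the hour start maps to the right instant.
    base_utc = local.replace(minute=0, second=0, microsecond=0).astimezone(timezone.utc)
    return base_utc + timedelta(minutes=minutes)


# 01:02 UTC on 2021-10-31 is 01:02 local time, second occurrence.
now = datetime(2021, 10, 31, 1, 2, tzinfo=timezone.utc)
stale = next_mark_fold_unaware(now)  # resolves to a time already in the past
fresh = next_mark_fold_aware(now)    # resolves to a time in the future
```

If a scheduler fires immediately whenever the computed "next" time is not in the future and then recomputes it the same way, a result like `stale` would make it fire again and again for the whole repeated hour, which would fit the "Already running" flood between 01:00 and 02:00.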
jelgblad commented 3 years ago

Can confirm. I'm running HAOS on Hyper-V, and it blew up and crashed some time after DST -1 change.

Out of memory: Killed process 9351 (python3) [...]
systemd-coredump[110538]: Process 104 (systemd-journal) of user 0 dumped core.

Moxser commented 3 years ago

Same problem for me, and I also have this error: "The recorder queue reached the maximum size of 30000; Events are no longer being recorded". I rebooted and everything is back to normal.

hmmbob commented 3 years ago

Didn't see/notice the high CPU on my container based setup, but recorder died with the same error, also highly likely a DST issue (NLD):

Logger: homeassistant.components.recorder
Source: components/recorder/__init__.py:444
Integration: Recorder (documentation, issues)
First occurred: 2:02:44 AM (1 occurrences)
Last logged: 2:02:44 AM

The recorder queue reached the maximum size of 30000; Events are no longer being recorded

Tagging @emontnemery for visibility

mbo18 commented 3 years ago

Same here, running HAOS 6.5 on a NUC. CPU was at 26% instead of 1%, and memory usage also increased. Same log entry about the recorder queue. A restart of HA solved the issue.

N3rdix commented 3 years ago

Didn't see/notice the high CPU on my container based setup, but recorder died with the same error, also highly likely a DST issue (NLD):

Logger: homeassistant.components.recorder
Source: components/recorder/__init__.py:444
Integration: Recorder (documentation, issues)
First occurred: 2:02:44 AM (1 occurrences)
Last logged: 2:02:44 AM

The recorder queue reached the maximum size of 30000; Events are no longer being recorded

Exactly the same experience in my case; a restart solved the problem for now.

gody01 commented 3 years ago

On a Home Assistant Blue with core-2021.10.6, Core did not respond any more after the DST rollover. Restarting Core from the CLI made it respond again.

sim-san commented 3 years ago

Same for me. I can confirm this behavior on my 2 installations.

Calimerorulez commented 3 years ago

+1 on a Proxmox host running the latest versions of everything.

niekniek89 commented 3 years ago

Same problem here. Home Assistant was only available after a reboot. All logging stopped at exactly 3 o'clock (when the time goes back to 2 o'clock).

pimw1 commented 3 years ago

Same here, running 2021.11.0b2 on a Raspberry Pi 4, supervised. I've attached the logs from the Docker container (homeassistant_logs.txt).

homeassistant_logs.txt

Restarting the docker container solved the issue.

In the Logbook, I can see that the following automation was triggered many times per second, every second, starting exactly when winter time began. This continued until Home Assistant crashed (after 5-6 minutes):

alias: Hobbykamer ventilator automatisch
description: ''
trigger:
  - platform: device
    type: turned_on
    device_id: e23552b835e274e0f918377ae2059a0c
    entity_id: light.hobbykamer_lamp_plafond
    domain: light
    for:
      hours: 0
      minutes: 2
      seconds: 0
      milliseconds: 0
    id: licht aan
  - platform: device
    type: turned_off
    device_id: e23552b835e274e0f918377ae2059a0c
    entity_id: light.hobbykamer_lamp_plafond
    domain: light
    id: licht uit
  - platform: time_pattern
    minutes: /15
    id: Tijdspatroon
  - type: no_motion
    platform: device
    device_id: e778003098458a439c326f70937a426c
    entity_id: binary_sensor.hobbykamer_motion_sensor_hoek_beweging
    domain: binary_sensor
    for:
      hours: 2
      minutes: 0
      seconds: 0
      milliseconds: 0
    id: geen beweging
  - platform: state
    entity_id: input_boolean.algemeen_input_boolean_slapen
    id: slapen
    from: 'off'
    to: 'on'
  - platform: state
    entity_id: input_boolean.algemeen_input_boolean_thuis
    id: weg van huis
    from: 'on'
    to: 'off'
  - platform: time
    at: '06:00'
    id: handmatige stand uitzetten
condition:
  - condition: state
    entity_id: input_boolean.hobbykamer_input_boolean_gastenstand
    state: 'off'
action:
  - choose:
      - conditions:
          - condition: trigger
            id: licht aan
          - condition: state
            entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
            state: 'off'
        sequence:
          - service: script.hobbykamer_script_ventilator
      - conditions:
          - condition: trigger
            id: licht uit
          - condition: device
            type: is_off
            device_id: f314e82799e2cf6c11e7e084b337a469
            entity_id: switch.hobbykamer_smart_plug_dj_booth
            domain: switch
          - condition: state
            entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
            state: 'off'
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: Tijdspatroon
          - condition: device
            type: is_on
            device_id: e23552b835e274e0f918377ae2059a0c
            entity_id: light.hobbykamer_lamp_plafond
            domain: light
            for:
              hours: 0
              minutes: 8
              seconds: 0
              milliseconds: 0
          - condition: state
            entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
            state: 'off'
        sequence:
          - service: script.hobbykamer_script_ventilator
      - conditions:
          - condition: trigger
            id: geen beweging
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: slapen
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: weg van huis
        sequence:
          - service: fan.turn_off
            target:
              entity_id: fan.hobbykamer_ventilator
      - conditions:
          - condition: trigger
            id: handmatige stand uitzetten
        sequence:
          - service: input_boolean.turn_off
            target:
              entity_id: input_boolean.hobbykamer_input_boolean_ventilator_handmatig
    default: []
mode: restart

See below the log error from the Home Assistant logger that I could retrieve after restarting Home Assistant:

Logger: homeassistant.components.recorder.util
Source: components/recorder/util.py:408
Integration: Recorder (documentation, issues)
First occurred: 08:56:25 (1 occurrences)
Last logged: 08:56:25

Ended unfinished session (id=13 from 2021-10-30 09:11:46.923041)

ChristophCaina commented 3 years ago

#58787

ioannispelelis commented 3 years ago

Same issue here: high CPU usage, and logging stopped, so all graphs were flatlined until I rebooted HA. Also, an automation that was set to start at 02:01 tried to start almost 800 000 times until 03:00, when it apparently stopped trying (and it should not have tried to restart, because it was set to mode: single). To make things worse, the plugs that this automation was supposed to turn on were banned from the supplier's cloud services because of the extremely high update frequency during the night...

I opened ticket #58792; I guess the issues are related...

basschipper commented 3 years ago

Mmm, maybe my issue is also related #58791

basschipper commented 3 years ago

Transferred my issue #58791 to here, to keep everything together:

This morning I discovered that my HA Core was down after the DST change (located in NLD). From the dmesg log I conclude that HA Core was OOM-killed. After reviewing the Home Assistant log, it seems that two of my automations, which also start at exactly 2 o'clock, have stalled. Home Assistant is continually complaining that these two automations are "Already running". This repeats many times until the OOM killer kills Home Assistant.

Bringing HA Core back is just a matter of typing: ha core start.

Which also raises the question: why didn't the Supervisor restart HA Core?

What happened to these automations during DST? Maybe it's better to run time-based automations at 5 minutes past the hour?

Dmesg:

[665397.236593] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=b9a094bf125883ccd4620ef09874aa2a7338bca139ac8231e3a4d220eef70454,mems_allowed=0,global_oom,task_memcg=/docker/b9a094bf125883ccd4620ef09874aa2a7338bca139ac8231e3a4d220eef70454,task=python3,pid=2142414,uid=0
[665397.236650] Out of memory: Killed process 2142414 (python3) total-vm:4974216kB, anon-rss:3375328kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:9664kB oom_score_adj:0

HA Core:

2021-10-31 02:51:47 INFO (MainThread) [buienradar.buienradar_json] Parse ws data: latitude: ***, longitude: ***
2021-10-31 02:51:47 INFO (MainThread) [buienradar.buienradar_json] Parse ws data: latitude: ***, longitude: ***
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Executing step call service
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Executing step call service
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Executing step call service
... OTGW and Luftdaten logs repeating many times ...
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Executing step call service
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
... OTGW and Luftdaten logs repeating many times ...
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
[finish] process exit code 256
[finish] process received signal 9
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
[s6-finish] sending all processes the TERM signal.
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
... OTGW and Luftdaten logs repeating many times ...
[s6-finish] sending all processes the KILL signal and exiting.

hmmbob commented 3 years ago

No need to open additional issues for this, just keep all info in this thread.

If you are impacted by this issue, but do not have new info to add, please just use the "thumb up" emoji in the initial post of this issue instead of "me too" or "+1" replies.

This seems like a genuine bug, most likely impacting all HA users on any recent version who had a Daylight Saving Time rollover last night.

ualex73 commented 3 years ago

I am still running 2021.8.8 and it has the same issue too. A restart fixed it, and Influx data seems to have been recorded from 2am until my restart (so I didn't lose long-term data). Information from Influx: memory jumped from 700MB to 3.5GB and CPU seems to have tripled.

ioannispelelis commented 3 years ago

So I am transferring my ticket #58792, which I closed, to this thread instead:

I have an automation that is set to start at 02:01 every night and turns on a few plugs for charging during off-peak hours. Tonight, with the time change (at 03:00 we went back to 02:00), the automation got completely out of control:

.... skipping almost 800 000 lines like this...

2021-10-31 02:59:58 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:59:58 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 02:59:59 WARNING (MainThread) [homeassistant.components.automation.activation_prises_hc_nuit] Activation Prises HC Nuit: Already running
2021-10-31 03:00:03 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved



- Also, I think that because of this bug Meross detected the plugs being set to ON almost 800 000 times, and they just sent me a notification that they are terminating the cloud services for all the plugs concerned (the ones the automation was set to turn on). I am in contact with them so that they restore the cloud services...
- Also, CPU usage went up to 60% during this time (normally it is between 5-15%), probably because it was trying to start the automation 800 000 times within an hour... (I guess it was at 100% of a single core, since I am running on 2 vCPUs.)
- Also, logging seemed to stop for all sensors at 02:00. It was restored after a reboot of HA.

sgofferj commented 3 years ago

Same here. Additionally, HA disarmed the alarm system at the time of the change... Luckily, I have Alexa speakers telling me if the alarm system is being armed or disarmed. But a rude awakening it was ^^.

pimw1 commented 3 years ago

Transferred my issue #58791 to here, to keep everything together:

This morning I discovered that my HA Core was down after the DST change (located in NLD). From the dmesg log I conclude that HA Core was OOM-killed. After reviewing the Home Assistant log, it seems that two of my automations, which also start at exactly 2 o'clock, have stalled. Home Assistant is continually complaining that these two automations are "Already running". This repeats many times until the OOM killer kills Home Assistant.

Bringing HA Core back is just a matter of typing: ha core start.

Which also raises the question: why didn't the Supervisor restart HA Core?

What happened to these automations during DST? Maybe it's better to run time-based automations at 5 minutes past the hour?

Dmesg:

[665397.236593] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=b9a094bf125883ccd4620ef09874aa2a7338bca139ac8231e3a4d220eef70454,mems_allowed=0,global_oom,task_memcg=/docker/b9a094bf125883ccd4620ef09874aa2a7338bca139ac8231e3a4d220eef70454,task=python3,pid=2142414,uid=0
[665397.236650] Out of memory: Killed process 2142414 (python3) total-vm:4974216kB, anon-rss:3375328kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:9664kB oom_score_adj:0

HA Core:

2021-10-31 02:51:47 INFO (MainThread) [buienradar.buienradar_json] Parse ws data: latitude: ***, longitude: ***
2021-10-31 02:51:47 INFO (MainThread) [buienradar.buienradar_json] Parse ws data: latitude: ***, longitude: ***
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Executing step call service
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Executing step call service
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Executing step call service
... OTGW and Luftdaten logs repeating many times ...
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Running automation actions
2021-10-31 02:00:00 INFO (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Executing step call service
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
... OTGW and Luftdaten logs repeating many times ...
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
[finish] process exit code 256
[finish] process received signal 9
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
[s6-finish] sending all processes the TERM signal.
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.luftdaten_push] Luftdaten Push: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
2021-10-31 02:00:05 WARNING (MainThread) [homeassistant.components.automation.otgw_outside_temperature] OTGW Outside Temperature: Already running
... OTGW and Luftdaten logs repeating many times ...
[s6-finish] sending all processes the KILL signal and exiting.

Hi Bas, from my docker error logs, I am reaching the same conclusion as you:

Mpgod80 commented 3 years ago

Same problem for me. High CPU usage after the time change here in Sweden :/ Even RAM usage was really high. I restarted my Odroid Blue and now it works.

bieniu commented 3 years ago

Same here. I have two automations starting at 2:00 am, and these two automations were triggered ~150 times per second after time change. 5 minutes later, the recorder could not withstand the load.

2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.hassio_backup] Hassio Backup: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
2021-10-31 02:00:00 WARNING (MainThread) [homeassistant.components.automation.shellies_announce] Shellies Announce: Already running
.
.
.
2021-10-31 02:05:01 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded

In the morning the log was almost 70 MB.
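The error above points to a backlog limit of 30000 events. If each of the roughly 150 triggers per second produced one recorded event, 5 minutes would yield about 150 × 300 = 45000 events, well past that limit. Below is a minimal sketch of a bounded event queue that drops events once it is full. It is a generic illustration, not Home Assistant's recorder code; the class name `BoundedRecorder` and its methods are hypothetical.

```python
import queue

MAX_BACKLOG = 30000  # value taken from the log message above


class BoundedRecorder:
    """Hypothetical recorder front end that refuses events once the backlog is full."""

    def __init__(self, max_backlog: int = MAX_BACKLOG) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=max_backlog)
        self.dropped = 0

    def record(self, event: object) -> bool:
        """Queue an event; return False (and count it) if the backlog is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def backlog(self) -> int:
        """Number of events waiting to be written."""
        return self._queue.qsize()


# Assumed load: 150 events per second for 5 minutes, with nothing draining the queue.
recorder = BoundedRecorder()
for event_id in range(150 * 300):
    recorder.record(event_id)
```

With nothing consuming the queue, this run ends with a full backlog and the remaining events counted in `recorder.dropped`, which mirrors the "Events are no longer being recorded" symptom.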

Goz3rr commented 3 years ago

Just chiming in with my setup as well.

- id: '1582323405372'
  alias: Porchlight off after sunrise
  description: ''
  trigger:
  - event: sunrise
    offset: 00:15:00
    platform: sun
  condition: []
  action:
  - device_id: f7c2a35fcac645689bb72ade5cc0d2d8
    domain: light
    entity_id: light.0x680ae2fffea14357_light
    type: turn_off
- id: '1582323455203'
  alias: Porchlight on before sunset
  description: ''
  trigger:
  - event: sunset
    offset: -00:15:00
    platform: sun
  condition: []
  action:
  - device_id: f7c2a35fcac645689bb72ade5cc0d2d8
    domain: light
    entity_id: light.0x680ae2fffea14357_light
    type: turn_on

Around 9AM, before restarting, my postgres container was using about 50% CPU/6GB RAM and the home assistant container was using 70% CPU/3GB RAM.

When opening something in the UI, home assistant shows that the data was recently updated, but all of the graphs stop at 3AM when the time changed. The actual displayed value is correct but the graph (and its tooltips) is stuck at the value from 3AM: image

Data that is exported to influxDB is unaffected and continued to be fine after the time changed.

I pressed the restart button in the home assistant UI. The connection was lost instantly and the logs showed nothing for about a minute or two until:

homeassistant     | 2021-10-31 09:52:28 WARNING (MainThread) [homeassistant.core] Timed out waiting for shutdown stage 1 to complete, the shutdown will continue
homeassistant     | 2021-10-31 09:52:30 WARNING (Thread-15) [homeassistant.util.executor] Thread[SyncWorker_5] is still running at shutdown: File "/usr/local/lib/python3.9/threading.py", line 930, in _bootstrap
homeassistant     |     self._bootstrap_inner()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
homeassistant     |     self.run()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 910, in run
homeassistant     |     self._target(*self._args, **self._kwargs)
homeassistant     |   File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 77, in _worker
homeassistant     |     work_item.run()
homeassistant     |   File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
homeassistant     |     result = self.fn(*self.args, **self.kwargs)
homeassistant     |   File "/usr/src/homeassistant/homeassistant/components/recorder/__init__.py", line 532, in shutdown
homeassistant     |     self.join()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 1053, in join
homeassistant     |     self._wait_for_tstate_lock()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 1069, in _wait_for_tstate_lock
homeassistant     |     elif lock.acquire(block, timeout):
homeassistant     | 2021-10-31 09:52:31 WARNING (Thread-15) [homeassistant.util.executor] Thread[SyncWorker_5] is still running at shutdown: File "/usr/local/lib/python3.9/threading.py", line 930, in _bootstrap
homeassistant     |     self._bootstrap_inner()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
homeassistant     |     self.run()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 910, in run
homeassistant     |     self._target(*self._args, **self._kwargs)
homeassistant     |   File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 77, in _worker
homeassistant     |     work_item.run()
homeassistant     |   File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
homeassistant     |     result = self.fn(*self.args, **self.kwargs)
homeassistant     |   File "/usr/src/homeassistant/homeassistant/components/recorder/__init__.py", line 532, in shutdown
homeassistant     |     self.join()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 1053, in join
homeassistant     |     self._wait_for_tstate_lock()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 1069, in _wait_for_tstate_lock
homeassistant     |     elif lock.acquire(block, timeout):
homeassistant     | 2021-10-31 09:52:32 ERROR (MainThread) [homeassistant] Error doing job: Future exception was never retrieved
homeassistant     | Traceback (most recent call last):
homeassistant     |   File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
homeassistant     |     result = self.fn(*self.args, **self.kwargs)
homeassistant     |   File "/usr/src/homeassistant/homeassistant/components/recorder/__init__.py", line 532, in shutdown
homeassistant     |     self.join()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 1053, in join
homeassistant     |     self._wait_for_tstate_lock()
homeassistant     |   File "/usr/local/lib/python3.9/threading.py", line 1069, in _wait_for_tstate_lock
homeassistant     |     elif lock.acquire(block, timeout):
homeassistant     | SystemExit
homeassistant     | Home Assistant attempting to restart.
homeassistant     | Restarting Home Assistant

After which home assistant restarted and the web UI was accessible again. After the restart of HA its CPU usage was jumping around 10-20% with 500MB of ram usage, and right after HA came up again postgres CPU usage jumped to 300-400% with no change in RAM usage (6GB). This lasted for about 10 minutes after which postgres CPU dropped to <1% and the graphs in HA finally updated and suddenly showed the last 10 minutes of data (since HA was restarted): image

Relevant logs from around 3AM:

homeassistant     | 2021-10-31 02:39:00 WARNING (MainThread) [dsmr_parser.clients.protocol] Invalid telegram. The CRC checksum '56947' does not match the expected '64544'
homeassistant     | 2021-10-31 02:55:00 WARNING (MainThread) [dsmr_parser.clients.protocol] Invalid telegram. The CRC checksum '3497' does not match the expected '50048'
homeassistant     | 2021-10-31 02:02:44 WARNING (Thread-6) [pychromecast.socket_client] [Kantoor 2(192.168.0.189):8009] Heartbeat timeout, resetting connection
homeassistant     | 2021-10-31 02:03:15 ERROR (Thread-6) [pychromecast.socket_client] [Kantoor 2(192.168.0.189):8009] Failed to connect to service ServiceInfo(type='host', data=('192.168.0.189', 8009)), retrying in 5.0s
homeassistant     | 2021-10-31 02:03:46 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 37a8 expected=00b6
homeassistant     | 2021-10-31 02:04:16 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 5ea0 expected=3909
homeassistant     | 2021-10-31 02:04:43 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 96f1 expected=6002
homeassistant     | 2021-10-31 02:05:13 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq e0a0 expected=9853
homeassistant     | 2021-10-31 02:05:46 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 18ee expected=e202
homeassistant     | 2021-10-31 02:06:20 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 3fe6 expected=1a4f
homeassistant     | 2021-10-31 02:06:49 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 8992 expected=4148
homeassistant     | 2021-10-31 02:07:16 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq c1df expected=8af4
homeassistant     | 2021-10-31 02:07:40 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 0b88 expected=c341
homeassistant     | 2021-10-31 02:07:44 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
homeassistant     | 2021-10-31 02:08:05 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 43d7 expected=0ce9
homeassistant     | 2021-10-31 02:08:34 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 7c17 expected=4539
homeassistant     | 2021-10-31 02:08:58 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq b467 expected=7d79
homeassistant     | 2021-10-31 02:09:27 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 0f5d expected=b5c9

...

homeassistant     | 2021-10-31 02:59:05 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 4250 expected=0b9a
homeassistant     | 2021-10-31 02:59:38 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 8bae expected=43b2
homeassistant     | 2021-10-31 02:59:59 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq d50a expected=8d10
homeassistant     | 2021-10-31 02:59:59 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 0d26 expected=d66c
homeassistant     | 2021-10-31 02:59:59 ERROR (stream_worker) [libav.rtsp] RTP: PT=60: bad cseq 22b9 expected=0e88
homeassistant     | 2021-10-31 03:00:03 WARNING (MainThread) [homeassistant.helpers.entity] Update of vacuum.xiaomi_vacuum_cleaner is taking over 10 seconds
homeassistant     | 2021-10-31 03:00:04 WARNING (SyncWorker_7) [homeassistant.components.xiaomi_miio.vacuum] Got exception while fetching the state: Unable to discover the device 192.168.0.50

CarlosGS commented 3 years ago

To show the problem visually: Screenshot_20211031_100559 (At about 2:08 all HASS graphs stopped updating)

nilsmau commented 3 years ago

Same issue here. home assistant died with last logs (remote sql db) just before the switch from summer to winter time. Hard reboot (twice?) has solved the issue.

Core-2021.10.6 Supervisor-2021.10.6 HassOS 6.5 Remote SQL DB SSD

Arakon commented 3 years ago

Same here, got the "The recorder queue reached the maximum size of 30000; Events are no longer being recorded"

Odroid C2 with EMMC installation Core-2021.10.6 Supervisor-2021.10.6 Supervised on Debian 11 MariaDB

Reboot Host via Supervisor took an unusually long time, but seems to be okay now.

TomBrien commented 3 years ago

Same here, running full stack Home Assistant on an i5 NUC bare metal.

martin3000 commented 3 years ago

It seems that automations with a time_pattern go into an endless loop at 3:00->2:00
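To illustrate how such a loop can arise, here is a minimal sketch of a scheduler that computes the next fire time from wall-clock values only. This is not Home Assistant's actual scheduling code; the function names and the `Europe/Berlin` zone are assumptions made for the example. During the repeated hour, the naive version returns a time that is already in the past, so a caller that fires whenever the delay is zero or negative would fire again immediately.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("Europe/Berlin")  # assumed zone where 03:00 falls back to 02:00


def naive_next_fire(now_utc: datetime, minute: int) -> datetime:
    """Hypothetical buggy version: works on wall-clock values and ignores the fold."""
    wall = now_utc.astimezone(TZ).replace(tzinfo=None, fold=0)
    candidate = wall.replace(minute=minute, second=0, microsecond=0)
    if candidate <= wall:
        candidate += timedelta(hours=1)
    # Re-attaching the zone with fold=0 picks the FIRST occurrence of the repeated hour.
    return candidate.replace(tzinfo=TZ).astimezone(timezone.utc)


def utc_next_fire(now_utc: datetime, minute: int) -> datetime:
    """Safer sketch: do the arithmetic in UTC (valid only for whole-hour UTC offsets)."""
    candidate = now_utc.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        candidate += timedelta(hours=1)
    return candidate


# 01:10 UTC on 2021-10-31 is 02:10 local time, second occurrence (after the change).
now = datetime(2021, 10, 31, 1, 10, tzinfo=timezone.utc)
buggy = naive_next_fire(now, 30)
fixed = utc_next_fire(now, 30)
print("naive:", buggy.isoformat(), "in the past:", buggy < now)
print("utc  :", fixed.isoformat(), "in the past:", fixed < now)
```

The UTC variant only works for hourly minute patterns in zones with whole-hour offsets; a general fix needs fold-aware handling of local times.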

pimw1 commented 3 years ago

Exactly.

TomBrien commented 3 years ago

I do not use time_pattern anywhere. I do have some template sensors that require a for condition

LittleBigDev commented 3 years ago

Same problem as others. I run under HassOS.
Versions info :

teaserrr commented 3 years ago

This morning I noticed the "The recorder queue reached the maximum size of 30000; Events are no longer being recorded" log entry at 02:00:39, with no more entity updates since then. The UI was still responsive, and the entity updates resumed after a server restart from within the UI.

I'm not using any automations with a time_pattern either.

michalk-k commented 3 years ago

The same problem was found in HA v 2021.9.7 (so it's not a 2021.10.x-related issue). CPU load jumped from 15% to 35% (rpi 4 8GB). Core install. supervisor-2021.10.6

ItIsSeven commented 3 years ago

Same issue on core-2021.10.6 Host Operating System OS 6.5

@CarlosGS Unrelated, but may I ask what you're using to get those graphs, looks very nice!

bbr111 commented 3 years ago

I can confirm the issue. Same here.

HA Core 2021.10.6 on RPI 4

borpin commented 3 years ago

Happened to me on a Blue - UK DST change - unresponsive in morning last records at 1am. No idea what had happened (no persistent logs on HA OS) hard reboot solved it.

core-2021.10.6 supervisor-2021.10.6 Home Assistant OS 6.5

Tho85 commented 3 years ago

It seems that automations with a time_pattern go into an endless loop at 3:00->2:00

Can confirm. I have several automations with trigger time_pattern, and those where the corresponding conditions were true have been restarted over and over.

(One of those automations refreshes a google_travel_time sensor and hits Google's paid Distance Matrix API. Luckily, it looks like the sensor didn't actually refresh, as I can't see a spike in API requests on my Google Cloud dashboard. Otherwise I'd be in for a surprise when my Google Cloud bill shows up...)

Home Assistant 2021.10.6 running in Docker

hmmbob commented 3 years ago

Happened to me on a Blue - UK DST change - unresponsive in morning last records at 1am. No idea what had happened (no persistent logs on HA OS) hard reboot solved it.

In your config directory, you'll find a home-assistant.log.1 file - it is the previous log when HA restarted.

I bet you see something like ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded in there.
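A quick way to check a saved log for that error is sketched below. The regular expression is an assumption based on the log lines quoted in this thread, and `recorder_overflows` is a hypothetical helper name, not part of Home Assistant.

```python
import re
from typing import Iterable, List

# Assumed line layout, based on the log excerpts in this thread:
# "<date> <time> <LEVEL> (<thread>) [<logger>] <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \(([^)]*)\) \[([^\]]+)\] (.*)$"
)


def recorder_overflows(lines: Iterable[str]) -> List[str]:
    """Return the timestamps of recorder queue overflow errors found in the lines."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, level, _thread, logger, message = match.groups()
        if (
            level == "ERROR"
            and logger == "homeassistant.components.recorder"
            and "recorder queue reached the maximum size" in message
        ):
            hits.append(timestamp)
    return hits


sample = [
    "2021-10-31 02:00:00 WARNING (MainThread) "
    "[homeassistant.components.automation.hassio_backup] Hassio Backup: Already running",
    "2021-10-31 02:05:01 ERROR (MainThread) [homeassistant.components.recorder] "
    "The recorder queue reached the maximum size of 30000; Events are no longer being recorded",
]
print(recorder_overflows(sample))
```

To scan a real file, pass an open file object instead of the sample list, for example `recorder_overflows(open("home-assistant.log.1"))`.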

mib1185 commented 3 years ago

In your config directory, you'll find a home-assistant.log.1 file - it is the previous log when HA restarted.

This log.1 is only created during graceful reboot/shutdown, but not after a hard reset/reboot

hmmbob commented 3 years ago

Hmm, would be useful to have it in ungraceful situations as well - actually, it would make most sense to have it especially during ungraceful shutdown events, but I guess that's a different issue/request ;-)

mdeweerd commented 3 years ago

From #58799

The problem

The summer to winter time change means that two hours elapse over the same wall clock time.

This means that there should be energy consumption for all time periods, and double for the period from 2AM to 3AM. I do not know if values are recorded against the UTC clock (no DST adjustments) and retrieved for local time.
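A small worked example of that doubling, assuming readings are stored with UTC timestamps and grouped by local hour afterwards. The zone `Europe/Paris`, the constant 1 kWh per hour, and the function name `usage_by_local_hour` are assumptions for illustration only; this does not describe how Home Assistant stores statistics.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LOCAL = ZoneInfo("Europe/Paris")  # assumed local zone (CEST to CET on 2021-10-31)


def usage_by_local_hour(start_utc: datetime, hours: int, kwh_per_hour: float = 1.0) -> dict:
    """Sum hourly UTC readings into buckets labelled by local wall-clock hour."""
    totals: Counter = Counter()
    for offset in range(hours):
        stamp = start_utc + timedelta(hours=offset)
        label = stamp.astimezone(LOCAL).strftime("%H:00")
        totals[label] += kwh_per_hour
    return dict(totals)


# Four consecutive UTC hours around the change: 23:00, 00:00, 01:00, 02:00 UTC.
start = datetime(2021, 10, 30, 23, 0, tzinfo=timezone.utc)
totals = usage_by_local_hour(start, 4)
print(totals)
```

Two different UTC hours map to the same local label 02:00, so that bucket collects twice the consumption of its neighbours even though usage was constant.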

On the next graph we notice:

To show that the consumption is still measured, this is a screenshot at 11h45 of the values used for consumption: image

The absence of data in the history is also the case for other values, so not only related to Energy. We can see that humidity etc are all flat: image

Log attached (removed zigbee related traces, device_tracker info): home-assistant-filtered.zip

Additional information

A remote location where consumption is "constant" is currently unavailable. I may append information from there to the current issue report. I managed to get access to the supervisor Web UI, which indicates all is well, but I can't access port 8123 yet.

image

Note: a ha core restart on my local system fixed things. Energy usage is all allotted to the same time frame.

It would be nice to have a tool to distribute such usage over time in /developer-tools/statistics - the issue detection could be that:

image

Supervisor/Observer - Not helping in this case

As indicated above, the Supervisor/Observer page indicates that all is well. I did an nmap as shown below. It detects port 443, which results in the error also shown below, suggesting that the HA server is not up. It would be nice if the Supervisor/Observer also monitored that the server port is actually up and restarted/rebooted after some time/trials.

$ nmap -p 1-10000 homeassistant.local
Starting Nmap 7.80 ( https://nmap.org ) at 2021-10-31 13:34 CET
Nmap scan report for homeassistant.local (192.168.5.66)
Host is up (0.0069s latency).
rDNS record for 192.168.5.66: homeassistant
Not shown: 9997 closed ports
PORT     STATE SERVICE
443/tcp  open  https
4357/tcp open  qsnet-cond
5355/tcp open  llmnr

Nmap done: 1 IP address (1 host up) scanned in 11.50 seconds
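The kind of check suggested above could be sketched as a plain TCP probe. This is only an illustration of the idea, not an existing Supervisor/Observer feature; `is_port_open` is a hypothetical helper, and the host and port in the usage line are the ones mentioned in this comment.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical usage: probe the Home Assistant web port.
    print(is_port_open("homeassistant.local", 8123))
```

A successful TCP connection only shows that something is listening; a watchdog would probably also want to issue an HTTP request and check the response before deciding to restart.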

Using port forwarding over an ssh tunnel, I mapped localhost:3443 to homeassistant.local:443 on the "remote site". The following confirms that the nginx proxy is running (using http, nginx complains rightfully about the protocol error):

image

And using curl on https: using another raspberry pi running on the remote network:

 $ curl -v -k https://homeassistant.local
*   Trying 192.168.5.66:443...
* Connected to homeassistant.local (192.168.5.66) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=localhost
*  start date: Jun  7 14:38:28 2021 GMT
*  expire date: Jun  5 14:38:28 2031 GMT
*  issuer: CN=localhost
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x11d5d20)
> GET / HTTP/2
> Host: homeassistant.local
> user-agent: curl/7.74.0
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
* stopped the pause stream!
* Connection #0 to host homeassistant.local left intact

I'll have to restart home assistant locally.

bcutter commented 3 years ago

Jup, that was a bad one. As I'm new to HA: I hope this is not the default behavior when it comes to time switch twice a year.

Statistics frozen at 2-3 o'clock this morning and load went like crazy when doing nightly backup (starting 3 o'clock so I guess it was started twice).

image

osxdoc commented 3 years ago

Same here, 2:05 is after 2:40 A214C1BE-C580-4B52-8DB7-AFAA173DD0CD

mib1185 commented 3 years ago

In your config directory, you'll find a home-assistant.log.1 file - it is the previous log when HA restarted.

This log.1 is only created during graceful reboot/shutdown, but not after a hard reset/reboot

This needs to be revised - the rotation of the log file is always done during the start of HA
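As a rough illustration of rotating at startup (a sketch only, not Home Assistant's actual logging setup; `rotate_on_start` is a hypothetical helper), the previous log is renamed to `.1` before a fresh file is created, regardless of how the previous run ended:

```python
from pathlib import Path


def rotate_on_start(log_path: Path) -> None:
    """Move an existing log to '<name>.1' and start with an empty log file."""
    if log_path.exists():
        # Path.replace overwrites an older '<name>.1' if one is present.
        log_path.replace(log_path.with_name(log_path.name + ".1"))
    log_path.touch()
```

Because the rename happens when the process starts, the `.1` file holds whatever the previous run managed to write, even after a hard reset.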

ralphhughes commented 3 years ago

Since I don't see my HA version mentioned in this thread, a quick note to say that I had exactly the same symptoms on: Core version: core-2021.9.4 Supervisor version: supervisor-2021.10.6

All sensors stopped recording within a couple of minutes of each other around 01:21 this morning. Interestingly my time based automation to start the boiler at 6am did run, however it stayed stuck on since then for reasons unknown. Manual restart of home assistant fixed things.

Good luck with the debugging!

greghesp commented 3 years ago

Same issue here. Seems something went wrong and I think historically the recording is wrong.

Top Apex graph shows the wrong last sensor value and graph. Bottom HA graph shows wrong graph but correct sensor value. Even after a reboot, the charts are still as below

image

maurizioandreotti60 commented 3 years ago

I got the error message on the recorder after a lot of warnings from InfluxDB. After a while HA stopped working.

2021-10-31 02:07:25 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 1692 old events.
2021-10-31 02:07:27 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 3409 old events.
2021-10-31 02:07:28 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 1477 old events.
2021-10-31 02:07:29 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 1788 old events.
2021-10-31 02:07:31 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 2862 old events.
2021-10-31 02:07:32 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 2021 old events.
2021-10-31 02:07:35 WARNING (influxdb) [homeassistant.components.influxdb] Catching up, dropped 178 old events.
2021-10-31 02:07:36 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded

alim4r commented 3 years ago

I've got the same problem. CPU and RAM spike at 02:05. Maximum queue error at 02:08:

2021-10-31 02:08:21 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded

I don't have any automations in home assistant (only in node red).

ha time change

Maybe it also has something to do with the recorder short-term statistics? They were added this year (#56006) and run every 5 minutes (spike at 02:05?), but it might be just a coincidence.

https://github.com/home-assistant/core/blob/81845bb0b5fc3af93540e2adb168bdb57cad3f55/homeassistant/components/recorder/__init__.py#L624-L627